Chapter 17: Floating-Point — LLVM Validation
LLVM version tested: 23.0.0git (build at ~/llvm-project/build/bin)
Already handled correctly
| Pattern | IR | LLVM output | Notes |
|---|---|---|---|
| Float sign test (f < 0) | bitcast; icmp slt 0 | movmskps + andl $1 | 2 insns, optimal |
| Float NaN test via fcmp uno | fcmp uno %f, %f | ucomiss %xmm0,%xmm0; setp | 2 insns, optimal |
| Float-to-int via fptosi | fptosi float to i32 | cvttss2si | 1 insn, optimal |
| §17-3 sign-magnitude precondition | ashr $31; lshr $1; xor; sub | 7 insns | Sequential data-dependent chain — no better alternative |
| §17-4 fast inv sqrt (integer step) | bitcast; lshr $1; sub $magic | 5 insns | Faithful — should not fold to rsqrtss (different precision) |
| §17-4 fast inv sqrt + Newton | integer step + fmul/fadd Newton | 11 insns | Correct — LLVM does not (and should not) fold to rsqrtss without -ffast-math |
| Float comparison via integer (§17-3) | precondition both operands + icmp slt | 10 insns (vectorised) | LLVM correctly does NOT fold to ucomiss: NaN semantics differ |
Missed optimization: integer NaN check not folded to fcmp uno
Pattern
§17-3 gives a table of IEEE field tests expressed as pure integer operations on the float’s bit pattern. The NaN test is:
    (*(int*)&f & 0x7FFFFFFF) > 0x7F800000  // true iff f is NaN

In LLVM IR:

    %i = bitcast float %f to i32
    %a = and i32 %i, 2147483647       ; 0x7FFFFFFF
    %r = icmp ugt i32 %a, 2139095040  ; 0x7F800000 = bits(+∞)

This is semantically equivalent to fcmp uno float %f, %f.
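The equivalence can be spot-checked from C. The following is a minimal sketch, not part of the test files listed at the end of this note. The helper names (`float_bits`, `float_from_bits`, `is_nan_int`, `is_nan_fcmp`) are hypothetical, and it assumes default floating-point semantics (no `-ffast-math`), under which `f != f` is true exactly when `f` is NaN.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy the float's bit pattern into an integer. memcpy avoids the
   aliasing problem of the *(int*)&f cast quoted from the book. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse helper, used to build specific NaN and infinity encodings. */
static float float_from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Integer formulation: clear the sign bit, then compare (unsigned)
   against the bit pattern of +Inf. */
static bool is_nan_int(float f) {
    return (float_bits(f) & 0x7FFFFFFFu) > 0x7F800000u;
}

/* Floating-point formulation: a value compares unequal to itself
   only when it is NaN, which is the C spelling closest to
   fcmp uno %f, %f. */
static bool is_nan_fcmp(float f) {
    return f != f;
}
```

Calling both functions on quiet and signaling NaN encodings (for example `0x7FC00000` and `0xFF800001`), on both infinities, and on ordinary values should give identical results.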
Instruction counts
| Target | Current (insns) | Optimal (insns) | Miss (insns) |
|---|---|---|---|
| Any x86-64 | 4 | 2 (ucomiss + setp) | 2 |
Root cause
opt -O2 partially folds the pattern: it recognizes bitcast & 0x7FFFFFFF as
fabs and rewrites to bitcast(fabs(%f)) ugt bits(+∞). But it does not take
the final step of recognizing that result as fcmp uno %f, %f.
The fix belongs in InstCombine, as a new fold: bitcast(fabs(f)) ugt bits(+∞) → fcmp uno f, f.
Comparison: float_is_inf is NOT a miss
The corresponding infinity check (bits & 0x7FFFFFFF) == 0x7F800000 similarly
folds to fcmp oeq fabs(f), +Inf in opt, but llc correctly keeps it as a
4-instruction integer sequence. The reason: ucomiss(NaN, +Inf) sets ZF=1 and PF=1,
so sete alone gives the wrong answer (1) for NaN inputs. The integer path
handles NaN correctly at the same cost.
For the NaN check, ucomiss %xmm0, %xmm0 has no such ambiguity — setp reads
exactly the PF set by an unordered compare. So 4 insns → 2 is a genuine win.
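The flag argument above can be illustrated with a small C model. This is a sketch under stated assumptions rather than LLVM's lowering: `ucomiss_model` is a hypothetical helper that reproduces the ZF/PF/CF results of `ucomiss` (all three set for an unordered compare), `abs_f` is a simplified stand-in for clearing the sign bit, and the `is_*` functions stand for candidate instruction sequences.

```c
#include <math.h>
#include <stdbool.h>

/* The three flags that ucomiss writes. */
struct flags { bool zf, pf, cf; };

/* Model of "ucomiss b, a" comparing a against b:
   unordered -> ZF=PF=CF=1, a < b -> CF=1, a == b -> ZF=1,
   a > b -> all clear. */
static struct flags ucomiss_model(float a, float b) {
    struct flags r = { false, false, false };
    if (a != a || b != b) {          /* at least one operand is NaN */
        r.zf = true;
        r.pf = true;
        r.cf = true;
    } else if (a < b) {
        r.cf = true;
    } else if (a == b) {
        r.zf = true;
    }
    return r;
}

/* Simplified absolute value; NaN stays NaN, -Inf becomes +Inf. */
static float abs_f(float f) {
    return f < 0.0f ? -f : f;
}

/* Infinity test using sete alone: ZF is also set for NaN inputs,
   so this returns true for NaN, which is wrong. */
static bool is_inf_sete_only(float f) {
    return ucomiss_model(abs_f(f), INFINITY).zf;
}

/* Infinity test that also checks PF (sete + setnp + and):
   correct, but no longer a two-instruction sequence. */
static bool is_inf_sete_setnp(float f) {
    struct flags fl = ucomiss_model(abs_f(f), INFINITY);
    return fl.zf && !fl.pf;
}

/* NaN test: comparing f with itself sets PF only when f is NaN,
   so setp alone is enough. */
static bool is_nan_setp(float f) {
    return ucomiss_model(f, f).pf;
}
```

In this model `is_inf_sete_only(NAN)` returns true while `is_inf_sete_setnp(NAN)` returns false, which is why the infinity check gains nothing from the floating-point path, whereas `is_nan_setp` needs no extra flag test.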
Test files
| File | What it tests |
|---|---|
| ch17_float.ll | Sign-magnitude precondition, fast inv sqrt, float_is_nan (int vs fcmp), float_is_inf, float_is_negative |
| bug-float-nan-int-check.md | Bug report: integer NaN check not folded to fcmp uno |