Chapter 17: Floating-Point — LLVM Validation

LLVM version tested: 23.0.0git (build at ~/llvm-project/build/bin)


Already handled correctly

| Pattern | IR | LLVM output (x86-64) | Notes |
|---|---|---|---|
| Float sign-bit test (int bits < 0) | bitcast; icmp slt 0 | movmskps + andl $1 | 2 insns, optimal |
| Float NaN test via fcmp uno | fcmp uno %f, %f | ucomiss %xmm0,%xmm0; setp | 2 insns, optimal |
| Float-to-int via fptosi | fptosi float to i32 | cvttss2si | 1 insn, optimal |
| §17-3 sign-magnitude precondition | ashr $31; lshr $1; xor; sub | 7 insns | Sequential, data-dependent chain; no better alternative |
| §17-4 fast inv sqrt (integer step) | bitcast; lshr $1; sub $magic | 5 insns | Faithful; should not fold to rsqrtss (different precision) |
| §17-4 fast inv sqrt + Newton | integer step + fmul/fadd Newton | 11 insns | Correct; LLVM does not (and should not) fold to rsqrtss without -ffast-math |
| Float comparison via integer (§17-3) | precondition both operands + icmp slt | 10 insns (vectorised) | LLVM correctly does NOT fold to ucomiss; NaN semantics differ |
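
The two §17-3 rows (the sign-magnitude precondition and the integer float comparison) can be sketched in C as follows. This is a minimal sketch, not the chapter's code: the helper names are hypothetical, it assumes `>>` on a negative int32_t is an arithmetic shift (true on GCC and Clang, implementation-defined in ISO C), and it reads the bits with memcpy instead of pointer punning. For non-NaN inputs the integer comparison agrees with float `<`, including -0.0 vs +0.0 (both map to 0); it can diverge only when an input is NaN, which is why LLVM must not turn it into ucomiss.

```c
#include <stdint.h>
#include <string.h>

/* Read a float's bit pattern as int32_t (memcpy avoids strict-aliasing UB). */
static int32_t float_bits(float f) {
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

/* Branch-free sign-magnitude -> two's-complement conversion (hypothetical name).
   Negative inputs become -(x & 0x7FFFFFFF); non-negative inputs are unchanged. */
static int32_t precondition(int32_t x) {
    int32_t t = x >> 31;                      /* ashr 31: 0 or -1 (assumes arithmetic shift) */
    int32_t m = (int32_t)((uint32_t)t >> 1);  /* lshr 1:  0 or 0x7FFFFFFF */
    return (x ^ m) - t;                       /* xor; sub */
}

/* a < b via one integer compare; matches float '<' only when neither input is NaN. */
static int float_lt_via_int(float a, float b) {
    return precondition(float_bits(a)) < precondition(float_bits(b));
}
```

Both operands pay for the full precondition chain before the single compare, which is consistent with the table's note that the sequence is data-dependent and has no cheaper integer form.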

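For reference, the two §17-4 rows correspond roughly to the C sketch below. Assumptions: the magic constant 0x5F3759DF is the widely published one and may differ from the value §17-4 uses (the table only says $magic); the function names are hypothetical; inputs are positive and finite. One Newton step leaves a relative error below roughly 0.2%, while rsqrtss is documented only to a relative error of at most 1.5 × 2^-12, so replacing either sequence with rsqrtss would change results.

```c
#include <stdint.h>
#include <string.h>

/* Assumed magic constant (the well-known 0x5F3759DF); valid for positive, finite x. */
#define MAGIC 0x5F3759DFu

/* Integer step only: bitcast, lshr 1, subtract from the magic constant. */
static float rsqrt_int_step(float x) {
    uint32_t i;
    memcpy(&i, &x, sizeof i);     /* bitcast float -> i32 */
    i = MAGIC - (i >> 1);         /* lshr 1; sub from magic */
    float y;
    memcpy(&y, &i, sizeof y);     /* bitcast i32 -> float */
    return y;
}

/* Integer step followed by one Newton-Raphson refinement for 1/sqrt(x). */
static float rsqrt_newton(float x) {
    float y = rsqrt_int_step(x);
    return y * (1.5f - 0.5f * x * y * y);
}
```
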
Missed optimization: integer NaN check not folded to fcmp uno

Pattern

§17-3 gives a table of IEEE field tests expressed as pure integer operations on the float’s bit pattern. The NaN test is:

(*(int*)&f & 0x7FFFFFFF) > 0x7F800000  // true iff f is NaN

In LLVM IR:

%i = bitcast float %f to i32
%a = and  i32 %i, 2147483647   ; 0x7FFFFFFF
%r = icmp ugt i32 %a, 2139095040  ; 0x7F800000 = bits(+∞)

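This is semantically equivalent to fcmp uno float %f, %f. Clearing the sign bit leaves the exponent and mantissa bits, and an IEEE single is NaN exactly when its exponent field is all ones and its mantissa is nonzero, i.e. exactly when those bits, read as an unsigned integer, exceed 0x7F800000 (the pattern of +∞).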
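
The equivalence can be spot-checked in C. This is a sketch under stated assumptions: memcpy replaces the `*(int*)&f` cast to stay within strict aliasing, isnan() stands in for fcmp uno f, f, and the probe list is a hypothetical selection of boundary bit patterns (an exhaustive loop over all 2^32 patterns is also feasible, just slower).

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

static float float_from_bits(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* §17-3 integer NaN test: and 0x7FFFFFFF; icmp ugt 0x7F800000. */
static int float_is_nan_int(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return (u & 0x7FFFFFFFu) > 0x7F800000u;
}

/* Reference: isnan(f), the C-level counterpart of fcmp uno f, f. */
static int float_is_nan_fp(float f) {
    return isnan(f) != 0;
}

/* Boundary bit patterns around the all-ones exponent region. */
static const uint32_t probes[] = {
    0x00000000u, 0x80000000u,  /* +0, -0 */
    0x00000001u,               /* smallest denormal */
    0x7F7FFFFFu,               /* FLT_MAX */
    0x7F800000u, 0xFF800000u,  /* +inf, -inf */
    0x7F800001u, 0x7FC00000u,  /* NaN just above +inf; canonical quiet NaN */
    0xFFC00000u, 0xFFFFFFFFu,  /* negative NaNs */
};

/* Returns 1 if the integer test and isnan() agree on every probe. */
static int nan_tests_agree(void) {
    for (size_t k = 0; k < sizeof probes / sizeof probes[0]; k++) {
        float f = float_from_bits(probes[k]);
        if (float_is_nan_int(f) != float_is_nan_fp(f)) return 0;
    }
    return 1;
}
```

Compiling both predicates with clang -O2 -S on x86-64 is one way to compare the integer and ucomiss lowerings side by side.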

Instruction counts

| Target | Current (insns) | Optimal (insns) | Miss (insns) |
|---|---|---|---|
| Any x86-64 | 4 | 2 (ucomiss + setp) | 2 |

Root cause

opt -O2 partially folds the pattern: it recognizes bitcast & 0x7FFFFFFF as fabs and rewrites to bitcast(fabs(%f)) ugt bits(+∞). But it does not take the final step of recognizing that result as fcmp uno %f, %f.

The proposed fix belongs in InstCombine: fold bitcast(fabs(f)) ugt bits(+∞) → fcmp uno f, f.

Comparison: float_is_inf is NOT a miss

The corresponding infinity check, (bits & 0x7FFFFFFF) == 0x7F800000, similarly folds to fcmp oeq fabs(f), +Inf in opt, but llc correctly keeps it as a 4-instruction integer sequence. The reason: ucomiss(NaN, +Inf) is unordered and sets ZF=1 and PF=1 (as well as CF=1), so sete alone gives the wrong answer (1) for NaN inputs. A correct FP lowering must also test PF (sete; setnp; and), so it is no cheaper than the integer path, which already handles NaN correctly.

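For the NaN check, ucomiss %xmm0, %xmm0 has no such ambiguity: PF is set exactly when the compare is unordered, and comparing f with itself is unordered exactly when f is NaN, so setp alone gives the answer. Going from 4 instructions to 2 is therefore a genuine win.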
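
The flag argument in the two paragraphs above can be checked with a small C model. This is a sketch, not real intrinsics: `model_ucomiss()` and the `is_*` helpers are hypothetical names that model only the ZF/PF/CF results documented for UCOMISS (unordered sets all three; a < b sets CF; a == b sets ZF; a > b clears all three), and `abs_f()` is a stand-in for fabs.

```c
#include <math.h>

/* Modelled EFLAGS results of ucomiss a, b. */
struct ucomiss_flags { int zf, pf, cf; };

static struct ucomiss_flags model_ucomiss(float a, float b) {
    struct ucomiss_flags r = {0, 0, 0};
    if (isnan(a) || isnan(b)) { r.zf = r.pf = r.cf = 1; }  /* unordered */
    else if (a < b)           { r.cf = 1; }
    else if (a == b)          { r.zf = 1; }
    return r;                                              /* a > b: all clear */
}

/* fabs stand-in; a NaN passes through unchanged because NaN < 0 is false. */
static float abs_f(float f) { return f < 0.0f ? -f : f; }

/* fcmp oeq |f|, +Inf lowered with sete alone: wrong (returns 1) for NaN. */
static int is_inf_sete_only(float f) {
    return model_ucomiss(abs_f(f), INFINITY).zf;
}

/* Correct oeq lowering: sete; setnp; and. */
static int is_inf_oeq(float f) {
    struct ucomiss_flags fl = model_ucomiss(abs_f(f), INFINITY);
    return fl.zf & !fl.pf;
}

/* fcmp uno f, f: ucomiss f, f; setp. PF alone equals isnan(f). */
static int is_nan_setp(float f) {
    return model_ucomiss(f, f).pf;
}
```

The model makes the asymmetry concrete: the infinity check needs two flag reads to exclude the unordered case, while the self-compare NaN check needs only PF.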


Test files

| File | What it tests |
|---|---|
| ch17_float.ll | Sign-magnitude precondition, fast inv sqrt, float_is_nan (int vs fcmp), float_is_inf, float_is_negative |
| bug-float-nan-int-check.md | Bug report: integer NaN check not folded to fcmp uno |