Bug: InstCombine misses fold of integer NaN check to fcmp uno — 4 insns vs 2
Title
InstCombine: (bitcast float to i32 & 0x7FFFFFFF) > 0x7F800000 not folded to fcmp uno — 4 instructions vs 2
Summary
The standard way to test for NaN via integer bit manipulation (mask off the sign bit
and compare against the bit pattern of +∞) is not recognized as equivalent to
fcmp uno float %f, %f, so x86-64 codegen emits 4 instructions instead of 2.
InstCombine does partially fold the pattern (replacing bitcast & 0x7FFFFFFF with
fabs), but does not take the final step of recognising that
bitcast(fabs(%f)) > bits(+∞) is the definition of NaN.
This pattern appears in §17-3 of Hacker’s Delight as one of several IEEE field tests written purely via integer operations on the float bit pattern.
Reproducer
define i1 @float_is_nan(float %f) {
%i = bitcast float %f to i32
%a = and i32 %i, 2147483647 ; 0x7FFFFFFF — strip sign bit
%r = icmp ugt i32 %a, 2139095040 ; > 0x7F800000 = bits(+∞)
ret i1 %r
}
opt -O2 -S float_is_nan.ll # partial fold only (fabs, not fcmp uno)
llc -O2 -mtriple=x86_64-pc-linux-gnu float_is_nan.ll -o -
Current output (4 instructions)
movd %xmm0, %eax
andl $2147483647, %eax ; 0x7FFFFFFF
cmpl $2139095041, %eax ; 0x7F800001 (setae ↔ ugt 0x7F800000)
setae %al
retq
After opt -O2, the IR is partially simplified to:
%1 = tail call float @llvm.fabs.f32(float %f)
%a = bitcast float %1 to i32
%r = icmp samesign ugt i32 %a, 2139095040
ret i1 %r
opt replaced bitcast & 0x7FFFFFFF with fabs, but did not continue to
fcmp uno. llc then generates the same 4-instruction sequence.
Expected output (2 instructions)
ucomiss %xmm0, %xmm0
setp %al
retq
This is exactly what fcmp uno float %f, %f produces (verified):
define i1 @float_is_nan_fcmp(float %f) {
%r = fcmp uno float %f, %f
ret i1 %r
}
→ 2 instructions with ucomiss %xmm0, %xmm0; setp %al.
Why they are equivalent
A float f is NaN if and only if:
- the biased exponent field is all-ones (= 0xFF for single precision), AND
- the mantissa field is nonzero.
In integer representation: (bits(f) & 0x7FFFFFFF) > 0x7F800000, because
0x7F800000 is +∞ (all-ones exponent, zero mantissa) and the only values
with a larger stripped representation are NaN.
fcmp uno float %f, %f returns true iff %f is unordered with itself, which
is true iff %f is NaN.
Both predicates are true for exactly the same set of bit patterns.
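The equivalence argued above can be spot-checked in plain C. The sketch below is illustrative only: the function names are made up for this example, it assumes float is IEEE-754 binary32, and it assumes the code is built without -ffast-math (which would let the compiler fold x != x to false). It compares the integer predicate against x != x, the C analogue of fcmp uno %f, %f, on the boundary bit patterns and on a strided sweep of the 32-bit space.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Integer form from the reproducer: strip the sign bit, compare with bits(+inf). */
int is_nan_bits(uint32_t bits) {
    return (bits & 0x7FFFFFFFu) > 0x7F800000u;
}

/* Floating-point form: x != x is the C analogue of fcmp uno %f, %f. */
int is_nan_fcmp(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f); /* bit-cast without aliasing problems */
    return f != f;
}

/* Returns the number of tested bit patterns on which the two predicates disagree. */
unsigned count_mismatches(void) {
    /* Boundary cases: zeros, max finite, infinities, smallest/largest NaN payloads. */
    static const uint32_t edge[] = {
        0x00000000u, 0x80000000u, 0x7F7FFFFFu, 0x7F800000u, 0x7F800001u,
        0x7FC00000u, 0x7FFFFFFFu, 0xFF800000u, 0xFF800001u, 0xFFFFFFFFu
    };
    unsigned bad = 0;
    for (size_t i = 0; i < sizeof edge / sizeof edge[0]; ++i)
        bad += is_nan_bits(edge[i]) != is_nan_fcmp(edge[i]);
    /* Strided sweep; change the step to 1 for an exhaustive (slower) check. */
    for (uint64_t b = 0; b <= 0xFFFFFFFFu; b += 65521u)
        bad += is_nan_bits((uint32_t)b) != is_nan_fcmp((uint32_t)b);
    return bad;
}

/* Aborts if any tested pattern disagrees. */
void self_check(void) {
    assert(count_mismatches() == 0);
}
```

Calling self_check() from a small main is enough to run it. A strided sweep is not a proof; an exhaustive run or a tool such as Alive2 would be the stronger evidence for the fold.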
Note: float_is_inf is not a comparable miss
(bitcast(float %f) & 0x7FFFFFFF) == 0x7F800000 tests for infinity.
opt -O2 folds this to fcmp oeq float fabs(f), +Inf, but llc still generates
4 integer instructions. A bare ucomiss + sete would incorrectly return true
for NaN inputs: an unordered compare (NaN ucomiss +Inf) sets ZF=1, PF=1 and CF=1,
so sete alone is wrong and a correct ucomiss sequence needs an extra parity check.
The integer path is correct, and its 4 insns are not a missed optimization for float_is_inf.
For the NaN check, however, ucomiss %xmm0, %xmm0 sets the parity flag exactly when
%f is NaN, and setp reads exactly that flag, so there is no edge-case ambiguity.
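The flag argument above can be illustrated with a small software model in C. This is only a sketch: ucomiss_model and the three predicate functions are hypothetical names, and the model simply encodes the documented ucomiss flag results (unordered: ZF=PF=CF=1; equal: ZF=1; less: CF=1; greater: all clear) rather than reading real hardware flags.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Model of the EFLAGS bits written by ucomiss when comparing a with b. */
struct flags { bool zf, pf, cf; };

struct flags ucomiss_model(float a, float b) {
    struct flags r = { false, false, false };
    if (a != a || b != b) { r.zf = r.pf = r.cf = true; } /* unordered */
    else if (a == b)      { r.zf = true; }               /* equal */
    else if (a < b)       { r.cf = true; }               /* less */
    return r;                                            /* greater: all clear */
}

/* Sign-stripping stand-in for fabs; NaN stays NaN. */
static float abs_model(float f) { return f < 0.0f ? -f : f; }

/* float_is_inf via ucomiss + sete only: wrong, because unordered also sets ZF. */
bool is_inf_sete_only(float f) {
    return ucomiss_model(abs_model(f), INFINITY).zf;
}

/* float_is_inf via ucomiss + sete + setnp: correct, but needs the extra parity check. */
bool is_inf_sete_setnp(float f) {
    struct flags r = ucomiss_model(abs_model(f), INFINITY);
    return r.zf && !r.pf;
}

/* float_is_nan via ucomiss f, f + setp: PF is set exactly when f is NaN. */
bool is_nan_setp(float f) {
    return ucomiss_model(f, f).pf;
}

/* Aborts if the model contradicts the behaviour described in the text. */
void flags_self_check(void) {
    assert(is_inf_sete_only(NAN));       /* the false positive */
    assert(!is_inf_sete_setnp(NAN));
    assert(is_nan_setp(NAN) && !is_nan_setp(INFINITY));
}
```

In this model the sete-only infinity test reports true for a NaN input, while the setp-based NaN test has no such false positive, which matches the asymmetry described above.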
Where to fix
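llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp or
llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp (or possibly
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp, since the root of the pattern is an icmp):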
Recognize the pattern (post-fabs-fold form):
icmp ugt (bitcast float (fabs %f)), bits(+∞)
→ fcmp uno float %f, %f
Alternatively, recognise the raw form before the fabs fold:
icmp ugt (and (bitcast float %f to i32), 0x7FFFFFFF), 0x7F800000
→ fcmp uno float %f, %f
The same fold applies to double precision with mask 0x7FFFFFFFFFFFFFFF and
threshold 0x7FF0000000000000.
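The double-precision constants can be sanity-checked the same way. The following C sketch uses hypothetical function names and assumes double is IEEE-754 binary64 and that the build does not use -ffast-math. It compares the 64-bit integer predicate against x != x on the boundary patterns and on every value of the top 16 bits (sign, exponent, and the leading mantissa bits), each with the lowest mantissa bit both clear and set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-bit integer form: strip the sign bit, compare with bits(+inf). */
int is_nan_bits64(uint64_t bits) {
    return (bits & 0x7FFFFFFFFFFFFFFFull) > 0x7FF0000000000000ull;
}

/* Floating-point form: x != x is the C analogue of fcmp uno double %d, %d. */
int is_nan_fcmp64(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d); /* bit-cast without aliasing problems */
    return d != d;
}

/* Returns the number of tested bit patterns on which the two predicates disagree. */
unsigned count_mismatches64(void) {
    static const uint64_t edge[] = {
        0x0000000000000000ull, 0x8000000000000000ull, 0x7FEFFFFFFFFFFFFFull,
        0x7FF0000000000000ull, 0x7FF0000000000001ull, 0x7FF8000000000000ull,
        0x7FFFFFFFFFFFFFFFull, 0xFFF0000000000000ull, 0xFFF0000000000001ull,
        0xFFFFFFFFFFFFFFFFull
    };
    unsigned bad = 0;
    for (size_t i = 0; i < sizeof edge / sizeof edge[0]; ++i)
        bad += is_nan_bits64(edge[i]) != is_nan_fcmp64(edge[i]);
    /* Every value of the top 16 bits, with the lowest mantissa bit clear and set. */
    for (uint64_t hi = 0; hi < 65536; ++hi) {
        uint64_t b = hi << 48;
        bad += is_nan_bits64(b) != is_nan_fcmp64(b);
        bad += is_nan_bits64(b | 1) != is_nan_fcmp64(b | 1);
    }
    return bad;
}

/* Aborts if any tested pattern disagrees. */
void self_check64(void) {
    assert(count_mismatches64() == 0);
}
```

As with the single-precision case, this is a sampled check of the constants, not a proof of the fold.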