Bug: InstCombine misses fold of integer NaN check to fcmp uno — 4 insns vs 2

Title

InstCombine: (bitcast float to i32 & 0x7FFFFFFF) > 0x7F800000 not folded to fcmp uno — 4 instructions vs 2

Summary

A standard way to test for NaN with integer bit manipulation is to clear the sign bit and then check whether the result is larger than the bit pattern of +∞. This test is not recognized as equivalent to fcmp uno float %f, %f, and on x86-64 it compiles to 4 instructions instead of 2.

InstCombine does fold part of the pattern: it replaces bitcast & 0x7FFFFFFF with a bitcast of fabs. It does not take the final step of recognising that the unsigned comparison bitcast(fabs(%f)) > bits(+∞) holds exactly when %f is NaN.

This pattern appears in §17-3 of Hacker’s Delight as one of several IEEE field tests written purely via integer operations on the float bit pattern.

Reproducer

define i1 @float_is_nan(float %f) {
  %i = bitcast float %f to i32
  %a = and  i32 %i, 2147483647   ; 0x7FFFFFFF — strip sign bit
  %r = icmp ugt i32 %a, 2139095040  ; > 0x7F800000 = bits(+∞)
  ret i1 %r
}
opt -O2 -S float_is_nan.ll   # partial fold only (fabs, not fcmp uno)
llc -O2 -mtriple=x86_64-pc-linux-gnu float_is_nan.ll -o -
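For reference, here is a C sketch of the same test on the float's bit pattern. The helper names (float_bits, float_from_bits) are hypothetical. memcpy is used as the portable way to reinterpret the bits. The IR that clang emits for this has not been checked here and may differ in detail from the reproducer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret an IEEE-754 binary32 value as its raw
   bits and back. memcpy is the portable way to type-pun in C. */
uint32_t float_bits(float f) {
    uint32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}

float float_from_bits(uint32_t i) {
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}

/* The reproducer's pattern: clear the sign bit (and 0x7FFFFFFF), then do an
   unsigned compare against bits(+inf) = 0x7F800000. */
int float_is_nan(float f) {
    return (float_bits(f) & 0x7FFFFFFFu) > 0x7F800000u;
}
```

For example, float_is_nan(float_from_bits(0xFFC00000u)) is nonzero. The sign bit is cleared before the comparison, so negative NaNs are caught as well.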

Current output (4 instructions)

movd   %xmm0, %eax
andl   $2147483647, %eax      # 0x7FFFFFFF
cmpl   $2139095041, %eax      # 0x7F800001 (setae ↔ ugt 0x7F800000)
setae  %al
retq

After opt -O2, the IR is partially simplified to:

%1 = tail call float @llvm.fabs.f32(float %f)
%a = bitcast float %1 to i32
%r = icmp samesign ugt i32 %a, 2139095040
ret i1 %r

opt replaced bitcast & 0x7FFFFFFF with fabs, but did not continue to fcmp uno. llc then generates the same 4-instruction sequence.

Expected output (2 instructions)

ucomiss  %xmm0, %xmm0
setp     %al
retq

This is exactly what llc emits for fcmp uno float %f, %f, verified with:

define i1 @float_is_nan_fcmp(float %f) {
  %r = fcmp uno float %f, %f
  ret i1 %r
}

→ 2 instructions with ucomiss %xmm0, %xmm0; setp %al.
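At the C level, the same comparison can be written with the C99 isunordered macro from <math.h>. The function name mirrors the IR above and is illustrative. That clang lowers this particular function to fcmp uno is an expectation, not something verified in this report.

```c
#include <math.h>

/* C-level counterpart of `fcmp uno float %f, %f`: true iff f is unordered
   with itself, which happens only when f is NaN. */
int float_is_nan_fcmp(float f) {
    return isunordered(f, f);
}
```

Like f != f, this relies on IEEE semantics. Options such as -ffast-math let the compiler assume that NaNs never occur.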

Why they are equivalent

A float f is NaN if and only if:

  • the biased exponent field is all-ones (= 0xFF for single precision), AND
  • the mantissa field is nonzero.

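In integer representation, the test is (bits(f) & 0x7FFFFFFF) > 0x7F800000, compared as unsigned integers. Clearing the sign bit leaves (exponent << 23) | mantissa. 0x7F800000 is +∞ (all-ones exponent, zero mantissa).

- Every pattern with a smaller exponent is at most 0x7F7FFFFF, so it cannot exceed the threshold.
- With the all-ones exponent, the value exceeds 0x7F800000 exactly when the mantissa is nonzero.

The stripped values above bits(+∞), 0x7F800001 through 0x7FFFFFFF, are therefore exactly the NaNs.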
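The boundary argument can be checked on raw bit patterns. This sketch uses hypothetical helper names:

- is_nan_fields applies the field-level definition directly.
- is_nan_masked is the single unsigned comparison from the reproducer.

```c
#include <stdint.h>

/* Definition-based NaN test on a raw binary32 pattern: biased exponent
   (bits 30..23) all ones AND mantissa (bits 22..0) nonzero. */
int is_nan_fields(uint32_t bits) {
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t mantissa = bits & 0x7FFFFFu;
    return exponent == 0xFFu && mantissa != 0;
}

/* Masked-integer form: after clearing the sign bit the value is
   (exponent << 23) | mantissa, which exceeds 0x7F800000 (exponent 0xFF,
   mantissa 0) exactly when exponent == 0xFF and mantissa != 0. */
int is_nan_masked(uint32_t bits) {
    return (bits & 0x7FFFFFFFu) > 0x7F800000u;
}
```

Both functions return 0 for 0x7F800000 (+∞) and 0x7F7FFFFF (FLT_MAX). Both return 1 for 0x7F800001, and also for 0xFF800001, whose sign bit the mask discards.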

fcmp uno float %f, %f returns true iff %f is unordered with itself, which is true iff %f is NaN.

Both predicates are true for exactly the same set of bit patterns.
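The equivalence can also be checked mechanically against the C99 isnan macro, which is true for exactly the inputs where fcmp uno %f, %f is true. The sketch below uses hypothetical names. It covers every value of the top 16 bits (sign, exponent and upper 7 mantissa bits), each combined with a few low-half patterns. Iterating the low half over all 65,536 values would make it an exhaustive check of all 2^32 patterns, at the cost of a longer run. It assumes IEEE semantics, so it should not be built with -ffast-math.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Integer form from the report, applied to a raw binary32 bit pattern. */
static int is_nan_masked(uint32_t bits) {
    return (bits & 0x7FFFFFFFu) > 0x7F800000u;
}

/* Compare the integer form with isnan() for every value of the top 16 bits,
   each paired with a few low-half patterns. Low half 0 covers +/-inf, and
   low half 1 covers the smallest NaN payload. Returns the number of
   disagreements (expected: 0). */
unsigned long count_nan_mismatches(void) {
    static const uint32_t low_parts[] = {0x0000u, 0x0001u, 0x8000u, 0xFFFFu};
    unsigned long mismatches = 0;
    for (uint32_t high = 0; high <= 0xFFFFu; ++high) {
        for (size_t k = 0; k < sizeof low_parts / sizeof low_parts[0]; ++k) {
            uint32_t bits = (high << 16) | low_parts[k];
            float f;
            memcpy(&f, &bits, sizeof f);
            if ((isnan(f) != 0) != (is_nan_masked(bits) != 0))
                ++mismatches;
        }
    }
    return mismatches;
}
```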

Note: float_is_inf is not a comparable miss

(bitcast(float %f) & 0x7FFFFFFF) == 0x7F800000 tests for infinity. opt -O2 folds this to fcmp oeq float fabs(%f), +Inf, but llc still emits the 4-instruction integer sequence, and that choice is sound.

After ucomiss against +Inf, sete alone would wrongly return true for NaN inputs, because an unordered result sets ZF=1 and PF=1. A correct ucomiss-based lowering therefore also needs setnp and an and, on top of computing fabs and loading the +Inf constant. The integer path is correct, and 4 instructions is hard to beat for float_is_inf.
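The two forms of the infinity test can be written side by side in C. In this sketch the names are hypothetical. float_is_inf_fcmp uses ordered equality, which corresponds to fcmp oeq and is false for NaN. That NaN case is exactly what a sete-only ucomiss lowering would get wrong.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Integer form: clear the sign bit, then require the exact bit pattern of
   +inf (exponent all ones, mantissa zero). */
int float_is_inf_bits(float f) {
    uint32_t i;
    memcpy(&i, &f, sizeof i);
    return (i & 0x7FFFFFFFu) == 0x7F800000u;
}

/* Floating-point form: ordered equality is false when either operand is
   NaN, so a NaN input correctly yields 0. */
int float_is_inf_fcmp(float f) {
    return f == INFINITY || f == -INFINITY;
}
```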

For NaN, however, there is no such edge case. fcmp uno %f, %f lowers to ucomiss %xmm0, %xmm0, which sets PF exactly when the comparison is unordered (that is, when %f is NaN), and setp reads that flag directly.
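The flag reasoning in the two paragraphs above can be made concrete with a small software model. It is a simplification with hypothetical names, not a hardware reference. It encodes the result table that x86 documents for UCOMISS:

- unordered: ZF=PF=CF=1
- less than: CF=1
- equal: ZF=1
- greater than: all three clear

It ignores exceptions and the other flags.

```c
#include <math.h>

/* Simplified model of the flags UCOMISS sets. In AT&T syntax,
   `ucomiss %src, %dst` compares dst against src. */
struct ucomiss_flags {
    int zf, pf, cf;
};

struct ucomiss_flags ucomiss_model(float dst, float src) {
    struct ucomiss_flags fl = {0, 0, 0};
    if (isunordered(dst, src)) {   /* at least one operand is NaN */
        fl.zf = fl.pf = fl.cf = 1;
    } else if (dst < src) {
        fl.cf = 1;
    } else if (dst == src) {
        fl.zf = 1;
    }                              /* dst > src: all three stay clear */
    return fl;
}

/* fabs without libm: a NaN input is returned unchanged (NaN < 0 is false). */
static float abs_model(float f) {
    return f < 0.0f ? -f : f;
}

/* ucomiss %xmm0, %xmm0; setp %al: PF is set only when f is NaN. */
int is_nan_via_setp(float f) {
    return ucomiss_model(f, f).pf;
}

/* Comparing |f| with +inf and reading ZF alone: wrongly true for NaN. */
int is_inf_via_sete_only(float f) {
    return ucomiss_model(abs_model(f), INFINITY).zf;
}

/* Correct variant: ZF set and PF clear (sete, setnp, and). */
int is_inf_via_sete_setnp(float f) {
    struct ucomiss_flags fl = ucomiss_model(abs_model(f), INFINITY);
    return fl.zf && !fl.pf;
}
```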

Where to fix

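llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp or llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp. Another candidate is llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp, which handles icmp-of-bitcast patterns in foldICmpBitCast: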

Recognize the pattern in its post-fabs-fold form. The icmp may carry the samesign flag, as in the opt -O2 output above:

icmp ugt (bitcast float (fabs %f) to i32), bits(+∞)
  →
fcmp uno float %f, %f

Alternatively, recognise the raw form before the fabs fold:

icmp ugt (and (bitcast float %f to i32), 0x7FFFFFFF), 0x7F800000
  →
fcmp uno float %f, %f

The same fold applies to double precision with mask 0x7FFFFFFFFFFFFFFF and threshold 0x7FF0000000000000.
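The two constants follow from the format parameters: bits(+∞) is an all-ones exponent shifted above the mantissa. The sketch below uses hypothetical names. It derives the threshold for binary32 and binary64 and applies the double-precision test.

```c
#include <stdint.h>
#include <string.h>

/* bits(+inf) for an IEEE-754 binary format with exp_bits exponent bits and
   mant_bits mantissa bits: all-ones exponent, zero mantissa. */
uint64_t inf_bits(unsigned exp_bits, unsigned mant_bits) {
    return ((UINT64_C(1) << exp_bits) - 1) << mant_bits;
}

/* Hypothetical helper: build a double from its raw bit pattern. */
double double_from_bits(uint64_t i) {
    double d;
    memcpy(&d, &i, sizeof d);
    return d;
}

/* Double-precision integer NaN test: mask 0x7FFFFFFFFFFFFFFF, threshold
   0x7FF0000000000000 (= inf_bits(11, 52)). */
int double_is_nan(double d) {
    uint64_t i;
    memcpy(&i, &d, sizeof i);
    return (i & UINT64_C(0x7FFFFFFFFFFFFFFF)) > UINT64_C(0x7FF0000000000000);
}
```

inf_bits(8, 23) gives 0x7F800000 and inf_bits(11, 52) gives 0x7FF0000000000000. These are the single- and double-precision thresholds quoted in this report.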