Bug: x86 backend misses SHR+SBB for CRC inner loop
Title
x86: missed SHR+SBB peephole for CRC inner loop — 6→4 insns (right-shift) and 5→4 insns (left-shift)
Summary
The classic bit-at-a-time CRC inner loop uses a conditional XOR with a polynomial constant. The optimal x86 code exploits the carry flag set by the shift instruction to avoid a separate AND+NEG (or copy+SAR) sequence. LLVM generates 6 instructions for the right-shift form and 5 for the left-shift form; both are optimal at 4.
Reproducer
; Right-shift CRC step: crc = (crc >> 1) ^ (-(crc & 1) & poly)
define i32 @crc32_step_mask(i32 %crc) {
%low = and i32 %crc, 1
%mask = sub i32 0, %low
%poly = and i32 %mask, -307974906 ; 0xEDB88320
%shr = lshr i32 %crc, 1
%r = xor i32 %shr, %poly
ret i32 %r
}
; Left-shift CRC step: crc = (crc << 1) ^ ((crc < 0) ? poly : 0)
define i32 @crc32_step_left(i32 %crc) {
%sign = icmp slt i32 %crc, 0
%poly = select i1 %sign, i32 79764919, i32 0 ; 0x04C11DB7
%shl = shl i32 %crc, 1
%r = xor i32 %shl, %poly
ret i32 %r
}llc -O2 -mtriple=x86_64-pc-linux-gnuCurrent output
crc32_step_mask: ; 6 instructions
movl %edi, %eax
andl $1, %eax
negl %eax
andl $-307974906, %eax ; 0xEDB88320
shrl %edi
xorl %edi, %eax
retq
crc32_step_left: ; 5 instructions
leal (%rdi,%rdi), %eax ; crc << 1 — LEA does NOT set CF
movl %edi, %ecx
sarl $31, %ecx ; -(crc < 0) via sign extension
andl $79764919, %ecx ; 0x04C11DB7
xorl %ecx, %eax
retqExpected output
crc32_step_mask: ; 4 instructions
shrl %edi ; edi = crc >> 1; CF = crc & 1 (shifted-out bit)
sbbl %eax, %eax ; eax = 0 - 0 - CF = -CF (0 or 0xFFFFFFFF)
andl $-307974906, %eax ; 0xEDB88320 or 0
xorl %edi, %eax ; (crc >> 1) ^ poly_or_0
retq
crc32_step_left: ; 4 instructions
addl %edi, %edi ; edi = crc << 1; CF = old bit 31 = (crc < 0)
sbbl %eax, %eax ; eax = -CF
andl $79764919, %eax ; 0x04C11DB7 or 0
xorl %edi, %eax
retqWhy this is correct
Right-shift form:
On x86, shrl r, 1 (SHR by 1) sets CF to the bit shifted out — i.e., bit 0 of the
original operand. After shrl %edi, CF = crc & 1.
sbbl %eax, %eax computes eax - eax - CF = -CF. Since eax - eax = 0,
the result is 0 - CF. If CF = 0, eax = 0. If CF = 1, eax = -1 = 0xFFFFFFFF.
This is exactly -(crc & 1), the desired mask, in one instruction.
The full 4-insn sequence:
shrl %edi→ edi = crc >> 1; CF = crc & 1sbbl %eax, %eax→ eax = 0 or 0xFFFFFFFFandl $poly, %eax→ eax = poly or 0xorl %edi, %eax→ (crc >> 1) XOR poly_or_0
Left-shift form:
addl %edi, %edi (= crc + crc = crc << 1) sets CF to the carry out of bit 31,
which equals the old bit 31 = (crc >> 31) & 1 = (1 if crc < 0, 0 otherwise).
The SBB then gives the same mask.
Note: LLVM uses leal (%rdi,%rdi), %eax for crc << 1, which avoids clobbering
%rdi but does not set CF. This prevents the SBB optimization. Using addl
or shll instead of LEA would set CF and enable 4-insn output.
Root cause
The x86 backend lowering of:
(XOR (SRL x, 1) (AND (NEG (AND x, 1)) C))
and:
(XOR (SHL x, 1) (AND (NEG (SRL (SAR x, 31))) C))
does not recognize that these are candidates for the SHR/ADD + SBB idiom.
Additionally, for the left-shift case, the backend prefers LEA (which doesn’t set flags and leaves the source register intact) over ADD/SHL (which set CF and clobber the source), missing the opportunity to use SBB.
Where to fix
llvm/lib/Target/X86/X86ISelLowering.cpp: add a DAGCombine pattern for(XOR (SRL x, 1) (AND (NEG (AND x, 1)) C))→SHR + SBB + AND + XOR.llvm/lib/Target/X86/X86ISelDAGToDAG.cppor peephole pass: recognize(SHL x, 1)feeding an SBB-style mask and preferaddl/shlloverlealwhen the CF output is consumed.
Generalization
The pattern generalizes: for any shift-by-1 that feeds a -(bit & 1) mask:
(SRL x, 1)sets CF = bit 0 →SBBgives-(bit 0)(SHL x, 1)sets CF = bit 31 →SBBgives-(bit 31)= sign mask
This is a common CRC idiom and appears in both IEEE 802.3 (right-shift, poly = 0xEDB88320) and ITU/left-shift (poly = 0x04C11DB7) implementations.