Bug: x86 backend misses SHR+SBB for CRC inner loop

Title

x86: missed SHR+SBB peephole for CRC inner loop — 6→4 insns (right-shift) and 5→4 insns (left-shift)

Summary

The classic bit-at-a-time CRC inner loop uses a conditional XOR with a polynomial constant. The optimal x86 code exploits the carry flag set by the shift instruction to avoid a separate AND+NEG (or copy+SAR) sequence. LLVM generates 6 instructions for the right-shift form and 5 for the left-shift form; both are optimal at 4.

Reproducer

; Right-shift CRC step: crc = (crc >> 1) ^ (-(crc & 1) & poly)
define i32 @crc32_step_mask(i32 %crc) {
  %low   = and i32 %crc, 1
  %mask  = sub i32 0, %low
  %poly  = and i32 %mask, -307974906  ; 0xEDB88320
  %shr   = lshr i32 %crc, 1
  %r     = xor i32 %shr, %poly
  ret i32 %r
}
 
; Left-shift CRC step: crc = (crc << 1) ^ ((crc < 0) ? poly : 0)
define i32 @crc32_step_left(i32 %crc) {
  %sign  = icmp slt i32 %crc, 0
  %poly  = select i1 %sign, i32 79764919, i32 0   ; 0x04C11DB7
  %shl   = shl i32 %crc, 1
  %r     = xor i32 %shl, %poly
  ret i32 %r
}
llc -O2 -mtriple=x86_64-pc-linux-gnu

Current output

crc32_step_mask:                        ; 6 instructions
    movl  %edi, %eax
    andl  $1, %eax
    negl  %eax
    andl  $-307974906, %eax            ; 0xEDB88320
    shrl  %edi
    xorl  %edi, %eax
    retq
 
crc32_step_left:                        ; 5 instructions
    leal  (%rdi,%rdi), %eax            ; crc << 1 — LEA does NOT set CF
    movl  %edi, %ecx
    sarl  $31, %ecx                    ; -(crc < 0) via sign extension
    andl  $79764919, %ecx              ; 0x04C11DB7
    xorl  %ecx, %eax
    retq

Expected output

crc32_step_mask:                        ; 4 instructions
    shrl  %edi                         ; edi = crc >> 1; CF = crc & 1 (shifted-out bit)
    sbbl  %eax, %eax                   ; eax = 0 - 0 - CF = -CF (0 or 0xFFFFFFFF)
    andl  $-307974906, %eax            ; 0xEDB88320 or 0
    xorl  %edi, %eax                   ; (crc >> 1) ^ poly_or_0
    retq
 
crc32_step_left:                        ; 4 instructions
    addl  %edi, %edi                   ; edi = crc << 1; CF = old bit 31 = (crc < 0)
    sbbl  %eax, %eax                   ; eax = -CF
    andl  $79764919, %eax              ; 0x04C11DB7 or 0
    xorl  %edi, %eax
    retq

Why this is correct

Right-shift form:

On x86, shrl r, 1 (SHR by 1) sets CF to the bit shifted out — i.e., bit 0 of the original operand. After shrl %edi, CF = crc & 1.

sbbl %eax, %eax computes eax - eax - CF = -CF. Since eax - eax = 0, the result is 0 - CF. If CF = 0, eax = 0. If CF = 1, eax = -1 = 0xFFFFFFFF. This is exactly -(crc & 1), the desired mask, in one instruction.

The full 4-insn sequence:

  • shrl %edi → edi = crc >> 1; CF = crc & 1
  • sbbl %eax, %eax → eax = 0 or 0xFFFFFFFF
  • andl $poly, %eax → eax = poly or 0
  • xorl %edi, %eax → (crc >> 1) XOR poly_or_0

Left-shift form:

addl %edi, %edi (= crc + crc = crc << 1) sets CF to the carry out of bit 31, which equals the old bit 31 = (crc >> 31) & 1 = (1 if crc < 0, 0 otherwise). The SBB then gives the same mask.

Note: LLVM uses leal (%rdi,%rdi), %eax for crc << 1, which avoids clobbering %rdi but does not set CF. This prevents the SBB optimization. Using addl or shll instead of LEA would set CF and enable 4-insn output.

Root cause

The x86 backend lowering of:

(XOR (SRL x, 1) (AND (NEG (AND x, 1)) C))

and:

(XOR (SHL x, 1) (AND (NEG (SRL (SAR x, 31))) C))

does not recognize that these are candidates for the SHR/ADD + SBB idiom.

Additionally, for the left-shift case, the backend prefers LEA (which doesn’t set flags and leaves the source register intact) over ADD/SHL (which set CF and clobber the source), missing the opportunity to use SBB.

Where to fix

  • llvm/lib/Target/X86/X86ISelLowering.cpp: add a DAGCombine pattern for (XOR (SRL x, 1) (AND (NEG (AND x, 1)) C))SHR + SBB + AND + XOR.
  • llvm/lib/Target/X86/X86ISelDAGToDAG.cpp or peephole pass: recognize (SHL x, 1) feeding an SBB-style mask and prefer addl/shll over leal when the CF output is consumed.

Generalization

The pattern generalizes: for any shift-by-1 that feeds a -(bit & 1) mask:

  • (SRL x, 1) sets CF = bit 0 → SBB gives -(bit 0)
  • (SHL x, 1) sets CF = bit 31 → SBB gives -(bit 31) = sign mask

This is a common CRC idiom and appears in both IEEE 802.3 (right-shift, poly = 0xEDB88320) and ITU/left-shift (poly = 0x04C11DB7) implementations.