Chapter 14: Cyclic Redundancy Check — LLVM Validation

LLVM version tested: 23.0.0git (build at ~/llvm-project/build/bin)


Already handled correctly

PatternIRLLVM outputNotes
Table index (crc ^ byte) & 0xFFxor + andxorl + movzbl2 instructions, optimal

Missed optimization: CRC inner loop misses SHR+SBB idiom

Pattern

The bit-at-a-time CRC inner loop (§14-3) has two standard forms:

Right-shift (Figure 14-6, poly = 0xEDB88320):

mask = -(crc & 1);
crc  = (crc >> 1) ^ (0xEDB88320 & mask);

Left-shift (Figure 14-5, poly = 0x04C11DB7):

if (crc < 0) crc = (crc << 1) ^ 0x04C11DB7;
else         crc = (crc << 1);

Current output vs optimal

FormCurrent insnsOptimalMiss
Right-shift mask642
Right-shift select642
Left-shift select541

Right-shift: current (6) vs optimal (4)

; Current — 6 instructions
movl  %edi, %eax
andl  $1, %eax
negl  %eax
andl  $-307974906, %eax        ; 0xEDB88320
shrl  %edi
xorl  %edi, %eax
 
; Optimal — 4 instructions
shrl  %edi                     ; edi = crc >> 1; CF = crc & 1 (shifted-out bit)
sbbl  %eax, %eax               ; eax = -CF = 0 or 0xFFFFFFFF
andl  $-307974906, %eax        ; 0xEDB88320 or 0
xorl  %edi, %eax

shrl r, 1 sets CF = shifted-out bit = crc & 1. sbbl %eax, %eax computes eax - eax - CF = -CF in one instruction, giving the desired 0/0xFFFFFFFF mask.

Left-shift: current (5) vs optimal (4)

; Current — 5 instructions (uses LEA which doesn't set CF)
leal  (%rdi,%rdi), %eax        ; crc << 1 — LEA does NOT set CF!
movl  %edi, %ecx
sarl  $31, %ecx                ; sign-extend bit 31 as mask
andl  $79764919, %ecx          ; 0x04C11DB7
xorl  %ecx, %eax
 
; Optimal — 4 instructions
addl  %edi, %edi               ; edi = crc << 1; CF = old bit 31 = (crc < 0)
sbbl  %eax, %eax               ; eax = -CF
andl  $79764919, %eax          ; 0x04C11DB7 or 0
xorl  %edi, %eax

LLVM uses leal for crc << 1 (to avoid clobbering %rdi), which doesn’t set CF, blocking the SBB optimization. Using addl/shll instead sets CF and enables 4 insns.

Severity

Moderate — 2 instructions saved in the tight inner loop of the most common CRC implementation form. CRC is computed on every byte or word of data in storage, networking, and compression workloads. In a hot loop, 6→4 insns is meaningful.


Test files

FileWhat it tests
ch14_crc.llRight-shift mask form, right-shift select form, left-shift form, table index
bug-crc-shr-sbb.mdBug report: SHR+SBB missed optimization