Chapter 14: Cyclic Redundancy Check — LLVM Validation
LLVM version tested: 23.0.0git (build at ~/llvm-project/build/bin)
Already handled correctly
| Pattern | IR | LLVM output | Notes |
|---|---|---|---|
Table index (crc ^ byte) & 0xFF | xor + and | xorl + movzbl | 2 instructions, optimal |
Missed optimization: CRC inner loop misses SHR+SBB idiom
Pattern
The bit-at-a-time CRC inner loop (§14-3) has two standard forms:
Right-shift (Figure 14-6, poly = 0xEDB88320):
mask = -(crc & 1);
crc = (crc >> 1) ^ (0xEDB88320 & mask);Left-shift (Figure 14-5, poly = 0x04C11DB7):
if (crc < 0) crc = (crc << 1) ^ 0x04C11DB7;
else crc = (crc << 1);Current output vs optimal
| Form | Current insns | Optimal | Miss |
|---|---|---|---|
| Right-shift mask | 6 | 4 | 2 |
| Right-shift select | 6 | 4 | 2 |
| Left-shift select | 5 | 4 | 1 |
Right-shift: current (6) vs optimal (4)
; Current — 6 instructions
movl %edi, %eax
andl $1, %eax
negl %eax
andl $-307974906, %eax ; 0xEDB88320
shrl %edi
xorl %edi, %eax
; Optimal — 4 instructions
shrl %edi ; edi = crc >> 1; CF = crc & 1 (shifted-out bit)
sbbl %eax, %eax ; eax = -CF = 0 or 0xFFFFFFFF
andl $-307974906, %eax ; 0xEDB88320 or 0
xorl %edi, %eaxshrl r, 1 sets CF = shifted-out bit = crc & 1. sbbl %eax, %eax computes
eax - eax - CF = -CF in one instruction, giving the desired 0/0xFFFFFFFF mask.
Left-shift: current (5) vs optimal (4)
; Current — 5 instructions (uses LEA which doesn't set CF)
leal (%rdi,%rdi), %eax ; crc << 1 — LEA does NOT set CF!
movl %edi, %ecx
sarl $31, %ecx ; sign-extend bit 31 as mask
andl $79764919, %ecx ; 0x04C11DB7
xorl %ecx, %eax
; Optimal — 4 instructions
addl %edi, %edi ; edi = crc << 1; CF = old bit 31 = (crc < 0)
sbbl %eax, %eax ; eax = -CF
andl $79764919, %eax ; 0x04C11DB7 or 0
xorl %edi, %eaxLLVM uses leal for crc << 1 (to avoid clobbering %rdi), which doesn’t set CF,
blocking the SBB optimization. Using addl/shll instead sets CF and enables 4 insns.
Severity
Moderate — 2 instructions saved in the tight inner loop of the most common CRC implementation form. CRC is computed on every byte or word of data in storage, networking, and compression workloads. In a hot loop, 6→4 insns is meaningful.
Test files
| File | What it tests |
|---|---|
ch14_crc.ll | Right-shift mask form, right-shift select form, left-shift form, table index |
bug-crc-shr-sbb.md | Bug report: SHR+SBB missed optimization |