Chapter 16: Hilbert’s Curve — LLVM Validation
LLVM version tested: 23.0.0git (build at ~/llvm-project/build/bin)
Already handled correctly
| Pattern | IR | LLVM output | Notes |
|---|---|---|---|
| Conditional swap via select | select i1, x, y | cmovnel × 2 | Branchless, optimal |
| Prefix XOR by-2 chain | 4-step xor + shr | 12 insns (sequential chain) | No improvement possible; chain is data-dependent |
Unshuffle step 1 (t × 3) | t ^ (t<<1) where bits non-overlapping | leal (%rax,%rax,2) | LLVM correctly uses LEA multiply for disjoint XOR |
Missed optimization: 4-step perfect unshuffle not recognized as PEXT
Pattern
§16-2 Figure 16-8 ends with a “perfect unshuffle” to separate the interleaved
x and y bits of the Hilbert curve index s into two n-bit values:
t = (s ^ (s>>1)) & 0x22222222; s ^= t ^ (t<<1);
t = (s ^ (s>>2)) & 0x0C0C0C0C; s ^= t ^ (t<<2);
t = (s ^ (s>>4)) & 0x00F000F0; s ^= t ^ (t<<4);
t = (s ^ (s>>8)) & 0x0000FF00; s ^= t ^ (t<<8);
x = s >> 16; y = s & 0xFFFF;This is equivalent to x = pext(s, 0x55555555); y = pext(s, 0xAAAAAAAA).
Instruction counts
| Target | Current | Optimal |
|---|---|---|
| No BMI2 | 30 insns | 30 insns (no better alternative) |
| +bmi2 | 30 insns | 4 insns (pextl × 2 + constant loads) |
Root cause
The x86 backend does not recognize the 4-step (s ^ (s>>k)) & C; s ^= t ^ (t<<k)
bit-transpose sequence as a perfect unshuffle, and does not lower it to PEXT even
when BMI2 is available.
When written explicitly via @llvm.x86.bmi.pext.32, llc generates the correct
4-instruction output. The miss is in the pattern recognizer, not in code generation.
Severity
Moderate. The Hilbert curve s → (x,y) conversion is used in spatial indexing,
ray-tracing, cache-oblivious algorithms, and processor scheduling. The unshuffle
is the final step and dominates the instruction count of the parallel-prefix form
(Figure 16-8, which is otherwise 66 RISC instructions). With BMI2 available (all
Intel Haswell+ and AMD Excavator+), 30→4 insns is a 7.5× improvement.
Test files
| File | What it tests |
|---|---|
ch16_hilbert.ll | Unshuffle (30 vs 4 insns with BMI2), prefix XOR by-2, branchless cond swap |
bug-unshuffle-pext.md | Bug report: unshuffle → PEXT not recognized |