Chapter 11: Some Elementary Functions — LLVM Validation
LLVM version tested: 23.0.0git (build at ~/llvm-project/build/bin)
Already handled correctly
| Pattern | IR | LLVM output | Notes |
|---|---|---|---|
| ilog2, x > 0 (§11-4) | 31 - ctlz.i32(x, true) | bsrl | 1 instruction |
| ilog2 i64, x > 0 (§11-4) | 63 - ctlz.i64(x, true) | bsrq | 1 instruction |
| ilog10 approx (§11-4, Fig 11-12) | (9*(31-ctlz(x))) >> 5 | bsrl + leal(*9) + shrl $5 | 3 insns; LEA for ×9 |
| ilog10 approx variant (§11-4, Fig 11-13) | (19*(31-ctlz(x))) >> 6 | bsrl + leal(*9) + leal(*2+1) + shrl $6 | 4 insns; two LEAs for ×19 |
| x^4 via squaring (§11-3) | (x*x)*(x*x) | 2 imull | binary decomp |
| x^13 binary decomp (§11-3) | 5-multiply chain | 5 imull | optimal addition-chain length for 13 |
Missed optimization: (W−1) − ctlz(x, false) on x86_64 (without LZCNT)
Problem
(W-1) - ctlz(x, i1 false) — integer log2 that returns −1 for x = 0 — generates
4 instructions but 3 instructions suffice using bsr + cmovz.
Affected pattern: 31 - ctlz.i32(x, false), 63 - ctlz.i64(x, false),
15 - ctlz.i16(x, false).
What LLVM generates (ilog2_safe, i32)
movl $63, %eax ; pre-load 63 so x=0 case gives -1 after XOR+ADD
bsrl %edi, %eax ; BSR writes bsr(x) for x≠0; leaves 63 if x=0 (Intel/AMD behaviour)
xorl $-32, %eax ; correction: bsr XOR 0xFFFFFFE0 + 32 = bsr (identity for 0..31)
addl $32, %eax ; 63 XOR 0xFFFFFFE0 + 32 = -1 (x=0 case)Optimal (3 instructions)
bsrl %edi, %eax ; ZF=1 if x=0; eax=bsr(x) for x≠0 (undefined for x=0, but overwritten below)
movl $-1, %ecx ; fallback
cmovzl %ecx, %eax ; if ZF=1 (x=0): eax=-1; else: eax=bsr(x)=31-ctlz(x)Why the 3-instruction form is correct
bsrsets ZF = 1 iff src = 0. The destination is undefined for src = 0, butcmovzoverwrites it immediately — no undefined value escapes.- For x ≠ 0: ZF = 0,
cmovzis a no-op,eax = bsr(x) = 31 − ctlz(x). ✓ - For x = 0: ZF = 1,
cmovzwrites −1.31 − ctlz(0) = 31 − 32 = −1. ✓ cmovzis part of the x86_64 baseline ISA (no extra feature flags needed).
Why the current 4-instruction form is suboptimal
LLVM’s pre-load-63 + BSR + XOR + ADD trick relies on Intel/AMD processors leaving the BSR destination unchanged for src = 0 (called “undefined” in the ISA, but in practice never mutated). This costs one extra instruction.
Same issue for i64 and i16
; i64 — current (4 insns):
movl $127, %eax
bsrq %rdi, %rax
xorq $-64, %rax
addq $64, %rax
; i64 — optimal (3 insns):
bsrq %rdi, %rax
movq $-1, %rcx
cmovzq %rcx, %raxNote: with LZCNT extension
With -mattr=+lzcnt, LLVM already generates 3 instructions:
lzcntl %edi, %ecx
movl $31, %eax
subl %ecx, %eaxThe miss only affects the baseline x86_64 codegen (without LZCNT).
Test files
| File | What it tests |
|---|---|
ch11_basic.ll | Already-handled patterns (ilog2 nonzero, ilog10 approx, ipow) |
ch11_ilog2_safe.ll | The safe ilog2 / (W-1)-ctlz(x,false) miss (i32, i64, i16) |
bug-ilog2-safe.md | Bug report draft |