Chapter 11: Some Elementary Functions — LLVM Validation

LLVM version tested: 23.0.0git (build at ~/llvm-project/build/bin)


Already handled correctly

PatternIRLLVM outputNotes
ilog2, x > 0 (§11-4)31 - ctlz.i32(x, true)bsrl1 instruction
ilog2 i64, x > 0 (§11-4)63 - ctlz.i64(x, true)bsrq1 instruction
ilog10 approx (§11-4, Fig 11-12)(9*(31-ctlz(x))) >> 5bsrl + leal(*9) + shrl $53 insns; LEA for ×9
ilog10 approx variant (§11-4, Fig 11-13)(19*(31-ctlz(x))) >> 6bsrl + leal(*9) + leal(*2+1) + shrl $64 insns; two LEAs for ×19
x^4 via squaring (§11-3)(x*x)*(x*x)2 imullbinary decomp
x^13 binary decomp (§11-3)5-multiply chain5 imulloptimal addition-chain length for 13

Missed optimization: (W−1) − ctlz(x, false) on x86_64 (without LZCNT)

Problem

(W-1) - ctlz(x, i1 false) — integer log2 that returns −1 for x = 0 — generates 4 instructions but 3 instructions suffice using bsr + cmovz.

Affected pattern: 31 - ctlz.i32(x, false), 63 - ctlz.i64(x, false), 15 - ctlz.i16(x, false).

What LLVM generates (ilog2_safe, i32)

movl  $63, %eax          ; pre-load 63 so x=0 case gives -1 after XOR+ADD
bsrl  %edi, %eax          ; BSR writes bsr(x) for x≠0; leaves 63 if x=0 (Intel/AMD behaviour)
xorl  $-32, %eax          ; correction: bsr XOR 0xFFFFFFE0 + 32 = bsr (identity for 0..31)
addl  $32, %eax           ;             63 XOR 0xFFFFFFE0 + 32 = -1 (x=0 case)

Optimal (3 instructions)

bsrl  %edi, %eax          ; ZF=1 if x=0; eax=bsr(x) for x≠0 (undefined for x=0, but overwritten below)
movl  $-1,  %ecx          ; fallback
cmovzl %ecx, %eax         ; if ZF=1 (x=0): eax=-1; else: eax=bsr(x)=31-ctlz(x)

Why the 3-instruction form is correct

  • bsr sets ZF = 1 iff src = 0. The destination is undefined for src = 0, but cmovz overwrites it immediately — no undefined value escapes.
  • For x ≠ 0: ZF = 0, cmovz is a no-op, eax = bsr(x) = 31 − ctlz(x). ✓
  • For x = 0: ZF = 1, cmovz writes −1. 31 − ctlz(0) = 31 − 32 = −1. ✓
  • cmovz is part of the x86_64 baseline ISA (no extra feature flags needed).

Why the current 4-instruction form is suboptimal

LLVM’s pre-load-63 + BSR + XOR + ADD trick relies on Intel/AMD processors leaving the BSR destination unchanged for src = 0 (called “undefined” in the ISA, but in practice never mutated). This costs one extra instruction.

Same issue for i64 and i16

; i64 — current (4 insns):
movl   $127, %eax
bsrq   %rdi, %rax
xorq   $-64, %rax
addq   $64, %rax
 
; i64 — optimal (3 insns):
bsrq   %rdi, %rax
movq   $-1,  %rcx
cmovzq %rcx, %rax

Note: with LZCNT extension

With -mattr=+lzcnt, LLVM already generates 3 instructions:

lzcntl  %edi, %ecx
movl    $31, %eax
subl    %ecx, %eax

The miss only affects the baseline x86_64 codegen (without LZCNT).


Test files

FileWhat it tests
ch11_basic.llAlready-handled patterns (ilog2 nonzero, ilog10 approx, ipow)
ch11_ilog2_safe.llThe safe ilog2 / (W-1)-ctlz(x,false) miss (i32, i64, i16)
bug-ilog2-safe.mdBug report draft