Bug: x86_64 sdiv i32 by constants with positive magic wastes one instruction
Title
x86_64: sdiv i32 by constant with positive magic number emits 6 insns instead of 5
Summary
For sdiv i32 %n, d where the Hacker’s Delight magic number M < 2^31 (positive),
LLVM emits a redundant movq + shrq $63 pair to extract the sign bit of the
64-bit product n*M. Because M > 0, sign(n*M) = sign(n), so the sign bit
can be extracted from the input register before the multiply, saving one instruction.
Affected divisors (W=32)
All d where the signed magic fits in a positive 32-bit value.
From Hacker’s Delight Table 10-1: 3, 5, 6, 9, 10, 11, 12, 25, 125, 625, and others.
Reproducer
define i32 @sdiv3(i32 %n) {
%r = sdiv i32 %n, 3
ret i32 %r
}llc -O2 -mtriple=x86_64-pc-linux-gnuCurrent output (6 instructions)
sdiv3:
movslq %edi, %rax
imulq $1431655766, %rax, %rax ; M = 0x55555556
movq %rax, %rcx ; ← redundant copy
shrq $63, %rcx ; ← sign of product (2 insns for 1 bit)
shrq $32, %rax
addl %ecx, %eax
retqExpected output (5 instructions)
sdiv3:
movslq %edi, %rax ; reads %edi, does NOT modify it
shrl $31, %edi ; sign of n (0 or 1)
imulq $1431655766, %rax, %rax
shrq $32, %rax
addl %edi, %eax
retqWhy it is correct
movslq %edi, %raxsign-extends%ediinto%raxbut leaves%ediunchanged.M = 1431655766 = 0x55555556 < 2^31, soMis positive when sign-interpreted.- For all i32
n:|n| < 2^31,M < 2^31, so|n*M| < 2^62— no 64-bit overflow. - Therefore
sign(n*M) = sign(n), and(n*M) >> 63 == n >> 31(as 0/1 unsigned). - Extracting the sign correction from
%edi(shrl $31) is equivalent to extracting it from the product (movq + shrq $63) but uses one fewer instruction.
The same argument applies to all divisors where M < 2^31.
Why this does not apply to sdiv i64
The 64-bit path uses the 1-operand imulq %rcx form (128-bit product in rdx:rax),
which always requires a final movq %rdx, %rax to place the result in the return
register. This extra move offsets any saving, so 6 instructions remains optimal.
Where to fix
The sign-correction extraction logic in X86ISelLowering.cpp / DAGCombine for
the SMULH-based division-by-constant lowering on 32-bit operands.
When selecting the sign bit source, check if the magic number is positive and,
if so, use SRL(n, 31) instead of SRL(SMULH(n, M), 63).