Bug: x86_64 sdiv i32 by constants with positive magic wastes one instruction

Title

x86_64: sdiv i32 by constant with positive magic number emits 6 insns instead of 5

Summary

For sdiv i32 %n, d where the Hacker’s Delight magic number M < 2^31 (positive), LLVM emits a redundant movq + shrq $63 pair to extract the sign bit of the 64-bit product n*M. Because M > 0, sign(n*M) = sign(n), so the sign bit can be extracted from the input register before the multiply, saving one instruction.

Affected divisors (W=32)

All d where the signed magic fits in a positive 32-bit value. From Hacker’s Delight Table 10-1: 3, 5, 6, 9, 10, 11, 12, 25, 125, 625, and others.

Reproducer

define i32 @sdiv3(i32 %n) {
  %r = sdiv i32 %n, 3
  ret i32 %r
}

llc -O2 -mtriple=x86_64-pc-linux-gnu

Current output (6 instructions)

sdiv3:
    movslq  %edi, %rax
    imulq   $1431655766, %rax, %rax   ; M = 0x55555556
    movq    %rax, %rcx                ; ← redundant copy
    shrq    $63, %rcx                 ; ← sign of product (2 insns for 1 bit)
    shrq    $32, %rax
    addl    %ecx, %eax
    retq

Expected output (5 instructions)

sdiv3:
    movslq  %edi, %rax                ; reads %edi, does NOT modify it
    shrl    $31, %edi                 ; sign of n (0 or 1)
    imulq   $1431655766, %rax, %rax
    shrq    $32, %rax
    addl    %edi, %eax
    retq

Why it is correct

movslq %edi, %rax sign-extends %edi into %rax but leaves %edi unchanged.
M = 1431655766 = 0x55555556 < 2^31, so M is positive when sign-interpreted.
For all i32 n: |n| < 2^31, M < 2^31, so |n*M| < 2^62 — no 64-bit overflow.
Therefore sign(n*M) = sign(n), and (n*M) >> 63 == n >> 31 (as 0/1 unsigned).
Extracting the sign correction from %edi (shrl $31) is equivalent to extracting it from the product (movq + shrq $63) but uses one fewer instruction.

The same argument applies to all divisors where M < 2^31.

Why this does not apply to `sdiv i64`

The 64-bit path uses the 1-operand imulq %rcx form (128-bit product in rdx:rax), which always requires a final movq %rdx, %rax to place the result in the return register. This extra move offsets any saving, so 6 instructions remains optimal.

Where to fix

The sign-correction extraction logic in X86ISelLowering.cpp / DAGCombine for the SMULH-based division-by-constant lowering on 32-bit operands. When selecting the sign bit source, check if the magic number is positive and, if so, use SRL(n, 31) instead of SRL(SMULH(n, M), 63).

Takashi's Notes

Explorer

bug-sdiv-positive-magic

Bug: x86_64 sdiv i32 by constants with positive magic wastes one instruction

Title

Summary

Affected divisors (W=32)

Reproducer

Current output (6 instructions)

Expected output (5 instructions)

Why it is correct

Why this does not apply to `sdiv i64`

Where to fix

Graph View

Table of Contents

Takashi's Notes

Explorer

bug-sdiv-positive-magic

Bug: x86_64 sdiv i32 by constants with positive magic wastes one instruction

Title

Summary

Affected divisors (W=32)

Reproducer

Current output (6 instructions)

Expected output (5 instructions)

Why it is correct

Why this does not apply to sdiv i64

Where to fix

Graph View

Table of Contents

Why this does not apply to `sdiv i64`