LLVM CLMUL Validation

This directory checks how current-head LLVM lowers the generic llvm.clmul.* intrinsic to x86_64.

It covers:

  • scalar i64, i32, and i16
  • vector v2i64, v4i64, v4i32, and v8i32
  • derived high-half (CLMULH) and bit-reversed (CLMULR) forms, built from ordinary IR patterns: a widened clmul followed by a shift, and a clmul wrapped in bitreverse
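For reference when reading the generated code, the three operations can be modeled bit by bit. The sketch below is a hypothetical Python reference and is not part of run-clmul-validation.sh. It assumes llvm.clmul.iN returns the low N bits of the carry-less product of its two N-bit operands, and it borrows the CLMULH/CLMULR names used by RISC-V Zbc. All function names are illustrative.

```python
# Hypothetical reference model (not part of this directory's scripts).
# Assumes llvm.clmul.iN returns the low N bits of the carry-less product.


def clmul_full(a: int, b: int, n: int) -> int:
    """Full carry-less product of two n-bit operands (at most 2n-1 bits)."""
    mask = (1 << n) - 1
    a, b = a & mask, b & mask
    acc = 0
    for i in range(n):
        if (b >> i) & 1:
            acc ^= a << i  # XOR instead of add: partial products never carry
    return acc


def clmul(a: int, b: int, n: int) -> int:
    """Low half: the assumed semantics of llvm.clmul.iN."""
    return clmul_full(a, b, n) & ((1 << n) - 1)


def clmulh(a: int, b: int, n: int) -> int:
    """High half: widen to 2n bits, multiply, then shift right by n."""
    return clmul_full(a, b, n) >> n


def bitreverse(x: int, n: int) -> int:
    """Reverse the low n bits of x, like llvm.bitreverse.iN."""
    return int(format(x & ((1 << n) - 1), f"0{n}b")[::-1], 2)


def clmulr(a: int, b: int, n: int) -> int:
    """Reversed form: bitreverse(clmul(bitreverse(a), bitreverse(b)))."""
    return bitreverse(clmul(bitreverse(a, n), bitreverse(b, n), n), n)


if __name__ == "__main__":
    a, b, n = 0x8000000000000001, 0xDEADBEEF, 64
    # clmulr equals bits [2n-2 : n-1] of the full product ...
    assert clmulr(a, b, n) == (clmul_full(a, b, n) >> (n - 1)) & ((1 << n) - 1)
    # ... and clmulh is clmulr shifted right by one more bit.
    assert clmulh(a, b, n) == clmulr(a, b, n) >> 1
    print(hex(clmul(a, b, n)), hex(clmulh(a, b, n)), hex(clmulr(a, b, n)))
```

The two assertions in the main block capture why both derived forms need only one full-width multiply: CLMULH and CLMULR are just different bit windows of the same 2N-bit product.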

Run:

bash ./run-clmul-validation.sh

Current findings:

  • Scalar i64 lowers directly to a single pclmulqdq, or to its VEX-encoded form vpclmulqdq when AVX is available.
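  • Scalar i32 and i16 are widened to 64 bits, multiplied with the same instruction, and truncated on return. This is ABI-correct and a reasonable lowering, since the low N bits of a carry-less product depend only on the low N bits of each operand.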
  • v2i64 and v4i64 lower cleanly using the expected lane masks.
  • v4i32 and v8i32 are scalarized into multiple pclmulqdq operations, with the results reassembled through shuffles and inserts. This is functionally correct, but noticeably more instruction-heavy than the 64-bit-lane cases.
  • The high-half pattern, a widened i128 clmul followed by >> 64, is recognized on x86 and lowers to one pclmulqdq plus an extract of the upper 64-bit lane.
  • The bitreverse form that semantically corresponds to CLMULR also lowers well: one pclmulqdq plus a merge of the low/high halves.
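The scalar findings above can be cross-checked against a model of a single 64x64 carry-less multiply that exposes both 64-bit halves of its 128-bit result. The sketch below is a hypothetical Python model with illustrative function names; it covers the arithmetic only (not register lanes or the pclmulqdq immediate) and assumes the same low-half semantics for llvm.clmul as above.

```python
# Hypothetical model of one 64x64 carry-less multiply, split into the two
# 64-bit halves of its 128-bit result. It models the arithmetic only, not
# register lanes or the pclmulqdq immediate.
MASK64 = (1 << 64) - 1


def clmul_halves(a: int, b: int) -> tuple:
    """Return (low 64 bits, high 64 bits) of the carry-less product."""
    a, b = a & MASK64, b & MASK64
    acc = 0
    for i in range(64):
        if (b >> i) & 1:
            acc ^= a << i  # XOR accumulation: no carries between bits
    return acc & MASK64, acc >> 64


def clmul_narrow(a: int, b: int, n: int) -> int:
    """i32/i16 path: zero-extend to 64 bits, multiply, truncate to n bits."""
    mask = (1 << n) - 1
    lo, _ = clmul_halves(a & mask, b & mask)
    return lo & mask


def clmulh64(a: int, b: int) -> int:
    """Widened i128 product >> 64 is simply the upper half."""
    return clmul_halves(a, b)[1]


def clmulr64(a: int, b: int) -> int:
    """Bits [126:63] of the product straddle both halves, hence the merge."""
    lo, hi = clmul_halves(a, b)
    return ((hi << 1) | (lo >> 63)) & MASK64


if __name__ == "__main__":
    print(hex(clmul_narrow(0xFFFF, 0xFFFF, 16)))  # 0x5555
    print(hex(clmulr64(MASK64, MASK64)))           # 0xaaaaaaaaaaaaaaaa
```

A product of two 64-bit operands has degree at most 126, so bit 127 is always zero. In this model that is why `hi << 1` stays within 64 bits and the CLMULR result needs only a shift-and-OR of the two halves, while the narrow and high-half cases each read a single half.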