LLVM CLMUL Validation

This directory checks how current-head LLVM lowers the generic llvm.clmul.* intrinsic to x86_64.

It covers:

Run:

bash ./run-clmul-validation.sh

Current findings:

Scalar i64 lowers directly to one pclmulqdq, or to vpclmulqdq (its VEX-encoded form) when AVX is enabled.
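Scalar i32 and i16 are widened to 64 bits, computed with the same instruction, and truncated on return. This is ABI-correct and a reasonable lowering, since the low N bits of a carry-less product depend only on the low N bits of its inputs.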
v2i64 and v4i64 lower cleanly using the expected lane masks.
v4i32 and v8i32 are scalarized into multiple pclmulqdq operations plus shuffles and inserts. This is functionally correct but noticeably more instruction-heavy than the i64 vector cases.
A widened i128 CLMUL followed by >> 64 (the high half of a 64x64 carry-less product, i.e. CLMULH) is recognized on x86 and lowers to one pclmulqdq plus a lane extract.
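The bitreverse form bitreverse(clmul(bitreverse(a), bitreverse(b))), which is semantically equivalent to CLMULR, also lowers well: one pclmulqdq plus a merge of the low and high halves. The merge is needed because the 64-bit CLMULR result is bits 126..63 of the 128-bit product, which straddle both halves.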
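The identities behind the i32/i16 widening and the CLMULH/CLMULR findings can be checked against a small portable C reference model. This is a sketch, not part of this directory or of run-clmul-validation.sh: it assumes llvm.clmul.iN returns the low N bits of the carry-less product, and every function name in it (clmul64_full, clmulh64, clmulr64, self_check, and so on) is hypothetical. clmul64_full models the full 128-bit result of one pclmulqdq, clmulh64 corresponds to the widened-i128-then->>64 pattern, and clmulr64 merges the two halves into bits 126..63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model; names are illustrative, not from LLVM.
   Bit-by-bit 64x64 -> 128-bit carry-less multiply, i.e. what one
   PCLMULQDQ computes. The product is returned as hi/lo halves. */
static void clmul64_full(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo) {
    uint64_t h = 0, l = 0;
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            l ^= a << i;
            if (i != 0)
                h ^= a >> (64 - i); /* bits of a shifted past bit 63 */
        }
    }
    *hi = h;
    *lo = l;
}

/* Same-width CLMUL: low 64 bits (ASSUMED llvm.clmul.i64 semantics). */
static uint64_t clmul64(uint64_t a, uint64_t b) {
    uint64_t hi, lo;
    clmul64_full(a, b, &hi, &lo);
    return lo;
}

/* Native 32-bit CLMUL, computed without any widening. */
static uint32_t clmul32(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1)
            r ^= a << i;
    return r;
}

/* High half of the 128-bit product: the widened i128 CLMUL >> 64 pattern. */
static uint64_t clmulh64(uint64_t a, uint64_t b) {
    uint64_t hi, lo;
    clmul64_full(a, b, &hi, &lo);
    return hi;
}

/* CLMULR: bits 126..63 of the product, merged from both halves. */
static uint64_t clmulr64(uint64_t a, uint64_t b) {
    uint64_t hi, lo;
    clmul64_full(a, b, &hi, &lo);
    return (hi << 1) | (lo >> 63);
}

static uint64_t bitrev64(uint64_t x) {
    uint64_t r = 0;
    for (int i = 0; i < 64; i++) {
        r = (r << 1) | (x & 1);
        x >>= 1;
    }
    return r;
}

/* xorshift64 PRNG for test inputs; the seed must be nonzero. */
static uint64_t next_rand(uint64_t *s) {
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    return *s;
}

/* Asserts the identities the lowering findings rely on. */
static void self_check(uint64_t seed, int rounds) {
    for (int n = 0; n < rounds; n++) {
        uint64_t a = next_rand(&seed), b = next_rand(&seed);
        uint32_t a32 = (uint32_t)a, b32 = (uint32_t)b;
        /* i32 widened to i64, multiplied, truncated == native i32 CLMUL */
        assert((uint32_t)clmul64(a32, b32) == clmul32(a32, b32));
        /* bitreverse(clmul(bitreverse a, bitreverse b)) == CLMULR */
        assert(bitrev64(clmul64(bitrev64(a), bitrev64(b))) == clmulr64(a, b));
    }
}
```

To use it, call self_check(seed, rounds) from a test main with any nonzero seed. Under the same semantic assumption, the model could also serve as an oracle for executed x86 output.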

Takashi's Notes