i486:
Compare and Exchange (CMPXCHG) with LOCK prefix for atomic swaps. Exchange and Add (XADD) with LOCK prefix for atomic swaps.
Pentium II:
Compare and Exchange 8 bytes (CMPXCHG8B) with LOCK prefix for atomic swaps on 8 byte data.
MMX instructions (SIMD):
MMX supports 8 registers (MM0-MM8) that are 64-bit integers that support saturated integer operations. This allowed Pentium processors to do instructions on multiple words at once (since most words were 32-bit at the time).
MMX instructions are prefixed with p, like PADDB
for
add packed byte integers
instead of ADDB
.
SSE instructions (SIMD):
Pentium III:
SSE was introduced as MMX instructions for floating point. SSE added eight 128-bit registers (XMM0-XMM7). In SSE, these are four 32-bit floats.
SSE2 expanded the registers to either be:
Pentium IV:
Pentium IV introduced SSE2, which merged the previous SSE and MMX extensions by allowing SSE registers to be 128-bit integers as well, not just floats.
Sandy Bridge:
Sandy Bridge added AVX instructions (Advanced Vector Extensions) which holds 256-bit registers named (YMM0-YMM7). Each register can hold:
Haswell:
AVX2, which adds vector operations on 256-bit registers.
Skylake:
AVX-512, which adds 512-bit registers and operations on them.
Trailing Bit manipulation, like extracting from the right hand side.
Compare and Swap is ~20 cycles on modern kaby lake hardware.