SIMD Vectorization Basics
SIMD stands for single instruction, multiple data. A SIMD instruction applies one operation to several values at once, packed side by side in a wide register. Where a scalar add handles one pair of numbers, a SIMD add might handle eight pairs in the same instruction.
This is parallelism inside a single core. The values sit in lanes of a vector register, and the hardware processes all lanes together. Turning scalar loops into vector instructions is called vectorization, and modern compilers attempt it automatically when the loop is simple enough.
- Lanes: independent slots in a wide register, processed in parallel.
- Vectorization: rewriting scalar loops to use vector instructions.
- Same op: every lane runs the identical operation, so SIMD suits data-parallel work.
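As a minimal sketch of the kind of loop a compiler can often vectorize on its own, consider an element-wise add. The function name add_arrays is hypothetical, and the restrict qualifiers are an assumption that the three arrays never overlap. Whether the loop actually becomes vector instructions depends on the compiler, the target, and the optimization flags.

```c
#include <stddef.h>

/* Hypothetical helper: adds two float arrays element by element.
 * Assumption: a, b, and out do not overlap (hence restrict), so the
 * compiler is free to process several iterations in one instruction. */
void add_arrays(const float *restrict a, const float *restrict b,
                float *restrict out, size_t n)
{
    /* Each iteration is independent and does the same operation,
     * so consecutive iterations can map onto the lanes of a register. */
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

With eight float lanes, one vector add could cover eight iterations of this loop. Any leftover elements are usually handled by a short scalar tail.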
Vectorization shines on tight numeric loops over arrays, such as scaling pixels or summing floats. The data must be regular and ideally contiguous in memory so it can be loaded into a vector register cheaply.
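The pixel-scaling case might look like the sketch below. The name scale_pixels and the fixed-point gain parameter (0 to 256, where 256 means unchanged) are assumptions made for illustration, not a standard API. The loop reads and writes neighboring bytes in order, which is the regular, contiguous pattern described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: scales 8-bit pixel values by gain / 256.
 * Assumptions: gain is in the range 0..256, and src and dst do not
 * overlap. Results are truncated toward zero, not rounded. */
void scale_pixels(const uint8_t *restrict src, uint8_t *restrict dst,
                  size_t n, uint32_t gain)
{
    /* Contiguous loads and stores, identical arithmetic per element. */
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)(((uint32_t)src[i] * gain) >> 8);
    }
}
```

Integer fixed-point arithmetic is used here so the loop body stays free of branches and conversions to floating point. That is a design choice for the sketch, not a requirement of SIMD.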
Branches and irregular access hurt SIMD. If different lanes need different code paths, vectorized code typically computes both sides and masks out the unwanted lanes, wasting effort. Scattered data forces slow gather loads, and misaligned data can make even contiguous loads more expensive on some hardware. Good vectorized code keeps lanes doing the same work on neighboring data.
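The compute-both-sides-then-mask pattern can be sketched in plain scalar C. The function select_per_element and its two arithmetic branches are hypothetical, chosen only to make the pattern visible. A vectorizing compiler may turn the conditional expression into a lane-wise select, but that is not guaranteed.

```c
#include <stddef.h>

/* Hypothetical helper illustrating the select pattern.
 * For each element: values above threshold are doubled,
 * all other values are incremented by one. */
void select_per_element(const float *restrict in, float *restrict out,
                        size_t n, float threshold)
{
    for (size_t i = 0; i < n; i++) {
        /* Both sides are computed for every element, mirroring how
         * masked vector code evaluates both paths for all lanes. */
        float if_above = in[i] * 2.0f;
        float if_not_above = in[i] + 1.0f;

        /* The comparison acts as the mask that picks one result;
         * the work spent on the other result is discarded. */
        out[i] = (in[i] > threshold) ? if_above : if_not_above;
    }
}
```

Every element pays for both computations even though only one result is kept, which is the wasted effort that divergent lanes cause.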
Key idea
SIMD runs one instruction across many lanes of a wide register, giving in-core data parallelism that thrives on regular loops over contiguous data and suffers from branches and scattered access.