Concurrency

SIMD Vectorization in Depth

Processing multiple data elements per instruction using wide vector registers and lanes.

6 min read · advanced

One instruction, many elements

SIMD stands for single instruction, multiple data. A SIMD instruction operates on a wide vector register that holds several elements at once; each element slot is called a lane. Instead of adding one pair of numbers, a single add instruction adds, say, eight pairs in parallel within a single core.
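To make the lane picture concrete, here is a minimal scalar model in C of what one vector add does. The `vec8f` type, the `LANES` width of eight, and the function names are hypothetical, chosen only to match the "eight pairs" example above; real hardware performs all the lane additions in one instruction, while the loop here merely spells out the per-lane work.

```c
#include <stddef.h>

#define LANES 8  /* assumed width, e.g. eight 32-bit floats in a 256-bit register */

/* Hypothetical model of a vector register: LANES elements side by side. */
typedef struct {
    float lane[LANES];
} vec8f;

/* Scalar add: one instruction handles one pair of numbers. */
float add_scalar(float a, float b) {
    return a + b;
}

/* Models a single SIMD add: lane i of the result is a.lane[i] + b.lane[i].
   No lane reads another lane's result, which is why hardware can compute
   them all at once. */
vec8f add_vec8f(vec8f a, vec8f b) {
    vec8f r;
    for (size_t i = 0; i < LANES; ++i) {
        r.lane[i] = a.lane[i] + b.lane[i];
    }
    return r;
}
```

The scalar version would need eight calls to cover the same data that the modeled vector add covers in one.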

How vectorization happens

Vectorization is the process of turning a scalar loop into SIMD code. A compiler can do it automatically when the loop is simple and its iterations are independent, or a programmer can write it explicitly with intrinsics. Either way, the vectorized loop follows the same pattern:

  • Process one chunk of elements, one per lane, in each iteration.
  • Use a remainder loop for the leftover elements that do not fill a full vector.
  • Keep iterations independent so lanes do not depend on each other.
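The pattern above can be sketched with explicit intrinsics. This example assumes an x86 target and uses the SSE intrinsics from `<xmmintrin.h>`, where a `__m128` register holds four 32-bit floats; the function name `add_arrays` is hypothetical. When SSE is not available, the guarded main loop is compiled out and the remainder loop handles every element, so the sketch still runs on other platforms.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128 holds four 32-bit floats */
#endif

/* Computes out[i] = a[i] + b[i] for n floats. */
void add_arrays(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;

#if defined(__SSE__)
    /* Main loop: one chunk of 4 lanes per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  /* 4 adds at once */
    }
#endif

    /* Remainder loop: the leftover elements that do not fill a full
       vector (or all elements when the main loop was compiled out). */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

With `n = 10`, for instance, the main loop covers elements 0 through 7 in two iterations and the remainder loop finishes elements 8 and 9. The unaligned load and store variants are used so the sketch makes no assumption about how the arrays were allocated.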

What blocks vectorization

Several things stop a loop from vectorizing, or make the vectorized version slow:

  • Data dependencies where one iteration needs the previous result.
  • Branches inside the loop, though masking can sometimes handle them by computing both outcomes for all lanes and selecting the result per lane.
  • Misaligned or non-contiguous data that the vector load cannot pack efficiently.

SIMD is most effective when the data is contiguous and, ideally, aligned, so that a full vector loads in one step.
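The first two blockers can be illustrated in plain C. This is a sketch with hypothetical function names: `running_sum` shows a loop-carried dependency, and the two clamp functions show a branch next to a masked rewrite that computes both candidates and then selects one. Compilers can often perform this branch-to-select rewrite on their own, so the masked version is written out by hand here only to show the idea.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop-carried dependency: iteration i reads out[i - 1], which the
   previous iteration just wrote, so elements cannot simply be computed
   side by side in separate lanes. */
void running_sum(const int32_t *x, int32_t *out, size_t n) {
    if (n == 0) {
        return;
    }
    out[0] = x[0];
    for (size_t i = 1; i < n; ++i) {
        out[i] = out[i - 1] + x[i];
    }
}

/* Branchy version: each element takes one of two paths. */
void clamp_branchy(const int32_t *x, int32_t *out, size_t n, int32_t limit) {
    for (size_t i = 0; i < n; ++i) {
        if (x[i] > limit) {
            out[i] = limit;
        } else {
            out[i] = x[i];
        }
    }
}

/* Masked version: both candidates (limit and x[i]) are available for
   every element, and a mask selects between them without branching. */
void clamp_masked(const int32_t *x, int32_t *out, size_t n, int32_t limit) {
    for (size_t i = 0; i < n; ++i) {
        /* All bits set when x[i] > limit, all bits clear otherwise. */
        int32_t mask = -(int32_t)(x[i] > limit);
        out[i] = (limit & mask) | (x[i] & ~mask);
    }
}
```

Both clamp functions produce the same results; the masked form does the same work on every element regardless of the data, which is the shape that maps onto lanes.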

Key idea

SIMD vectorization packs several data elements into vector lanes so one instruction processes them together, but it only pays off when iterations are independent and the data is contiguous and, ideally, aligned.

Check yourself

1. What does a single SIMD instruction do?

2. Which condition blocks automatic vectorization?

3. What is the remainder loop for?