The Redshift MPP Engine

Massively Parallel Processing

Redshift is an MPP warehouse. A leader node parses each query and builds a plan. Compute nodes execute that plan in parallel, and each compute node is divided into slices that each own a chunk of the data and a share of CPU and memory.

How Data Is Spread

Rows are distributed across slices by a distribution style. Even spreading keeps every slice busy. A query runs the same steps on each slice over its local data, then the leader gathers the pieces.

Why Distribution Matters

Skew happens when one slice holds far more rows, becoming a straggler.
Co location joins faster when matching rows share a slice, avoiding network shuffles.
A good distribution key balances data and keeps joins local.

Key idea

Redshift distributes data across compute slices and runs each query step in parallel, so even distribution and join co location are critical to avoid stragglers and network shuffles.

The Redshift MPP Engine

Massively Parallel Processing

How Data Is Spread

Why Distribution Matters

Key idea

Check yourself