The Redshift MPP Engine

How massively parallel processing spreads a query across slices.

Massively Parallel Processing

Redshift is an MPP warehouse. A leader node parses each query and builds a plan. Compute nodes execute that plan in parallel, and each compute node is divided into slices that each own a chunk of the data and a share of CPU and memory.

How Data Is Spread

Rows are distributed across slices according to the table's distribution style. Spreading rows evenly keeps every slice busy. Each slice runs the same plan steps over its local data, and the leader node then combines the partial results.
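The scatter and gather pattern described here can be sketched in a few lines of Python. This is a toy model and not Redshift's implementation: the slice count, the crc32 hash, and the names slice_for, distribute, and parallel_sum are all assumptions made for illustration, and the loop over slices stands in for work that a real cluster runs concurrently.

```python
import zlib

NUM_SLICES = 4  # hypothetical slice count, chosen for illustration


def slice_for(key, num_slices=NUM_SLICES):
    """Map a distribution-key value to a slice.

    crc32 is a stand-in for whatever hash the engine uses internally.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, key, num_slices=NUM_SLICES):
    """Place each row on the slice chosen by hashing its distribution key."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_for(row[key], num_slices)].append(row)
    return slices


def parallel_sum(slices, column):
    """Run the same step on every slice, then combine the partial results."""
    # Each slice computes a partial sum over its local rows only.
    partials = [sum(row[column] for row in local_rows) for local_rows in slices]
    # The leader merges the per-slice partials into the final answer.
    return sum(partials)


rows = [{"order_id": i, "amount": i * 10} for i in range(100)]
slices = distribute(rows, "order_id")
total = parallel_sum(slices, "amount")
```

Because every row lands on exactly one slice, the combined result matches a single-node sum over all rows; only the work is divided.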

Why Distribution Matters

  • Skew happens when one slice holds far more rows than the others. That slice becomes a straggler, and the query waits for it to finish.
  • Co-location makes joins faster: when matching rows share a slice, the join runs locally and avoids network shuffles.
  • A good distribution key balances data and keeps joins local.
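Both effects in the list above can be put into numbers with a small Python model. It is a sketch under stated assumptions: integer keys, a modulo in place of the engine's hash, four slices, and the hypothetical helpers home_slice, rows_per_slice, skew_ratio, and rows_needing_shuffle. None of these names come from Redshift.

```python
NUM_SLICES = 4  # hypothetical slice count for illustration


def home_slice(key, num_slices=NUM_SLICES):
    # Assumption: modulo on an integer key stands in for the engine's hash.
    return key % num_slices


def rows_per_slice(keys, num_slices=NUM_SLICES):
    """Count how many rows land on each slice for a given key column."""
    counts = [0] * num_slices
    for key in keys:
        counts[home_slice(key, num_slices)] += 1
    return counts


def skew_ratio(counts):
    """Largest slice divided by the mean slice size; 1.0 means perfectly even."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


def rows_needing_shuffle(placements, num_slices=NUM_SLICES):
    """Count rows that sit on a slice other than the one that owns their join key.

    placements is a list of (join_key, current_slice) pairs for one table,
    assuming the other table in the join is distributed on that same key.
    """
    return sum(
        1 for key, current in placements if current != home_slice(key, num_slices)
    )


# Skew: unique keys spread evenly, one dominant key value does not.
balanced = rows_per_slice(range(1000))
skewed = rows_per_slice([0] * 700 + list(range(300)))

# Co-location: the same rows placed by join key versus round-robin.
keys = [7, 7, 7, 7, 2, 2, 2, 2]
by_key = [(k, home_slice(k)) for k in keys]
round_robin = [(k, i % NUM_SLICES) for i, k in enumerate(keys)]
```

In this model the skewed key puts 775 of 1,000 rows on one slice, so that slice does about three times the average work. Placing rows by join key leaves nothing to move before the join, while round-robin placement leaves 6 of the 8 rows on the wrong slice.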

Key idea

Redshift distributes data across compute slices and runs each query step in parallel, so even distribution and join co-location are critical to avoiding stragglers and network shuffles.

Check yourself

1. What is the role of the leader node in Redshift?

2. What problem does data skew cause across slices?