Massively Parallel Processing
Redshift is an MPP warehouse. A leader node parses each query and builds a plan. Compute nodes execute that plan in parallel, and each compute node is divided into slices that each own a chunk of the data and a share of CPU and memory.
How Data Is Spread
Rows are distributed across slices according to the table's distribution style. An even spread keeps every slice busy. A query runs the same steps on each slice over that slice's local data, then the leader node merges the partial results.
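The flow above can be sketched as a small simulation. This is an illustration only, not how Redshift is implemented: the slice count, the `crc32` stand-in hash, and the names `slice_for`, `distribute`, `slice_step`, and `leader_gather` are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

NUM_SLICES = 4  # hypothetical; real slice counts depend on the node type


def slice_for(key, num_slices=NUM_SLICES):
    """Map a key value to a slice using a stand-in hash (not Redshift's own)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def distribute(rows, key, num_slices=NUM_SLICES):
    """Place each row on the slice chosen by hashing its key column."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_for(row[key], num_slices)].append(row)
    return slices


def slice_step(local_rows):
    """Per-slice step: partial SUM(amount) per customer over local rows only."""
    partial = defaultdict(int)
    for row in local_rows:
        partial[row["customer_id"]] += row["amount"]
    return dict(partial)


def leader_gather(partials):
    """Leader step: merge the partial results returned by every slice."""
    merged = defaultdict(int)
    for partial in partials:
        for customer_id, amount in partial.items():
            merged[customer_id] += amount
    return dict(merged)


# 100 toy orders spread over 10 customers.
orders = [{"customer_id": i % 10, "amount": i} for i in range(100)]
slices = distribute(orders, "customer_id")
totals = leader_gather(slice_step(s) for s in slices)
```

Every slice runs the same `slice_step` on its own rows, and only the small partial results travel to the leader. The merged totals match what a single-node aggregation over all rows would produce.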
Why Distribution Matters
- Skew happens when one slice holds far more rows than the others and becomes a straggler that the rest of the query waits on.
- Co-location makes joins faster: when matching rows share a slice, the join avoids network shuffles.
- A good distribution key balances data and keeps joins local.
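The points above can be made concrete with a rough sketch that measures skew and counts rows a join would have to move. It rests on stated assumptions: the `crc32` stand-in hash, the four-slice cluster, the toy `orders` data, and the helper names `rows_per_slice`, `skew_ratio`, and `rows_to_move` are all hypothetical and do not come from Redshift.

```python
import zlib

NUM_SLICES = 4  # hypothetical slice count


def slice_for(key, num_slices=NUM_SLICES):
    """Map a key value to a slice using a stand-in hash (not Redshift's own)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_slices


def rows_per_slice(keys, num_slices=NUM_SLICES):
    """Count how many rows land on each slice for a list of key values."""
    counts = [0] * num_slices
    for key in keys:
        counts[slice_for(key, num_slices)] += 1
    return counts


def skew_ratio(counts):
    """Largest slice divided by the mean slice size; 1.0 means perfectly even."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


def rows_to_move(orders, dist_key, join_key="customer_id"):
    """Count orders stored on a different slice than their matching customer.

    Assumes the customers table is distributed on join_key.
    """
    return sum(
        1 for o in orders if slice_for(o[dist_key]) != slice_for(o[join_key])
    )


# Skew: 900 of 1000 rows share one key value, so one slice holds at least 900.
skewed_keys = ["big_customer"] * 900 + list(range(100))
skewed_counts = rows_per_slice(skewed_keys)
skewed_ratio = skew_ratio(skewed_counts)

# Co-location: orders distributed on the join key need no movement to join.
orders = [{"order_id": i, "customer_id": i % 50} for i in range(1000)]
colocated_moves = rows_to_move(orders, dist_key="customer_id")
scattered_moves = rows_to_move(orders, dist_key="order_id")
```

With the dominant key value, the busiest slice carries at least 3.6 times the average load, which is the straggler effect. Distributing `orders` on `customer_id` leaves zero rows to move for the join, while distributing on `order_id` generally leaves matching rows on different slices.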
Key idea
Redshift distributes data across compute slices and runs each query step in parallel on every slice, so even distribution and join co-location are critical to avoiding stragglers and network shuffles.