Data Skew Handling

Stopping one hot key from making a single task carry most of the work.

6 min read · advanced

The symptom

In a distributed job, most tasks finish quickly while a few run far longer. This is data skew: one partition or key holds far more data than the others, so its task becomes a straggler. Because the job cannot finish until its slowest task does, that straggler sets the job runtime.

Why it happens

  • A few hot keys dominate, like one popular product in a sales table.
  • Hash partitioning sends all rows for a key to one reducer, so the work for a hot key cannot be split across reducers.
  • Null or default values pile into a single bucket.
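To make the hash-partitioning point concrete, here is a minimal sketch in plain Python rather than any particular engine. The row counts, the key names, the partition count, and the `partition_for` helper are all illustrative assumptions, and `zlib.crc32` stands in for whatever hash function a real engine would use.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed cluster size, for illustration only


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a deterministic hash.

    crc32 is used instead of the built-in hash() because str hashes
    are randomized per Python process.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical sales rows: one hot product plus 1,000 ordinary ones.
rows = ["product_hot"] * 9_000 + [f"product_{i}" for i in range(1_000)]

# Count how many rows each partition receives.
load = Counter(partition_for(key, NUM_PARTITIONS) for key in rows)

hot_partition = partition_for("product_hot", NUM_PARTITIONS)
for p in range(NUM_PARTITIONS):
    marker = "  <- hot key lands here" if p == hot_partition else ""
    print(f"partition {p}: {load[p]:>5} rows{marker}")
```

Every row for `product_hot` hashes to the same partition, so that partition receives at least 9,000 of the 10,000 rows no matter how many partitions are added.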

Fixes

  • Salting appends a random suffix to the hot key so its rows spread across many tasks; a second pass then strips the suffix and recombines the partial results.
  • Isolated handling detects hot keys and processes them with a dedicated strategy, such as a broadcast join, while normal keys take the usual path.
  • Adaptive execution in modern engines splits oversized partitions at runtime.
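The two-pass shape of salting can be sketched in plain Python. This is an illustration under stated assumptions, not an engine-specific recipe: `NUM_SALTS`, `salt_key`, `unsalt_key`, and the sample rows are hypothetical. For simplicity it salts every key, whereas a real job might salt only the keys known to be hot.

```python
import random
from collections import Counter

NUM_SALTS = 4  # assumed fan-out: each key is spread over this many salted keys


def salt_key(key: str, rng: random.Random) -> str:
    """Pass 1 key: append a random suffix so one key becomes several."""
    return f"{key}#{rng.randrange(NUM_SALTS)}"


def unsalt_key(salted: str) -> str:
    """Pass 2 key: strip the suffix to recover the original key."""
    return salted.rsplit("#", 1)[0]


rng = random.Random(0)  # seeded so the sketch is reproducible

# Hypothetical (key, quantity) rows with one dominant key.
rows = [("product_hot", 1)] * 9_000 + [(f"product_{i}", 1) for i in range(1_000)]

# Pass 1: aggregate per salted key. In a cluster, each salted key could
# be handled by a different task, so no task sees all the hot rows.
partial = Counter()
for key, quantity in rows:
    partial[salt_key(key, rng)] += quantity

# Pass 2: strip the salt and combine the partial results. This pass is
# cheap because it sees at most NUM_SALTS values per original key.
total = Counter()
for salted, subtotal in partial.items():
    total[unsalt_key(salted)] += subtotal

hot_partials = {k: v for k, v in partial.items() if unsalt_key(k) == "product_hot"}
print(hot_partials)
print(total["product_hot"])
```

The second pass only works this simply because a sum can be combined from partial sums. Aggregations that cannot be merged from partial results would need a different approach.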

The goal is always to break the dominance of one key so work spreads evenly and no single task dictates the runtime.
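Isolated handling can be sketched the same way. The version below is a plain-Python illustration with several assumptions: the 10% threshold in `HOT_FRACTION`, the helper names, the `price_lookup` table, and the task count are all hypothetical. It detects hot keys by frequency, routes their rows to a separate path, and splits that path evenly across tasks, each holding its own copy of the small lookup table, which is the idea behind a broadcast.

```python
from collections import Counter

HOT_FRACTION = 0.10  # assumed threshold: a key is hot above 10% of all rows
NUM_HOT_TASKS = 4    # assumed number of tasks dedicated to hot rows


def find_hot_keys(keys, hot_fraction=HOT_FRACTION):
    """Return the set of keys that exceed the frequency threshold."""
    counts = Counter(keys)
    cutoff = hot_fraction * len(keys)
    return {key for key, count in counts.items() if count > cutoff}


def split_by_hotness(rows, hot_keys):
    """Route rows into a hot path and a normal path."""
    hot, normal = [], []
    for row in rows:
        (hot if row[0] in hot_keys else normal).append(row)
    return hot, normal


def even_chunks(items, n):
    """Deal items round-robin into n chunks, ignoring the key entirely."""
    return [items[i::n] for i in range(n)]


rows = [("product_hot", 1)] * 9_000 + [(f"product_{i}", 1) for i in range(1_000)]

hot_keys = find_hot_keys([key for key, _ in rows])
hot_rows, normal_rows = split_by_hotness(rows, hot_keys)

# Hot path: the small lookup table is copied to every task, so the rows
# can be split by position instead of by key.
price_lookup = {"product_hot": 2.5}  # hypothetical small side of the join
chunks = even_chunks(hot_rows, NUM_HOT_TASKS)
hot_revenue = sum(
    sum(quantity * price_lookup[key] for key, quantity in chunk)
    for chunk in chunks
)

print(hot_keys, [len(chunk) for chunk in chunks], hot_revenue)
# normal_rows would continue through the usual hash-partitioned path.
```

Because the hot rows no longer need to meet at a single reducer, each of the dedicated tasks receives an equal share of them.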

Key idea

Data skew makes one hot key overload a single task, and fixes like salting and isolated handling spread that work across the cluster.

Check yourself

1. What is the main effect of data skew?

2. How does salting help with a hot key?

3. Why does plain hash partitioning struggle with a hot key?