System Design

The Thundering Herd Problem

When many clients wake at once and stampede a recovering resource.

A synchronized stampede

The thundering herd problem happens when many clients are waiting on the same event and all wake up and act at nearly the same moment, swamping the resource they were waiting for. The resulting surge can knock over a service just as it is trying to recover.

Common triggers

  • A popular cache entry expires, so every request for it misses and falls through to the database at the same time.
  • A service comes back after an outage and every client reconnects in the same instant.
  • A timer fires across many clients on the same schedule.

How to tame it

  • Jitter: add randomness to timeouts and retry delays so clients spread out rather than synchronizing.
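  • Request coalescing: let a single request rebuild the missing cache entry while the others wait for its result and reuse it.
  • Exponential backoff: widen the gap between successive retries so a recovering service is not hit by a wall of simultaneous requests. Pair it with jitter, because backoff alone keeps clients that failed together retrying together.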
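
As an illustration of how jitter and exponential backoff combine, here is a minimal sketch of a retry delay calculation. The function name backoff_delay and the parameters base, cap, and rng are hypothetical choices for this example, not part of any particular library.

```python
import random


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random):
    """Return a retry delay in seconds for a 0-based attempt number.

    Hypothetical helper: `base`, `cap`, and `rng` are illustrative
    parameters chosen for this sketch.
    """
    # Exponential backoff: the ceiling doubles with each attempt, up to `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Jitter: draw uniformly from [0, ceiling] so that clients which failed
    # at the same moment do not all retry at the same moment.
    return rng.uniform(0.0, ceiling)
```

A client would sleep for backoff_delay(attempt) seconds before its next retry. Drawing from the whole range rather than adding a small random offset is one design choice among several; it trades a predictable delay for a wider spread of retries.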

The unifying fix is to break the synchronization that causes everyone to act in lockstep.
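
Request coalescing can be sketched in a similar spirit. The class below is a hypothetical, simplified in-memory cache (the name CoalescingCache and the loader callback are assumptions for illustration, and it has no expiry logic): the first thread to miss on a key loads the value, and every other thread asking for the same key waits for that one load instead of hitting the backend itself.

```python
import threading


class CoalescingCache:
    """Hypothetical cache that lets only one caller load a missing key."""

    def __init__(self, loader):
        self._loader = loader      # function: key -> value (e.g. a DB query)
        self._lock = threading.Lock()
        self._values = {}          # key -> cached value
        self._inflight = {}        # key -> Event set when the load finishes

    def get(self, key):
        while True:
            with self._lock:
                if key in self._values:
                    return self._values[key]
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    # No load in progress: this caller becomes the loader.
                    event = self._inflight[key] = threading.Event()
            if not leader:
                # Another caller is already loading; wait, then re-check.
                event.wait()
                continue
            try:
                value = self._loader(key)
                with self._lock:
                    self._values[key] = value
                return value
            finally:
                # Wake the waiters whether the load succeeded or failed.
                with self._lock:
                    del self._inflight[key]
                event.set()
```

If the load fails, the exception reaches only the caller that ran it; the waiters wake up, find no cached value, and one of them tries again.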

Key idea

The thundering herd is a synchronized stampede. Jitter and backoff cure it by spreading the load over time, and coalescing by collapsing duplicate work into a single request.

Check yourself

1. What causes a thundering herd?

2. Why does adding jitter help?