System Design

The Capacity Buffer

Running with deliberate headroom so spikes and failures are absorbed without tipping over.

Why run with slack

It is tempting to size a system to its average load and run near one hundred percent utilization. But real systems face spikes, failovers, and autoscaling delays. A capacity buffer is deliberate spare headroom that absorbs these surprises.

What the buffer protects against

  • A traffic spike that arrives before autoscaling can react.
  • A failed instance whose load must shift to survivors.
  • A deploy that briefly removes capacity.

If a system already runs near its limit, any one of these can push it into overload and trigger a cascading failure.
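To make the failover case concrete, here is a minimal Python sketch. It assumes load is spread evenly across identical instances and that total traffic stays the same when one of them fails. survivor_utilization is a hypothetical helper for illustration, not part of any library.

```python
def survivor_utilization(utilization: float, instances: int, failed: int = 1) -> float:
    """Per-instance utilization after `failed` instances drop out.

    Assumes (hypothetically) that load is spread evenly and that the
    total load does not change when instances fail.
    """
    if failed >= instances:
        raise ValueError("no surviving instances")
    # The same total load is now shared by fewer instances.
    return utilization * instances / (instances - failed)


for u in (0.50, 0.70, 0.90):
    print(f"{u:.0%} on 4 instances -> {survivor_utilization(u, 4):.0%} after one fails")
# 50% on 4 instances -> 67% after one fails
# 70% on 4 instances -> 93% after one fails
# 90% on 4 instances -> 120% after one fails
```

At 90% utilization on four instances, a single failure asks each survivor to run at 120%, which is overload. Starting from 50%, the same failure lands at about 67%.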

Sizing the buffer

  • Reserve enough capacity that losing one instance or one zone does not push the survivors past a safe utilization level.
  • Account for autoscaling lag, the time it takes to detect rising load and bring new capacity online.
  • Watch tail latency, which rises sharply as utilization approaches the limit.
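The three sizing rules above can be sketched in Python as a back-of-the-envelope model. This is not a capacity planner. It assumes identical instances spread evenly across zones and traffic that grows linearly during the autoscaling lag. It also models latency as an M/M/1 queue, a textbook simplification with random arrivals and exponential service times that the lesson itself does not prescribe. All function names and example numbers are hypothetical.

```python
import math


def instances_for_zone_loss(peak_rps: float, per_instance_rps: float,
                            zones: int, safe_utilization: float) -> int:
    """Smallest instance count, split evenly over `zones`, such that losing
    one whole zone leaves the survivors at or below `safe_utilization`."""
    if zones < 2:
        raise ValueError("need at least two zones to survive a zone loss")
    # The survivors alone must carry the full peak at the safe utilization.
    survivors = math.ceil(peak_rps / (per_instance_rps * safe_utilization))
    # Survivors live in zones - 1 zones; round up to a whole count per zone.
    per_zone = math.ceil(survivors / (zones - 1))
    return per_zone * zones


def lag_headroom(growth_rps_per_min: float, lag_minutes: float) -> float:
    """Extra load that arrives while autoscaling detects a rise and brings
    new instances online, assuming traffic grows linearly during the lag."""
    return growth_rps_per_min * lag_minutes


def mm1_p99_ms(utilization: float, service_ms: float) -> float:
    """p99 response time of an M/M/1 queue with mean service time `service_ms`.
    Response time there is exponential with rate (1 - rho) / service_ms,
    so its 99th percentile is service_ms * ln(100) / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("an M/M/1 queue is only stable below 100% utilization")
    return service_ms * math.log(100) / (1 - utilization)


# Hypothetical numbers: 9000 rps peak, growing 200 rps/min, 5 min scaling lag.
peak = 9000 + lag_headroom(growth_rps_per_min=200, lag_minutes=5)
print(instances_for_zone_loss(peak, per_instance_rps=500, zones=3,
                              safe_utilization=0.7))  # 45

# Tail latency with a 10 ms mean service time: roughly 92, 230, 461, 921 ms.
for rho in (0.5, 0.8, 0.9, 0.95):
    print(f"{rho:.0%} utilization -> p99 ~ {mm1_p99_ms(rho, 10):.0f} ms")
```

With 45 instances across three zones, losing a zone leaves 30 instances carrying 10000 rps, about 67% utilization. The M/M/1 curve is only a rough guide, since real services are not single queues, but its shape makes the point: p99 scales with 1/(1 - ρ), so going from 80% to 95% utilization quadruples it.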

The buffer costs money but buys stability, turning would-be incidents into non-events.

Key idea

Keep deliberate spare capacity so spikes and failures are absorbed before they cause overload.

Check yourself

1. What is a capacity buffer?

2. Why is running near full utilization risky?

3. What should the buffer account for besides spikes?