System Design

Capacity Headroom Planning

Deciding how much slack to keep so a surge or failure does not immediately become an outage.

5 min read · advanced

Slack on purpose

Headroom is the gap between current usage and full capacity, kept deliberately so the system absorbs surges and failures without falling over.

Why full is dangerous

  • Latency rises sharply as utilization approaches 100 percent, because requests spend more and more of their time waiting in queues.
  • A failed instance dumps its load onto the survivors, which must have room to take it.
  • Sudden spikes need slack to absorb before autoscaling can react.
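
The queueing effect in the first bullet can be sketched with a simple model. The snippet below assumes an M/M/1 queue (a single server with random arrivals and service times), which the lesson does not specify; real systems differ, but the shape of the curve is similar. The function name `latency_multiplier` is hypothetical.

```python
def latency_multiplier(utilization: float) -> float:
    """Mean time in system relative to the bare service time.

    Assumes an M/M/1 queue, where mean time in system is
    service_time / (1 - utilization). This is an illustrative model,
    not a measurement of any particular system.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


# Show how the multiplier grows as the server gets busier.
for u in (0.50, 0.70, 0.90, 0.95, 0.99):
    print(f"{u:.0%} utilized -> about {latency_multiplier(u):.1f}x the service time")
```

Under this assumption a request takes about 2x the service time at 50 percent utilization and about 10x at 90 percent, which is why the last few points of utilization are so expensive.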

A common target keeps steady utilization around 50 to 70 percent so there is room for both failure and growth.
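
To see why a target in that range leaves room for failure, here is a rough sketch of what happens to the survivors when nodes are lost. It assumes identical nodes and perfectly even load balancing, which is a simplification; the function names and the 80 percent ceiling are hypothetical.

```python
def utilization_after_failure(nodes: int, utilization: float, failed: int = 1) -> float:
    """Utilization of each survivor once the failed nodes' load is redistributed.

    Assumes identical nodes and evenly spread load (a simplification).
    """
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("at least one node must survive")
    return utilization * nodes / survivors


def max_steady_utilization(nodes: int, ceiling: float, failed: int = 1) -> float:
    """Highest steady utilization that keeps survivors at or below `ceiling`."""
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("at least one node must survive")
    return ceiling * survivors / nodes


# Four nodes running at 70 percent, then one fails.
print(f"{utilization_after_failure(4, 0.70):.0%}")

# Steady target that keeps three survivors under an assumed 80 percent ceiling.
print(f"{max_steady_utilization(4, 0.80):.0%}")
```

In this example, four nodes at 70 percent end up at roughly 93 percent after one failure, while keeping the survivors under an 80 percent ceiling would require a steady target of about 60 percent. Smaller clusters need proportionally more headroom because each node carries a larger share of the load.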

Planning the buffer

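  • Size for peak plus failure: provision enough that losing a node at peak still leaves the survivors below full capacity.
  • Account for scaling lag: new instances take time to start and warm up, so the buffer must carry the load until they are ready.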
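
The two rules above can be combined into a rough sizing calculation. This is a minimal sketch under stated assumptions: identical nodes, a linear surge rate during the scaling lag, and hypothetical parameter names and numbers throughout. It is not a substitute for load testing.

```python
import math


def nodes_required(
    peak_rps: float,
    per_node_rps: float,
    max_utilization: float,
    tolerated_failures: int = 1,
    surge_rps_per_s: float = 0.0,
    scaling_lag_s: float = 0.0,
) -> int:
    """Node count that serves peak load with room for failure and scaling lag.

    All parameters are illustrative assumptions:
      peak_rps           expected peak request rate
      per_node_rps       rate one node can serve at 100 percent utilization
      max_utilization    highest utilization you are willing to run at
      tolerated_failures nodes that may fail without breaching max_utilization
      surge_rps_per_s    how fast load can grow during a spike
      scaling_lag_s      time for a new instance to start and warm up
    """
    # Extra load that can arrive before any new instance is ready.
    lag_load = surge_rps_per_s * scaling_lag_s
    demand = peak_rps + lag_load

    # Capacity each node contributes while staying under the utilization cap.
    usable_per_node = per_node_rps * max_utilization

    serving_nodes = math.ceil(demand / usable_per_node)
    # Spare nodes so the remaining ones still cover demand after failures.
    return serving_nodes + tolerated_failures


print(nodes_required(9000, 1000, 0.75))  # 13
print(nodes_required(9000, 1000, 0.75, surge_rps_per_s=10, scaling_lag_s=120))  # 15
```

With these made-up numbers, covering a 9000 requests-per-second peak at a 75 percent cap takes 12 serving nodes plus one spare. Adding a two-minute scaling lag during a surge of 10 requests per second per second raises that to 15.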

Running hot to save money feels efficient until one failure cascades, because the survivors had no room to absorb the lost node's traffic.

Key idea

Headroom is deliberate slack below full capacity that absorbs spikes, failures, and scaling lag before they become outages.

Check yourself

1. Why does latency rise near 100 percent utilization?

2. What must headroom cover beyond normal peak?