Slack on purpose
Headroom is the gap between current usage and full capacity, kept deliberately so the system absorbs surges and failures without falling over.
Why full is dangerous
- Latency rises sharply as utilization approaches 100 percent, due to queueing.
- A failed instance dumps its load onto the survivors, which must have room to take it.
- Sudden spikes have to be absorbed by existing slack until autoscaling can react.
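The queueing effect can be illustrated with a small sketch. It assumes the textbook M/M/1 model (a single server with random arrivals and service times), which is a simplification and not a description of any particular system; the function name is hypothetical. Under that model, mean time in the system is the bare service time multiplied by 1 / (1 - utilization).

```python
def mm1_latency_multiplier(utilization: float) -> float:
    """Mean time in system as a multiple of bare service time.

    Assumes an M/M/1 queue, an idealized model chosen for illustration.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


if __name__ == "__main__":
    # The multiplier grows slowly at first, then very steeply near full.
    for u in (0.5, 0.7, 0.9, 0.99):
        print(f"{u:.0%} utilized: about {mm1_latency_multiplier(u):.1f}x service time")
```

In this model, going from 50 to 70 percent utilization costs relatively little, while going from 90 to 99 percent multiplies latency roughly tenfold.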
A common target keeps steady utilization around 50 to 70 percent so there is room for both failure and growth.
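As a rough check on that range, the sketch below estimates what happens to per-node utilization when a node is lost. It assumes identical nodes and perfectly even load balancing, and both function names are hypothetical.

```python
def utilization_after_failure(nodes: int, utilization: float, failed: int = 1) -> float:
    """Per-node utilization once the failed nodes' load is spread over survivors."""
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("at least one node must survive")
    return utilization * nodes / survivors


def max_safe_utilization(nodes: int, failed: int = 1) -> float:
    """Highest steady utilization at which survivors stay at or below 100 percent."""
    if nodes <= failed:
        raise ValueError("at least one node must survive")
    return (nodes - failed) / nodes


if __name__ == "__main__":
    # Four nodes at 60 percent: losing one pushes the other three to about 80 percent.
    print(utilization_after_failure(nodes=4, utilization=0.6))
    # Three nodes can survive one failure only if they run at about 67 percent or less.
    print(max_safe_utilization(nodes=3))
```

Smaller clusters need proportionally more slack, since each node carries a larger share of the total load.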
Planning the buffer
- Size for peak plus failure: enough that losing a node still leaves capacity to serve peak load.
- Account for scaling lag: new instances take time to warm up.
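The two rules above can be sketched as simple arithmetic. This is a minimal illustration under stated assumptions: identical nodes, load measured in requests per second, and demand growing linearly while new instances warm up. All names and numbers are hypothetical.

```python
import math


def nodes_needed(peak_load: float, per_node_capacity: float, tolerated_failures: int = 1) -> int:
    """Nodes required to serve peak load even after losing `tolerated_failures` nodes."""
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    serving = math.ceil(peak_load / per_node_capacity)
    return serving + tolerated_failures


def lag_buffer(growth_per_second: float, warmup_seconds: float) -> float:
    """Extra load that existing capacity must absorb while new instances warm up."""
    return growth_per_second * warmup_seconds


if __name__ == "__main__":
    # A peak of 900 req/s on nodes that handle 100 req/s each: 9 to serve, plus 1 spare.
    print(nodes_needed(peak_load=900, per_node_capacity=100))
    # Demand rising by 50 req/s every second with a 120 s warm-up needs 6000 req/s of slack.
    print(lag_buffer(growth_per_second=50, warmup_seconds=120))
```

The longer the warm-up, the more slack has to exist before scaling starts, because autoscaling cannot help until the new instances are serving traffic.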
Running hot to save money feels efficient until one failure cascades, because the survivors had no room to absorb the lost node's traffic.
Key idea
Headroom is deliberate slack below full capacity that absorbs spikes, failures, and scaling lag before they become outages.