Why run with slack
It is tempting to size a system for its average load and run it near 100 percent utilization. But real systems face traffic spikes, failovers, and autoscaling delays. A capacity buffer is deliberate spare headroom that absorbs these surprises.
What the buffer protects against
- A traffic spike that arrives before autoscaling can react.
- A failed instance whose load must shift to survivors.
- A deploy that briefly removes capacity.
If a system already runs near its limit, any of these can push it into overload and a cascading failure.
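A small arithmetic sketch may make the failover case concrete. It assumes that load is spread evenly across instances and that survivors share the failed instance's load equally; the function name and the numbers are hypothetical, chosen only for illustration.

```python
def utilization_after_loss(utilization: float, instances: int, lost: int = 1) -> float:
    """Per-instance utilization after `lost` instances fail.

    Assumes (for illustration) that total load is unchanged and is
    redistributed evenly over the surviving instances.
    """
    survivors = instances - lost
    if survivors <= 0:
        raise ValueError("no surviving instances to take the load")
    return utilization * instances / survivors


# Four instances running hot: losing one pushes survivors past 100 percent.
hot = utilization_after_loss(0.90, instances=4)
# The same fleet with a buffer: survivors stay below their limit.
buffered = utilization_after_loss(0.60, instances=4)

print(f"hot fleet after loss:      {hot:.0%}")
print(f"buffered fleet after loss: {buffered:.0%}")
```

Under these assumptions, a fleet running at 90 percent ends up at roughly 120 percent per survivor, which is overload, while a fleet running at 60 percent ends up at roughly 80 percent.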
Sizing the buffer
- Reserve enough so that losing one instance or zone does not push the survivors past safe utilization.
- Account for autoscaling lag, the time to detect and add capacity.
- Watch tail latency, which rises sharply as utilization approaches the limit.
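The three sizing rules above can be sketched as a rough calculation. This is a minimal sketch, not a sizing method from any particular platform: the function names, the 70 percent safe-utilization default, and the linear load growth during autoscaling lag are all assumptions. The latency function uses a simple single-server queueing model (M/M/1), which gives mean rather than tail latency but shows the same sharp rise near the limit.

```python
import math


def instances_needed(peak_load: float,
                     per_instance_capacity: float,
                     safe_utilization: float = 0.7,   # assumed threshold
                     tolerated_failures: int = 1,
                     growth_per_second: float = 0.0,
                     autoscale_lag_seconds: float = 0.0) -> int:
    """Instances to provision so survivors stay at or below safe utilization."""
    # Load may keep growing while autoscaling detects the spike and adds capacity
    # (linear growth is an assumption made for this sketch).
    load_at_lag_end = peak_load + growth_per_second * autoscale_lag_seconds
    # Survivors needed to carry that load within the safe utilization.
    survivors = math.ceil(load_at_lag_end / (per_instance_capacity * safe_utilization))
    # Add the instances we want to be able to lose.
    return survivors + tolerated_failures


def latency_multiplier(utilization: float) -> float:
    """Mean latency relative to an idle server under an M/M/1 model: 1 / (1 - u)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


# Hypothetical numbers: 1000 requests/s peak, 200 requests/s per instance.
print(instances_needed(1000, 200))
print(instances_needed(1000, 200, growth_per_second=5, autoscale_lag_seconds=60))
for u in (0.5, 0.7, 0.9, 0.99):
    print(f"utilization {u:.0%}: latency about {latency_multiplier(u):.1f}x idle")
```

With these hypothetical inputs, the steady case needs 9 instances (8 to carry the load at 70 percent, plus 1 spare), and allowing for 60 seconds of growth at 5 requests/s per second raises that to 11. The latency multiplier goes from about 2x at 50 percent utilization to about 10x at 90 percent, which is why the buffer is sized against a safe utilization rather than the hard limit.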
The buffer costs money but buys stability, turning would-be incidents into non-events.
Key idea
Keep deliberate spare capacity so spikes and failures are absorbed before they cause overload.