The Capacity Buffer

Running with deliberate headroom so spikes and failures are absorbed without tipping over.

Why run with slack

It is tempting to size a system to its average load and run near one hundred percent utilization. But real systems face spikes, failovers, and autoscaling delays. A capacity buffer is deliberate spare headroom that absorbs these surprises.

What the buffer protects against

A traffic spike that arrives before autoscaling can react.
A failed instance whose load must shift to survivors.
A deploy that briefly removes capacity.

If a system already runs near its limit, any of these can push it into overload and a cascading failure.
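
The failed-instance case can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the fleet size, and the per-instance capacity of 100 requests per second are assumptions, and it assumes load spreads evenly across the survivors.

```python
def utilization_after_failure(total_load: float,
                              per_instance_capacity: float,
                              instances: int,
                              failed: int = 1) -> float:
    """Utilization of the surviving instances once `failed` instances are gone.

    Assumes the load balancer spreads the full load evenly over survivors.
    A result above 1.0 means the survivors are overloaded.
    """
    survivors = instances - failed
    if survivors <= 0:
        raise ValueError("no surviving instances")
    return total_load / (survivors * per_instance_capacity)


# Hypothetical fleet: 10 instances, each able to serve 100 requests/second.
hot_load = 0.95 * 10 * 100    # fleet running at 95% utilization
cool_load = 0.70 * 10 * 100   # fleet running at 70% utilization

hot_after = utilization_after_failure(hot_load, 100, 10)
cool_after = utilization_after_failure(cool_load, 100, 10)
print(f"95% fleet after one failure: {hot_after:.2f}")
print(f"70% fleet after one failure: {cool_after:.2f}")
```

Under these assumptions the fleet running at 95 percent lands above 1.0 after a single failure, while the fleet running at 70 percent stays below it.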

Sizing the buffer

Reserve enough so that losing one instance or zone does not push the survivors past safe utilization.
Account for autoscaling lag, the time to detect and add capacity.
Watch tail latency, which rises sharply as utilization approaches the limit.
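
The three sizing rules above can be sketched as small calculations. This is a rough model rather than a sizing method: all function names and numbers are assumptions, the lag estimate assumes load grows linearly while autoscaling reacts, and the latency estimate uses the textbook M/M/1 queue (a single server with random arrivals and service times), which real services only approximate.

```python
import math


def max_steady_utilization(instances: int,
                           safe_utilization: float,
                           failures_tolerated: int = 1) -> float:
    """Highest normal utilization that keeps survivors at or below
    `safe_utilization` after losing `failures_tolerated` instances."""
    survivors = instances - failures_tolerated
    if survivors <= 0:
        raise ValueError("must keep at least one surviving instance")
    return safe_utilization * survivors / instances


def headroom_for_lag(growth_per_second: float, lag_seconds: float) -> float:
    """Extra load that arrives before autoscaling adds capacity,
    assuming load grows linearly during the lag."""
    return growth_per_second * lag_seconds


def mm1_p99_latency(service_time: float, utilization: float) -> float:
    """99th percentile response time in an M/M/1 queue.

    Response time there is exponentially distributed with mean
    service_time / (1 - utilization), so p99 = ln(100) * mean.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return math.log(100) * service_time / (1 - utilization)


# Hypothetical inputs: 4 instances, 80% treated as the safe ceiling.
target = max_steady_utilization(4, 0.80)
print(f"run at or below {target:.0%} to survive one failure")

# Load rising by 5 requests/second with a 120 second autoscaling lag.
print(f"headroom needed for lag: {headroom_for_lag(5, 120):.0f} requests/second")

# 10 ms service time at increasing utilization.
for u in (0.5, 0.8, 0.9, 0.95):
    print(f"utilization {u:.0%}: p99 about {mm1_p99_latency(0.010, u) * 1000:.0f} ms")
```

In this model the p99 grows in proportion to 1 / (1 - utilization), which is why the last few points of utilization are the expensive ones.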

The buffer costs money but buys stability, turning would-be incidents into non-events.

Key idea

Keep deliberate spare capacity so spikes and failures are absorbed before they cause overload.
