What autoscaling decides
An autoscaling policy decides when to add or remove instances. The goal is to track demand closely enough to stay responsive without paying for idle capacity.
Common policy types
- Target tracking holds a metric near a setpoint, for example keeping average CPU at 60% by adding nodes when it climbs and removing them when it falls.
- Step scaling adds or removes a different number of instances depending on how far past a threshold the metric is.
- Scheduled scaling changes capacity at known times, like scaling up before a daily peak.
- Predictive scaling forecasts demand from history and provisions ahead of it.
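To make the first two policy types concrete, here is a minimal sketch in Python. It assumes a simple proportional rule for target tracking (similar in shape to the one documented for the Kubernetes Horizontal Pod Autoscaler) and a made-up step table for step scaling. The function names, the `STEPS` table, and its numbers are illustrative assumptions, not any provider's API.

```python
import math

def target_tracking_desired(current_instances: int, metric: float, target: float) -> int:
    """Proportional rule (assumed): resize the fleet by metric / target, rounding up."""
    if current_instances <= 0 or target <= 0:
        raise ValueError("current_instances and target must be positive")
    raw = current_instances * metric / target
    # Round away tiny floating-point noise before taking the ceiling.
    return max(1, math.ceil(round(raw, 9)))

# Hypothetical step table: (minimum breach above the threshold, instances to add).
STEPS = [(0.0, 1), (10.0, 2), (25.0, 4)]

def step_scaling_adjustment(metric: float, threshold: float) -> int:
    """Return how many instances to add; a larger breach selects a larger step."""
    breach = metric - threshold
    if breach <= 0:
        return 0  # at or below the threshold: no scale-up
    adjustment = 0
    for lower_bound, add in STEPS:
        if breach >= lower_bound:
            adjustment = add  # keep the last (largest) step whose bound is met
    return adjustment
```

With a 60% CPU target, 10 instances running at 90% would be resized to 15, and the same fleet at 30% would shrink to 5. The step table reacts more coarsely: a breach of 5 points adds one instance, while a breach of 30 points adds four.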
Tuning that matters
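- Cooldowns stop the controller from acting again before the effect of its last change, such as new instances finishing warm-up, shows up in the metric.
- Warm-up time means reactive scaling always lags a sudden spike, so headroom or prediction has to cover the gap.
- Flapping happens when the scale-up and scale-down thresholds sit too close together, so the controller repeatedly adds and removes instances.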
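The three tuning points above can be sketched together. This is a rough illustration under stated assumptions, not a production controller: the `Scaler` class, its thresholds, and the 300-second cooldown are hypothetical values, and `headroom_instances` assumes demand grows linearly during warm-up.

```python
import math
from dataclasses import dataclass

@dataclass
class Scaler:
    # All defaults are illustrative assumptions.
    scale_up_at: float = 70.0     # add an instance above this metric value
    scale_down_at: float = 40.0   # remove one below this; the wide gap limits flapping
    cooldown_s: float = 300.0     # minimum seconds between scaling actions
    min_instances: int = 1
    max_instances: int = 20
    instances: int = 2
    last_action_at: float = float("-inf")

    def decide(self, metric: float, now: float) -> int:
        """Return the instance count after considering one metric sample."""
        if now - self.last_action_at < self.cooldown_s:
            return self.instances  # still cooling down: ignore the sample
        if metric > self.scale_up_at and self.instances < self.max_instances:
            self.instances += 1
            self.last_action_at = now
        elif metric < self.scale_down_at and self.instances > self.min_instances:
            self.instances -= 1
            self.last_action_at = now
        return self.instances

def headroom_instances(growth_per_s: float, warmup_s: float, per_instance_capacity: float) -> int:
    """Spare instances needed to absorb demand that arrives while new ones warm up.

    Assumes demand rises linearly at growth_per_s (load units per second).
    """
    return math.ceil(growth_per_s * warmup_s / per_instance_capacity)
```

A sample between 40 and 70 falls in the dead band and changes nothing, which is what keeps the controller from churning. For headroom, if load can grow by 2 requests/s every second, warm-up takes 120 s, and each instance handles 100 requests/s, then 240 requests/s of extra demand arrives before new capacity is ready, so about 3 spare instances cover the gap.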
Key idea
Autoscaling policies trade cost against responsiveness, so combine reactive target tracking with cooldowns and either headroom or prediction to absorb the delay before new capacity is ready.