Why CPU is not enough
Default autoscaling policies often watch CPU utilization. But many workloads are bound by something else: a worker that spends most of its time blocked on I/O or a slow downstream call can show low CPU while a large backlog piles up in its queue. A CPU-based policy would never react. Custom metric autoscaling instead scales on a signal that reflects true load.
Good custom signals
- Queue depth or backlog age for worker pools.
- Requests per instance for web tiers.
- Latency against a target for user-facing services.
The scaler compares the metric to a target and adds or removes instances to keep it near that target.
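One common way to turn that comparison into an instance count is a proportional rule: multiply the current instance count by the ratio of the observed metric to its target and round up. The sketch below assumes the metric is a per-instance average (for example, queued messages per worker). The function name `desired_instances` and its parameters are hypothetical and do not come from any particular autoscaler.

```python
import math


def desired_instances(current_instances: int, metric_value: float, target_value: float) -> int:
    """Proportional scaling rule (illustrative sketch, not a specific product's API).

    metric_value is assumed to be a per-instance average, such as
    queued messages per worker or requests per instance.
    """
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    # Above target the ratio exceeds 1 and the fleet grows;
    # below target it shrinks. Rounding up errs toward extra capacity.
    return math.ceil(current_instances * metric_value / target_value)


# 4 workers averaging 250 queued messages each, target of 100 per worker.
print(desired_instances(4, 250, 100))  # 10
# 10 workers averaging 30 queued messages each, same target.
print(desired_instances(10, 30, 100))  # 3
```

Note that a metric of zero yields zero instances under this rule, so a real policy would still need a lower bound on the result.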
Things to get right
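- Pick a metric that leads or closely tracks load. One that lags badly triggers scaling only after the backlog or latency has already built up.
- Add cooldowns so the instance count does not flap up and down on short-lived metric swings.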
- Set sane min and max bounds to cap cost and protect dependencies.
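As a rough illustration of the last two points, the sketch below clamps a proposed instance count to min and max bounds and refuses to change the count again until a cooldown has elapsed. `ScalingPolicy`, `next_instance_count`, and the specific numbers are assumptions made for this example. Real autoscalers expose these controls under their own names and often use separate cooldowns for scaling up and scaling down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingPolicy:
    # Hypothetical policy settings for illustration only.
    min_instances: int
    max_instances: int
    cooldown_seconds: float


def next_instance_count(
    policy: ScalingPolicy,
    current: int,
    proposed: int,
    now: float,
    last_change_at: Optional[float],
) -> int:
    """Apply bounds and a cooldown to a proposed instance count."""
    # Clamp first, so cost stays capped and dependencies are protected.
    bounded = max(policy.min_instances, min(policy.max_instances, proposed))
    if bounded == current:
        return current
    # Hold the current size if the last change was too recent.
    if last_change_at is not None and now - last_change_at < policy.cooldown_seconds:
        return current
    return bounded


policy = ScalingPolicy(min_instances=2, max_instances=20, cooldown_seconds=300)
# No previous change: the proposal of 50 is capped at the max of 20.
print(next_instance_count(policy, current=5, proposed=50, now=1000, last_change_at=None))  # 20
# Last change was 100 seconds ago: still cooling down, so stay at 5.
print(next_instance_count(policy, current=5, proposed=50, now=1000, last_change_at=900))  # 5
```

Timestamps here are plain seconds passed in by the caller, which keeps the function easy to test. A running controller would typically read them from a monotonic clock.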
The right metric makes autoscaling respond to the load that actually matters.
Key idea
Autoscale on a metric that reflects real demand, such as queue depth, rather than CPU alone.