Autoscale on Custom Metrics

Drive autoscaling from a signal that tracks real load, such as queue depth, rather than from CPU alone.

Why CPU is not enough

Default autoscaling often watches CPU. But many workloads are bound by something else: a worker waiting on a queue may show low CPU while a large backlog piles up. A scaler that watches only CPU would never react to that backlog. Custom metric autoscaling instead scales on a signal that reflects true load.

Good custom signals

Queue depth or backlog age for worker pools.
Requests per instance for web tiers.
Latency against a target for user-facing services.

The scaler compares the metric to a target and adds or removes instances to keep it near that target.
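The comparison against a target can be sketched as a proportional rule: scale the instance count by the ratio of the observed metric to its target. This is one common approach (the Kubernetes Horizontal Pod Autoscaler documents a rule of this shape), not the only one. The function name and parameters below are hypothetical, and the metric is assumed to be a per-instance average, such as queued messages per worker.

```python
import math


def desired_instances(current: int, metric_value: float, target: float) -> int:
    """Proportional target tracking (illustrative sketch).

    current:      number of instances running now (assumed >= 1)
    metric_value: observed per-instance average of the custom metric
    target:       value the scaler tries to keep the metric near
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target <= 0:
        raise ValueError("target must be positive")
    # If the metric is 3x the target, ask for 3x the instances.
    # Round up so the metric lands at or below the target, and never
    # propose fewer than one instance.
    return max(1, math.ceil(current * metric_value / target))


# 4 workers with 300 queued messages each, target of 100 per worker.
print(desired_instances(current=4, metric_value=300, target=100))  # 12
```

With the same target, 4 workers holding 25 messages each would be scaled in to 1, which is why bounds and cooldowns matter in practice.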

Things to get right

Pick a metric that leads or tracks load, not one that lags badly.
Add cooldowns so the system does not flap up and down.
Set sane min and max bounds to cap cost and protect dependencies.
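Cooldowns and bounds can be layered on top of whatever the scaler proposes. The sketch below is a minimal illustration under assumed names (ScalingGuard and its fields are hypothetical, not part of any particular autoscaler): it clamps the proposal to the min and max, and ignores any change that arrives before the cooldown has elapsed since the last change.

```python
from dataclasses import dataclass


@dataclass
class ScalingGuard:
    """Hypothetical guard that applies bounds and a cooldown to proposals."""

    min_instances: int
    max_instances: int
    cooldown_seconds: float
    # Time of the last applied change; -inf means "never changed yet".
    last_change_at: float = float("-inf")

    def apply(self, current: int, proposed: int, now: float) -> int:
        # Clamp to the bounds to cap cost and protect dependencies.
        bounded = max(self.min_instances, min(self.max_instances, proposed))
        if bounded == current:
            return current
        # Inside the cooldown window, hold steady to avoid flapping.
        if now - self.last_change_at < self.cooldown_seconds:
            return current
        self.last_change_at = now
        return bounded


guard = ScalingGuard(min_instances=2, max_instances=10, cooldown_seconds=300)
print(guard.apply(current=4, proposed=12, now=0))    # 10 (capped at max)
print(guard.apply(current=10, proposed=3, now=60))   # 10 (still cooling down)
print(guard.apply(current=10, proposed=3, now=400))  # 3
```

Real systems often use a longer cooldown for scaling in than for scaling out, since removing capacity too early is usually the costlier mistake; this sketch uses a single window for brevity.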

The right metric makes autoscaling respond to the load that actually matters.

Key idea

Autoscale on a metric that reflects real demand, such as queue depth, rather than CPU alone.

