Autoscale on Custom Metrics

Driving autoscaling from a signal that tracks real load, like queue depth, not just CPU.

Why CPU is not enough

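Default autoscaling often watches CPU. But many workloads are bound by something else: a worker pool that spends most of its time blocked on I/O, such as a slow downstream call, can show low CPU while a huge backlog piles up in its queue. An autoscaler watching only CPU would not react, because the signal it sees never crosses its threshold. Custom metric autoscaling picks a signal that reflects true load.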
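
To make the backlog problem concrete, here is a minimal sketch with made-up numbers. The function name `simulate_backlog` and every rate in it are illustrative assumptions, not measurements. It models workers that are I/O-bound, so their CPU would stay low no matter how long the queue gets.

```python
def simulate_backlog(arrival_rate: float,
                     workers: int,
                     per_worker_rate: float,
                     seconds: int) -> list[float]:
    """Return the queue backlog at the end of each second.

    Assumption: each worker drains a fixed number of items per second
    because it is waiting on I/O, not because it is short of CPU.
    """
    backlog = 0.0
    history = []
    for _ in range(seconds):
        drained = workers * per_worker_rate
        # The backlog grows by whatever arrives beyond what the pool drains.
        backlog = max(0.0, backlog + arrival_rate - drained)
        history.append(backlog)
    return history


# Hypothetical load: 100 items/s arriving, 5 workers draining 10 items/s each.
print(simulate_backlog(arrival_rate=100, workers=5, per_worker_rate=10, seconds=3))
```

With these assumed rates the backlog grows by 50 items every second, while nothing in the model pushes CPU up. Queue depth exposes the problem immediately; CPU does not.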

Good custom signals

  • Queue depth or backlog age for worker pools.
  • Requests per instance for web tiers.
  • Latency against a target for user-facing services.

The scaler compares the metric to a target and adds or removes instances to keep it near that target.
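
One common way to turn that comparison into an instance count is a proportional rule: scale the current fleet by the ratio of the observed metric to its target. The sketch below assumes a per-instance metric (for example, queued messages per worker). The function name `desired_instances` and its parameters are hypothetical, chosen for illustration.

```python
import math


def desired_instances(current_instances: int,
                      metric_per_instance: float,
                      target_per_instance: float) -> int:
    """Proportional rule: scale the fleet by observed / target."""
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    # Multiply before dividing so whole-number inputs stay exact.
    raw = current_instances * metric_per_instance / target_per_instance
    # Round up so the fleet is not left short; never drop below one instance.
    return max(1, math.ceil(raw))


# 4 workers each seeing 250 queued messages, with a target of 100 per worker.
print(desired_instances(4, 250, 100))  # 10
# 10 workers each seeing 40 queued messages against the same target.
print(desired_instances(10, 40, 100))  # 4
```

When the metric sits exactly on target the ratio is 1 and the fleet size does not change, which is the steady state the scaler is aiming for.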

Things to get right

  • Pick a metric that leads or tracks load, not one that lags badly.
  • Add cooldowns so the system does not flap up and down.
  • Set sane min and max bounds to cap cost and protect dependencies.
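
The cooldown and bounds from the list above can be sketched as a small guard that sits between the scaler's proposal and the actual change. The names `ScalingPolicy`, `Scaler`, and `apply` are hypothetical, and a single shared cooldown is a simplifying assumption; real autoscalers often use separate settings for scaling out and scaling in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingPolicy:
    min_instances: int
    max_instances: int
    cooldown_seconds: float


class Scaler:
    def __init__(self, policy: ScalingPolicy, initial_instances: int) -> None:
        self.policy = policy
        self.instances = self._clamp(initial_instances)
        self.last_change_at: Optional[float] = None

    def _clamp(self, value: int) -> int:
        # Bounds cap cost (max) and keep a safe floor of capacity (min).
        return max(self.policy.min_instances,
                   min(self.policy.max_instances, value))

    def apply(self, proposed: int, now: float) -> int:
        """Apply a proposed size at time `now` (seconds); return the size in effect."""
        bounded = self._clamp(proposed)
        if bounded == self.instances:
            return self.instances
        in_cooldown = (
            self.last_change_at is not None
            and now - self.last_change_at < self.policy.cooldown_seconds
        )
        if in_cooldown:
            # Too soon after the last change: hold steady to avoid flapping.
            return self.instances
        self.instances = bounded
        self.last_change_at = now
        return self.instances


scaler = Scaler(ScalingPolicy(min_instances=2, max_instances=20,
                              cooldown_seconds=300), initial_instances=4)
print(scaler.apply(50, now=0.0))    # 20: capped at max_instances
print(scaler.apply(3, now=60.0))    # 20: still inside the cooldown
print(scaler.apply(3, now=400.0))   # 3: cooldown has elapsed
```

The max bound also protects dependencies: a database that can handle 20 workers is not hit by 50 just because the queue spiked.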

The right metric makes autoscaling respond to the load that actually matters.

Key idea

Autoscale on a metric that reflects real demand, such as queue depth, rather than CPU alone.

Check yourself

1. Why can CPU-based autoscaling miss real load?

2. What prevents an autoscaler from flapping?