The Worker Pool Scaling

Sizing the Workforce

Throughput equals workers times jobs each can finish per second. When the queue grows faster than workers drain it, latency climbs. The fix is more capacity, added two ways.

Concurrency and Replicas

Concurrency runs several jobs inside one worker process, ideal when work waits on input and output.
Replicas add more worker processes or machines, needed when work is processor bound.

For tasks that spend time waiting on network or disk, raise concurrency cheaply. For heavy computation, add replicas because one core handles one busy job.

Autoscaling Signals

Queue depth is the clearest signal. Scale up when backlog rises.
Oldest job age captures latency directly and is harder to game.
Processor or memory use suits compute bound pools.

Scale up fast to clear backlog, scale down slowly to avoid flapping.

Backpressure

If workers cannot keep up and the queue grows without bound, you risk memory exhaustion. Apply backpressure by rejecting or slowing producers, or by shedding low value jobs.

Key idea