← Lessons

quiz vs the machine

Gold1410

System Design

The Worker Pool Scaling

Match worker capacity to queue depth using concurrency and autoscaling.

5 min read · core · beat Gold to climb

Sizing the Workforce

Throughput equals workers times jobs each can finish per second. When the queue grows faster than workers drain it, latency climbs. The fix is more capacity, added two ways.

Concurrency and Replicas

  • Concurrency runs several jobs inside one worker process, ideal when work waits on input and output.
  • Replicas add more worker processes or machines, needed when work is processor bound.

For tasks that spend time waiting on network or disk, raise concurrency cheaply. For heavy computation, add replicas because one core handles one busy job.

Autoscaling Signals

  • Queue depth is the clearest signal. Scale up when backlog rises.
  • Oldest job age captures latency directly and is harder to game.
  • Processor or memory use suits compute bound pools.

Scale up fast to clear backlog, scale down slowly to avoid flapping.

Backpressure

If workers cannot keep up and the queue grows without bound, you risk memory exhaustion. Apply backpressure by rejecting or slowing producers, or by shedding low value jobs.

Key idea

Scale workers by concurrency for input output bound jobs and by replicas for compute bound jobs, driven by queue depth or job age.

Check yourself

Answer to earn rating on the learn ladder.

1. Which scaling method best fits jobs that mostly wait on network or disk?

2. Why is oldest job age a strong autoscaling signal?

3. What is a sound autoscaling policy to avoid flapping?