
System Design

The Concurrency Limiter

Cap requests in flight at once, not per second, to protect finite resources like threads.


A different dimension

Most limiters cap rate: requests per second. A concurrency limiter caps something else: how many requests are being processed at the same time. This matters because some resources, such as database connections, threads, or the memory held by each in-flight request, are bounded by simultaneous use rather than by throughput.

How it works

  • The limiter holds a fixed number of permits, often via a semaphore.
  • A request acquires a permit when it starts and releases it when it finishes.
  • If no permit is free, the request waits or is rejected.
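The three steps above can be sketched in Python with the standard library's `threading.BoundedSemaphore`. This is a minimal sketch, not a reference implementation: the class `ConcurrencyLimiter`, its `slot()` context manager, the `wait_seconds` parameter, and the `LimitExceeded` exception are hypothetical names chosen for illustration. Setting `wait_seconds` to `0` rejects immediately when no permit is free, a positive value waits up to that many seconds, and `None` waits indefinitely.

```python
import threading
import time
from contextlib import contextmanager
from typing import Optional


class LimitExceeded(Exception):
    """Raised when no permit frees up in time (hypothetical name for this sketch)."""


class ConcurrencyLimiter:
    """Allows at most `max_in_flight` callers inside `slot()` at the same time."""

    def __init__(self, max_in_flight: int, wait_seconds: Optional[float] = 0.0):
        # One permit per request that may be in flight at once.
        # BoundedSemaphore raises ValueError on a release without a matching acquire.
        self._permits = threading.BoundedSemaphore(max_in_flight)
        # 0 -> reject immediately, > 0 -> wait up to that long, None -> wait forever.
        self._wait_seconds = wait_seconds

    @contextmanager
    def slot(self):
        # Acquire a permit when the request starts.
        if self._wait_seconds == 0:
            acquired = self._permits.acquire(blocking=False)
        else:
            acquired = self._permits.acquire(timeout=self._wait_seconds)
        if not acquired:
            raise LimitExceeded("no free permit")
        try:
            yield  # the permit is held for the full duration of the work
        finally:
            self._permits.release()  # released even if the work raises


# Usage: 8 worker threads, but never more than 3 doing work at once.
limiter = ConcurrencyLimiter(max_in_flight=3, wait_seconds=None)
stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def handle_request():
    with limiter.slot():
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.05)  # stand-in for slow work that holds a connection
        with stats_lock:
            stats["active"] -= 1


threads = [threading.Thread(target=handle_request) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(stats["peak"])  # never exceeds 3
```

Releasing the permit in a `finally` clause matters: a request that fails with an exception must still give its permit back, or the limiter slowly leaks capacity until nothing can get in.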

Because a permit is held for the full duration of the work, slow requests consume capacity for longer. Ten slow requests can saturate a limit of ten, while a thousand fast requests spread across a second might barely touch it.
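One way to quantify this, assuming a steady average arrival rate, is Little's Law: the average number of requests in flight, L, equals the arrival rate λ multiplied by the average time W each request holds its permit. The rates and latencies below are illustrative assumptions, not measurements.

$$
\begin{aligned}
L &= \lambda W \\
\text{fast: } L &= 1000\,\text{req/s} \times 0.001\,\text{s} = 1 \\
\text{slow: } L &= 5\,\text{req/s} \times 2\,\text{s} = 10
\end{aligned}
$$

The slow workload arrives two hundred times less often, yet it keeps ten times as many permits busy.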

Why rate alone is not enough

A rate limit cannot stop slow requests from piling up. If ten thousand requests each arrive within the per-second budget but take a long time to finish, they can all be in flight at once. A concurrency cap bounds the in-flight count directly, protecting connection pools and memory from exhaustion.

Key idea

A concurrency limiter caps the number of requests in flight at once, protecting finite resources that a rate limit cannot bound.

Check yourself


1. What does a concurrency limiter cap?

2. Why is a rate limit not enough on its own?