The Concurrency Limiter

Cap how many requests are in flight at once, rather than how many arrive per second, to protect finite resources such as threads.

A different dimension

Most limiters cap rate: requests per second. A concurrency limiter caps something else: how many requests are being processed at the same time. This matters because some resources, such as database connections, threads, or memory per in-flight request, are bounded by simultaneous use, not by throughput.

How it works

The limiter holds a fixed number of permits, often via a semaphore.
A request acquires a permit when it starts and releases it when it finishes.
If no permit is free, the request waits or is rejected.

Because a permit is held for the full duration of the work, slow requests consume capacity longer. Ten slow requests can saturate the same limit that a thousand fast ones would barely touch.
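
The permit lifecycle described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the names ConcurrencyLimiter, RejectedError, and permit are hypothetical, and it assumes a thread-based server where the standard library's threading.BoundedSemaphore holds the permits.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, Optional


class RejectedError(Exception):
    """Hypothetical error raised when no permit is free."""


class ConcurrencyLimiter:
    """Caps how many callers may hold a permit at the same time."""

    def __init__(self, max_in_flight: int) -> None:
        # One semaphore slot per permit.
        self._permits = threading.BoundedSemaphore(max_in_flight)

    @contextmanager
    def permit(self, timeout: Optional[float] = None) -> Iterator[None]:
        # timeout=None: reject immediately if no permit is free.
        # timeout=seconds: wait up to that long before rejecting.
        if timeout is None:
            acquired = self._permits.acquire(blocking=False)
        else:
            acquired = self._permits.acquire(timeout=timeout)
        if not acquired:
            raise RejectedError("no permit free")
        try:
            # The permit is held for the full duration of the work.
            yield
        finally:
            # Always release, even if the work raised.
            self._permits.release()


limiter = ConcurrencyLimiter(max_in_flight=2)

def handle_request() -> str:
    with limiter.permit():
        return "done"  # stand-in for the real work
```

Releasing in a finally block matters: a permit leaked by a failed request would shrink the limit permanently.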

Why rate alone is not enough

A rate limit only counts arrivals. If every request arrives within budget but each is slow to finish, ten thousand of them can still pile up in flight at once. A concurrency cap bounds the in-flight count directly, protecting connection pools and memory from exhaustion.
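
A rough way to see this is Little's law, which is brought in here as an illustration and is not part of the original text: in steady state, the average number of requests in flight equals the arrival rate multiplied by the average time each request takes. The sketch below uses made-up figures (100 requests per second, 50 ms versus 100 s latency) and a hypothetical helper name.

```python
def expected_in_flight(rate_per_s: float, mean_latency_ms: float) -> float:
    """Little's law: average in-flight count = arrival rate * average latency."""
    return rate_per_s * mean_latency_ms / 1000


# Same rate limit in both cases; only the latency differs (illustrative numbers).
fast = expected_in_flight(rate_per_s=100, mean_latency_ms=50)
slow = expected_in_flight(rate_per_s=100, mean_latency_ms=100_000)

print(fast)  # 5.0 requests in flight on average
print(slow)  # 10000.0 requests in flight on average
```

The rate limit is satisfied in both cases, yet the slow case holds two thousand times as many requests in flight.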

Key idea

A concurrency limiter caps the number of requests in flight at once, protecting finite resources that a rate limit cannot bound.
