Concurrency

The Adaptive Concurrency Limit

Letting a client learn the right in-flight limit from observed latency instead of relying on a fixed guess.

6 min read · advanced

A limit that tunes itself

A fixed concurrency limit is a guess that goes stale as capacity changes. An adaptive concurrency limit adjusts the number of in-flight requests automatically by watching latency, much as TCP congestion control adjusts its window.

How it senses overload

The idea rests on Little's law: average concurrency equals throughput times average latency. Once a downstream service saturates, its throughput is capped, so added concurrency stops raising throughput and only inflates latency. Rising latency at a steady throughput is therefore the signal of overload.
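A small numeric sketch can make this concrete. It models a toy service with a fixed unloaded latency and a hard throughput ceiling; the function name `steady_state` and the numbers used are illustrative assumptions, not measurements from a real system.

```python
def steady_state(concurrency: int, base_latency_s: float, capacity_rps: float) -> tuple[float, float]:
    """Return (throughput_rps, latency_s) for a toy service.

    Assumed model (hypothetical): each request takes base_latency_s when the
    service is unloaded, and the service never completes more than
    capacity_rps requests per second.
    """
    if concurrency <= 0 or base_latency_s <= 0 or capacity_rps <= 0:
        raise ValueError("all arguments must be positive")
    # Below saturation, throughput grows with concurrency; above it, it is capped.
    throughput = min(concurrency / base_latency_s, capacity_rps)
    # Little's law rearranged: latency = concurrency / throughput.
    latency = concurrency / throughput
    return throughput, latency


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        tput, lat = steady_state(n, base_latency_s=0.05, capacity_rps=200.0)
        print(f"concurrency={n:3d}  throughput={tput:6.1f} rps  latency={lat * 1000:6.1f} ms")
```

In this model the service saturates at 10 in-flight requests. Beyond that point throughput stays flat at the ceiling and every doubling of concurrency doubles latency, which is the pattern the limiter looks for.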

  • Track a baseline minimum latency, the round trip when the service is unloaded.
  • Compare current latency to that baseline. The difference is queueing delay, and when it grows, requests are piling up.
  • Use a gradient, a measure of how far current latency sits above the baseline: if measured latency is close to the minimum, raise the limit; if it climbs, lower it.
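The three steps above can be sketched as a small limiter. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `GradientLimiter`, the `tolerance` and `headroom` parameters, and the choice of baseline divided by current latency as the gradient are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class GradientLimiter:
    """Toy gradient-based concurrency limiter (illustrative only)."""

    limit: float = 10.0            # current in-flight limit
    min_limit: float = 1.0
    max_limit: float = 200.0
    tolerance: float = 1.5         # assumed: latency up to 1.5x baseline counts as "close"
    headroom: float = 1.0          # assumed: small additive probe for spare capacity
    min_rtt: float = float("inf")  # baseline: lowest latency seen so far

    def on_sample(self, rtt: float) -> float:
        """Feed one latency sample (seconds) and return the updated limit."""
        if rtt <= 0:
            raise ValueError("rtt must be positive")
        # Step 1: track the baseline minimum latency.
        self.min_rtt = min(self.min_rtt, rtt)
        # Step 2: compare current latency to the baseline.
        # The gradient is 1.0 near the baseline and shrinks as latency climbs;
        # it is clamped so a single sample can at most halve the limit.
        gradient = max(0.5, min(1.0, self.tolerance * self.min_rtt / rtt))
        # Step 3: scale the limit by the gradient, then add headroom to keep probing.
        new_limit = self.limit * gradient + self.headroom
        self.limit = max(self.min_limit, min(self.max_limit, new_limit))
        return self.limit


if __name__ == "__main__":
    limiter = GradientLimiter()
    for rtt in (0.05, 0.06, 0.20, 0.05):
        print(f"rtt={rtt * 1000:5.0f} ms -> limit={limiter.on_sample(rtt):.1f}")
```

A real limiter would usually smooth latency over a window and refresh the baseline periodically, since a single unusually fast sample would otherwise pin `min_rtt` too low.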

Additive increase and multiplicative decrease

Like TCP congestion control, a robust scheme increases the limit gently and cuts it sharply at the first sign of trouble. This additive-increase, multiplicative-decrease (AIMD) rhythm probes for more capacity but retreats fast when latency spikes or errors appear.
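One AIMD update step might look like the sketch below. The function name `aimd_step`, the overload threshold of twice the baseline, and the halving backoff factor are assumptions chosen for illustration; real limiters tune these values.

```python
def aimd_step(
    limit: int,
    latency: float,
    baseline: float,
    *,
    error: bool = False,
    threshold: float = 2.0,   # assumed: latency above 2x baseline signals overload
    backoff: float = 0.5,     # assumed: multiplicative decrease factor
    step: int = 1,            # assumed: additive increase per healthy sample
    min_limit: int = 1,
    max_limit: int = 1000,
) -> int:
    """Return the next in-flight limit after one latency sample."""
    overloaded = error or latency > threshold * baseline
    if overloaded:
        # Multiplicative decrease: retreat fast on a latency spike or an error.
        return max(min_limit, int(limit * backoff))
    # Additive increase: probe gently for more capacity.
    return min(max_limit, limit + step)


if __name__ == "__main__":
    limit = 8
    # Each sample is (latency in seconds, whether the request failed).
    samples = [(0.05, False), (0.05, False), (0.30, False), (0.05, True)]
    for latency, failed in samples:
        limit = aimd_step(limit, latency, baseline=0.05, error=failed)
        print(f"latency={latency * 1000:4.0f} ms error={failed!s:5} -> limit={limit}")
```

The asymmetry is the point of the design: a limit that is slightly too low costs a little throughput, while a limit that is too high feeds an overloaded service, so the decrease is made much faster than the increase.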

Key idea

An adaptive limit reads queueing delay from the gap between current latency and a baseline, then grows or shrinks the number of in-flight requests to find capacity without a brittle fixed guess.

Check yourself

1. What signal tells an adaptive limiter the service is overloaded?

2. Why use additive increase and multiplicative decrease?

3. What does the limiter compare current latency against?