The Adaptive Concurrency Limit

Letting a client learn the right in-flight limit from latency instead of relying on a fixed guess.

A limit that tunes itself

A fixed concurrency limit is a guess that goes stale as capacity changes. An adaptive concurrency limit adjusts the number of in-flight requests automatically by watching latency, much as TCP congestion control adjusts its window.

How it senses overload

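The idea rests on Little's law, which says that the average number of requests in flight equals throughput times average latency. Once a downstream service saturates, throughput stops rising, so any added concurrency only inflates latency. Rising latency at steady throughput is therefore the signal of overload.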
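To make the relationship concrete, here is a minimal sketch of Little's law applied to an idealized service. The function name, the hard capacity cap, and the numbers are illustrative assumptions, not measurements of any real system.

```python
def latency_at(concurrency: int, base_latency_s: float, capacity_rps: float) -> float:
    """Average latency implied by Little's law for an idealized service.

    Assumption: the service answers in base_latency_s until it reaches
    capacity_rps, after which throughput stays flat.
    """
    # Throughput the clients would get if the service never saturated.
    offered_rps = concurrency / base_latency_s
    # Saturation caps throughput at the service's capacity.
    throughput_rps = min(offered_rps, capacity_rps)
    # Little's law: in_flight = throughput * latency, so latency = in_flight / throughput.
    return concurrency / throughput_rps


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, latency_at(n, base_latency_s=0.1, capacity_rps=100.0))
```

In this model the service saturates at 10 in-flight requests. Below that point latency stays at the baseline; above it, doubling concurrency doubles latency while throughput stays flat.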

Track a baseline minimum latency, the round trip when the service is unloaded.
Compare current latency to that baseline. The excess is queueing delay, and when it grows, requests are piling up.
Use a gradient: if measured latency is close to the minimum, raise the limit; if it climbs, lower it.
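The three steps above can be sketched as a small limiter. This is one possible reading of the gradient idea, not a specific library's algorithm: the class name `GradientLimit`, the `headroom` term, the clamp at 0.5, and all default values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class GradientLimit:
    """Hypothetical gradient-based limit; every default is an illustrative guess."""

    limit: float = 10.0
    min_limit: float = 1.0
    max_limit: float = 200.0
    headroom: float = 2.0            # small allowance that lets the limit probe upward
    min_rtt: float = float("inf")    # baseline: lowest latency seen so far

    def update(self, rtt: float) -> int:
        """Feed one latency sample (seconds) and return the new integer limit."""
        if rtt <= 0:
            raise ValueError("rtt must be positive")
        # Step 1: track the baseline minimum latency.
        self.min_rtt = min(self.min_rtt, rtt)
        # Step 2: compare current latency to the baseline.
        # The ratio is 1.0 when unloaded and shrinks as queueing delay grows;
        # clamping at 0.5 keeps a single slow sample from collapsing the limit.
        gradient = max(0.5, min(1.0, self.min_rtt / rtt))
        # Step 3: scale the limit by the gradient, then add headroom.
        new_limit = self.limit * gradient + self.headroom
        self.limit = max(self.min_limit, min(self.max_limit, new_limit))
        return int(self.limit)
```

With latency at the baseline, the gradient is 1.0 and the limit grows by the headroom on each sample. Once latency doubles, the gradient halves the limit before the headroom is added back. A production limiter would also need to smooth samples and refresh the baseline periodically, which this sketch omits.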

Additive increase and multiplicative decrease

Like congestion control, a robust scheme raises the limit gently and cuts it sharply at the first sign of trouble. This additive-increase, multiplicative-decrease (AIMD) rhythm probes for more capacity but retreats fast when latency spikes or errors appear.
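A minimal sketch of that rhythm follows. The class name `AimdLimit`, the step of 1, the backoff factor of 0.5, and the fixed latency threshold are all assumptions made for illustration; a real limiter would tune them and would typically derive the threshold from a measured baseline.

```python
class AimdLimit:
    """Hypothetical AIMD concurrency limit with illustrative defaults."""

    def __init__(self, initial: int = 10, min_limit: int = 1, max_limit: int = 200,
                 step: int = 1, backoff: float = 0.5,
                 latency_threshold_s: float = 0.5) -> None:
        self.limit = initial
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.step = step
        self.backoff = backoff
        self.latency_threshold_s = latency_threshold_s

    def on_result(self, latency_s: float, ok: bool = True) -> int:
        """Record one completed request and return the updated limit."""
        if not ok or latency_s > self.latency_threshold_s:
            # Multiplicative decrease: retreat fast on an error or latency spike.
            self.limit = max(self.min_limit, int(self.limit * self.backoff))
        else:
            # Additive increase: probe gently for more capacity.
            self.limit = min(self.max_limit, self.limit + self.step)
        return self.limit
```

Starting from a limit of 10, two healthy responses raise it to 12, a single slow response cuts it to 6, and an error cuts it again to 3. The asymmetry is the point: recovering from an overly cautious limit costs a little throughput, while staying above capacity costs latency for every request in the queue.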

Key idea

An adaptive limit infers queueing delay by comparing latency with a baseline and grows or shrinks the number of in-flight requests accordingly, finding capacity without a brittle fixed guess.
