System Design

Adaptive Concurrency Limits

Let a service discover its own safe in-flight limit from latency feedback instead of relying on a fixed guess.

6 min read · advanced

The problem with fixed limits

A hardcoded concurrency limit is a guess that goes stale as hardware and dependencies change. Set it too low and you waste capacity; set it too high and, when a dependency slows down, requests pile up in queues until the service falls over.

How adaptive limits work

Adaptive concurrency limits borrow ideas from network congestion control, where TCP probes for available bandwidth and backs off when it detects congestion. The service watches its own latency and adjusts how many requests it allows in flight at once.

  • When latency stays low, there is spare capacity, so the limit grows.
  • When latency rises, requests are queuing, so the limit shrinks to drain the backlog.
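The two rules above can be sketched as an additive-increase, multiplicative-decrease (AIMD) loop, a classic congestion-control pattern swapped in here for illustration. This is a minimal sketch, not any specific library's algorithm: the class name `AimdLimit`, the 50 ms latency threshold, and the 0.9 backoff ratio are all hypothetical values.

```python
from dataclasses import dataclass


@dataclass
class AimdLimit:
    """Hypothetical AIMD-style limiter: grow slowly while latency is low,
    cut multiplicatively once latency crosses a threshold."""
    limit: float = 10.0
    min_limit: float = 1.0
    max_limit: float = 1000.0
    latency_threshold_ms: float = 50.0  # assumed cutoff for "latency rises"
    backoff_ratio: float = 0.9          # assumed multiplicative decrease

    def on_sample(self, latency_ms: float) -> int:
        if latency_ms <= self.latency_threshold_ms:
            # Low latency: assume spare capacity and add one slot.
            self.limit = min(self.max_limit, self.limit + 1)
        else:
            # High latency: assume queuing and shrink to drain the backlog.
            self.limit = max(self.min_limit, self.limit * self.backoff_ratio)
        return int(self.limit)


limiter = AimdLimit()
for latency in (20, 22, 25):
    limiter.on_sample(latency)      # limit climbs 11, 12, 13
print(limiter.on_sample(120))       # slow sample cuts it to 11
```

Additive growth probes for capacity gently, while the multiplicative cut drains a backlog quickly; that asymmetry keeps the limit from staying above the safe level for long.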

Why latency is the signal

Latency reflects the real state of the system, including slow downstream dependencies the service cannot observe directly. Algorithms such as the gradient method compare recent latency with a long-term minimum (an estimate of latency with no queuing) and reduce the limit as recent latency climbs above that baseline.
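A gradient-style update can be sketched as follows. This is a simplified illustration under stated assumptions, not a reference implementation: `GradientLimit` is a hypothetical name, the gradient is taken as the long-term minimum latency divided by the latest sample and clamped to [0.5, 1.0], and the `sqrt(limit)` headroom term that lets the limit keep probing upward is an assumed choice.

```python
import math


class GradientLimit:
    """Hypothetical gradient-style limiter: compares each latency sample
    with the long-term minimum and scales the limit by that ratio."""

    def __init__(self, initial_limit: float = 20.0,
                 min_limit: float = 1.0, max_limit: float = 1000.0):
        self.limit = initial_limit
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.min_latency = math.inf  # long-term minimum ("no-queuing" latency)

    def on_sample(self, latency_ms: float) -> float:
        self.min_latency = min(self.min_latency, latency_ms)
        # 1.0 when the sample matches the baseline; smaller as queuing
        # inflates latency. Clamped so one bad sample cannot collapse it.
        gradient = max(0.5, min(1.0, self.min_latency / latency_ms))
        # Assumed headroom term so the limit keeps probing for capacity.
        headroom = math.sqrt(self.limit)
        new_limit = self.limit * gradient + headroom
        self.limit = max(self.min_limit, min(self.max_limit, new_limit))
        return self.limit


g = GradientLimit()
print(g.on_sample(10.0))  # at baseline: 20 + sqrt(20), about 24.47
print(g.on_sample(20.0))  # latency doubled: about 24.47 * 0.5 + 4.95
```

When latency matches the baseline the gradient is 1.0 and the headroom term grows the limit; when latency doubles, the base is halved before headroom is added. A minimum that never resets can go stale, so a production version would likely need to refresh it periodically.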

Benefits and care

  • Self-tuning: the limit tracks changing hardware and dependency health automatically.
  • Pairs with shedding: requests beyond the limit are rejected quickly rather than queued.
  • Noisy signals: individual latency samples are noisy, so smoothing them keeps the limit from oscillating wildly.
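The shedding and smoothing points can be combined in one small sketch. `SheddingLimiter`, `try_acquire`, `release`, and the 0.2 smoothing weight are hypothetical: admission fails fast once in-flight requests reach the limit, and completed-request latencies are smoothed with an exponential moving average (one common choice of smoothing, assumed here) before they would be handed to whatever limit-update rule the service uses.

```python
import threading


class SheddingLimiter:
    """Hypothetical limiter: admits a request only while in-flight < limit,
    otherwise sheds it immediately instead of queuing it."""

    def __init__(self, limit: int = 10, alpha: float = 0.2):
        self.limit = limit
        self.alpha = alpha        # assumed EMA weight given to each new sample
        self.smoothed_ms = None   # exponentially smoothed latency
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            if self.in_flight >= self.limit:
                return False      # shed: the caller should fail fast
            self.in_flight += 1
            return True

    def release(self, latency_ms: float) -> float:
        with self._lock:
            self.in_flight -= 1
            # The moving average damps single noisy samples so a limit
            # algorithm fed this value does not react to every spike.
            if self.smoothed_ms is None:
                self.smoothed_ms = latency_ms
            else:
                self.smoothed_ms = (self.alpha * latency_ms
                                    + (1 - self.alpha) * self.smoothed_ms)
            return self.smoothed_ms


lim = SheddingLimiter(limit=2)
print([lim.try_acquire() for _ in range(3)])  # third request is shed
```

A rejected caller should fail fast, for example with an HTTP 503, rather than wait; the smoothed latency, not the raw sample, is what the limit algorithm would consume.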

Key idea

Adaptive concurrency limits use latency feedback to find the largest safe number of in-flight requests automatically, protecting a service from overload without the brittleness of a fixed, hand-tuned number.

Check yourself

1. What signal do adaptive concurrency limits use to adjust?

2. Why are adaptive limits better than a fixed hardcoded limit?