Adaptive Rate Limiting

Adjust the limit automatically from live health signals instead of relying on a fixed, hand-tuned number.

Beyond a static number

A fixed limit is a guess. Set it too low and you waste capacity in quiet times; set it too high and you melt the system under load. Adaptive rate limiting adjusts the allowed rate in real time based on how the system is actually doing.

Signals it watches

Latency, especially tail percentiles, rising as the system strains.
Error rate, including timeouts and rejections from downstream.
Queue depth or resource saturation, such as CPU utilization.

When these signals show stress, the limiter tightens the allowed rate; when they look healthy, it loosens it. The resulting limit tracks the true serving capacity, which itself changes with deployments, traffic mix, and hardware.
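As a rough illustration, the three signals can be folded into a single stressed-or-healthy decision. This is a minimal sketch, not a prescribed design: the `HealthSample` type, the `is_stressed` function, and all three threshold values are hypothetical and would need tuning for a real service.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    p99_latency_ms: float  # tail latency over the last measurement window
    error_rate: float      # fraction of failed or timed-out requests, 0.0 to 1.0
    queue_depth: int       # requests currently waiting to be served


# Hypothetical thresholds, chosen only for illustration.
MAX_P99_LATENCY_MS = 250.0
MAX_ERROR_RATE = 0.02
MAX_QUEUE_DEPTH = 100


def is_stressed(sample: HealthSample) -> bool:
    """Return True if any one signal exceeds its threshold."""
    return (
        sample.p99_latency_ms > MAX_P99_LATENCY_MS
        or sample.error_rate > MAX_ERROR_RATE
        or sample.queue_depth > MAX_QUEUE_DEPTH
    )
```

Treating any single breached threshold as stress is a conservative choice; a real limiter might weight the signals or require more than one to agree.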

A control loop view

Adaptive limiting is a feedback control loop, conceptually like additive increase, multiplicative decrease (AIMD): probe upward slowly while healthy, and cut sharply at the first sign of overload. The aim is to sit just below the point where latency spikes.
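The loop described above can be sketched as a small AIMD controller. The class name `AimdLimiter`, its parameters, and the default values are assumptions made for this example; the overload flag is expected to come from whatever health check the system already has.

```python
class AimdLimiter:
    """Additive-increase, multiplicative-decrease controller for a rate limit."""

    def __init__(
        self,
        initial: float = 100.0,
        min_limit: float = 10.0,
        max_limit: float = 1000.0,
        increase: float = 5.0,         # added to the limit on each healthy tick
        decrease_factor: float = 0.5,  # multiplied into the limit on overload
    ) -> None:
        self.limit = initial
        self.min_limit = min_limit
        self.max_limit = max_limit
        self.increase = increase
        self.decrease_factor = decrease_factor

    def update(self, overloaded: bool) -> float:
        """Advance the loop by one tick and return the new limit."""
        if overloaded:
            # Cut sharply, but never below the floor.
            self.limit = max(self.min_limit, self.limit * self.decrease_factor)
        else:
            # Probe upward slowly, but never above the ceiling.
            self.limit = min(self.max_limit, self.limit + self.increase)
        return self.limit
```

Starting from 100 requests per second, one healthy tick raises the limit to 105, and one overloaded tick then halves it to 52.5. This asymmetry is what lets the limit back off quickly yet recover gradually.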

The risks

A noisy signal can cause oscillation, swinging the limit wildly.
Cutting too aggressively can starve legitimate traffic.
It is harder to reason about and test than a fixed number.
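Two common mitigations for the first two risks are to smooth the input signal and to enforce a floor on the limit. The sketch below shows one way to do each, using an exponentially weighted moving average. `Ewma`, `clamp_limit`, and the `alpha` default are hypothetical names and values, not part of any particular library.

```python
from typing import Optional


class Ewma:
    """Exponentially weighted moving average, used to damp a noisy signal."""

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha  # higher alpha reacts faster but smooths less
        self.value: Optional[float] = None

    def add(self, sample: float) -> float:
        """Fold in one sample and return the smoothed value."""
        if self.value is None:
            self.value = sample  # the first sample seeds the average
        else:
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value


def clamp_limit(proposed: float, floor: float, ceiling: float) -> float:
    """Keep the limit within bounds so a sharp cut cannot starve all traffic."""
    return max(floor, min(ceiling, proposed))
```

Feeding the smoothed latency, rather than raw samples, into the overload decision reduces the chance that a single outlier swings the limit. Because both pieces are deterministic, they can be unit tested with fixed input sequences, which helps with the third risk.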

Key idea