Why limits need to be visible
A rate limit caps how many requests a client may send within a time window. Enforcing it silently is hostile: a well-behaved client cannot tell it is approaching the cap until it is blocked. Response headers make the budget observable.
The standard signals
- A limit header states the cap for the window.
- A remaining header states how many calls are left.
- A reset header states when the window refills, either as an absolute timestamp or as a number of seconds from now.
- When the limit is hit, the server returns 429 Too Many Requests and a Retry-After header.
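As an illustration of the signals listed above, here is a minimal sketch of how a server might compute them for a fixed-window limiter. The function name rate_limit_response, its parameters, and the X-RateLimit-* header names are assumptions made for this example. Those names follow a common convention rather than a requirement, and the reset value is expressed here as seconds remaining. Only the 429 status and the Retry-After header are standardized.

```python
import math
import time


def rate_limit_response(limit, used, window_start, window_seconds, now=None):
    """Return (status, headers) for a fixed-window limiter (illustrative only).

    `used` is the number of requests counted in the current window,
    including the one being answered.
    """
    now = time.time() if now is None else now
    # Whole seconds until the window refills, never negative.
    reset_in = max(0, math.ceil(window_start + window_seconds - now))
    remaining = max(0, limit - used)
    headers = {
        "X-RateLimit-Limit": str(limit),          # assumed header name
        "X-RateLimit-Remaining": str(remaining),  # assumed header name
        "X-RateLimit-Reset": str(reset_in),       # assumed: seconds, not a timestamp
    }
    if used > limit:
        # Over the cap: reject and tell the client how long to wait.
        headers["Retry-After"] = str(reset_in)
        return 429, headers
    return 200, headers
```

A request that is still within budget gets a 200 with the three budget headers. Once the count exceeds the cap, the same headers are sent with a 429 and a Retry-After value.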
Cooperating with the client
A good client reads the remaining count and slows down before it reaches zero. When it does receive a 429, it must honor Retry-After rather than retrying immediately, ideally adding jitter so that many clients do not retry in sync.
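The client behavior described above could be sketched as follows. The helper names pace_delay, parse_retry_after, and delay_with_jitter are hypothetical, and the jitter spread of 50 percent is an arbitrary choice for this example. The one fixed point is that Retry-After may carry either a number of seconds or an HTTP date, so the parser handles both.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def pace_delay(remaining, reset_in):
    """Seconds to wait between calls so the budget lasts until the reset."""
    if remaining <= 0:
        return float(reset_in)      # budget spent: wait for the window to refill
    return reset_in / remaining     # spread the remaining calls evenly


def parse_retry_after(value, now=None):
    """Convert a Retry-After value (delay in seconds or HTTP date) to seconds."""
    value = value.strip()
    if value.isdigit():
        return float(value)
    now = now or datetime.now(timezone.utc)
    when = parsedate_to_datetime(value)
    return max(0.0, (when - now).total_seconds())


def delay_with_jitter(base, spread=0.5, rng=random):
    """Wait at least `base` seconds, plus a random extra to desynchronize clients."""
    return base + rng.uniform(0.0, max(1.0, base * spread))
```

Jitter is added on top of the server's value rather than centered on it, so the client never retries earlier than Retry-After allows.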
Key idea
Expose limit, remaining, and reset headers, return 429 with Retry-After, and have clients back off with jitter.