Communicating limits
A rate limited API protects itself by capping requests per client per window. To let well behaved clients cooperate, the server returns rate limit headers describing the current state of the quota.
The common headers
- Limit states the maximum requests allowed in the window.
- Remaining states how many requests are left right now.
- Reset states when the window resets, as a timestamp or seconds from now.
- When the limit is exceeded the server returns status 429 Too Many Requests, often with a Retry After header telling the client how long to wait.
How clients should react
- Read remaining and slow down before hitting zero rather than blindly retrying.
- On a 429, honor Retry After instead of hammering immediately.
- Add jitter to retries so many clients do not all wake at the same instant and cause a thundering herd.
Key idea
Rate limit headers report the quota limit remaining and reset so cooperative clients can throttle themselves and respect Retry After on a 429.