System Design

API Rate Limit Response Headers

How an API tells clients their quota, what is left, and when to try again politely.

Communicating limits

A rate-limited API protects itself by capping requests per client per window. To let well-behaved clients cooperate, the server returns rate-limit headers describing the current state of the quota.

The common headers

  • Limit states the maximum requests allowed in the window.
  • Remaining states how many requests are left right now.
  • Reset states when the window resets, as a timestamp or seconds from now.
  • When the limit is exceeded, the server returns status 429 Too Many Requests, often with a Retry-After header telling the client how long to wait.

How clients should react

  • Read Remaining and slow down before hitting zero rather than blindly retrying.
  • On a 429, honor Retry-After instead of hammering immediately.
  • Add jitter to retries so many clients do not all wake at the same instant and cause a thundering herd.
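
The three reactions above can be combined into one client-side delay calculation. This is an added example, not code from the lesson; the `X-RateLimit-*` header names, the 300-second cap, and the function names are assumptions chosen for illustration.

```python
import random
import time
from email.utils import parsedate_to_datetime

# Assumed header names (the lesson names the fields generically); check your API's docs.
H_REMAINING = "x-ratelimit-remaining"
H_RESET = "x-ratelimit-reset"
MAX_WAIT = 300.0  # cap: a buggy or hostile server must not stall the client for hours


def parse_retry_after(value, now):
    """Retry-After is either delay seconds or an HTTP-date (RFC 9110)."""
    value = value.strip()
    if value.isdecimal():
        return float(value)
    try:
        return max(0.0, parsedate_to_datetime(value).timestamp() - now)
    except (TypeError, ValueError):
        return None  # missing or malformed: caller falls back to backoff


def seconds_until_reset(value, now):
    """Reset may be an epoch timestamp or seconds from now; tell them apart by size."""
    try:
        n = float(value)
    except (TypeError, ValueError):
        return None
    return max(0.0, n - now) if n > 1_000_000_000 else max(0.0, n)


def next_delay(status, headers, attempt, now=None, rng=random.random):
    """Seconds to wait before the next request; 0.0 means send it now."""
    now = time.time() if now is None else now
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if status == 429:
        wait = parse_retry_after(h.get("retry-after", ""), now)
        if wait is None:
            wait = 2.0 ** attempt  # no usable Retry-After: exponential backoff
        # Jitter only adds time, so no client retries earlier than the server asked.
        return min(MAX_WAIT, wait + rng() * max(1.0, 0.1 * wait))
    wait = seconds_until_reset(h.get(H_RESET), now)
    remaining = h.get(H_REMAINING, "").strip()
    if wait is None or not remaining.isdecimal():
        return 0.0  # server sent no usable quota information
    if int(remaining) == 0:
        return min(MAX_WAIT, wait + rng())  # quota spent: wait out the window
    return min(MAX_WAIT, wait / int(remaining))  # pace what is left across the window
```

Call `next_delay` after every response and sleep for the returned number of seconds before sending the next request.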

Key idea

Rate-limit headers report the quota's limit, remaining count, and reset time so cooperative clients can throttle themselves and respect Retry-After on a 429.

Check yourself

1. What status code signals that a rate limit was exceeded?

2. What should a client do when it sees a Retry-After header?