Client Side Rate Limiting

Throttle outbound requests at the source so you never blow past a dependency's quota.

Limiting from the caller side

Rate limiting is usually framed as server protection, but a client also benefits from limiting its own outbound rate. A service that calls a third-party API with a strict quota should pace itself so that it never runs into a wall of HTTP 429 (Too Many Requests) responses, which waste work and trip alarms.

How clients do it

  • Keep a local token bucket sized to the dependency quota and acquire a token before each outbound call.
  • Add a concurrency cap so that no more than a fixed number of calls are in flight at once.
  • Queue or shed work when the local budget is exhausted, rather than firing requests that will be rejected.
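
As a rough sketch of how these three steps might fit together in Python: the names `TokenBucket`, `PacedClient`, and `BudgetExhausted` are made up for illustration and do not come from any particular library, `send` is a hypothetical callable that performs the real outbound request, and the rate and capacity values are assumptions you would replace with the dependency's actual quota.

```python
import threading
import time


class BudgetExhausted(Exception):
    """Raised when the local budget says the call should not be sent now."""


class TokenBucket:
    """Local token bucket: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock  # injectable so the bucket can be tested without sleeping
        self._tokens = float(capacity)
        self._last = clock()
        self._lock = threading.Lock()

    def try_acquire(self):
        """Take one token if available; return False instead of blocking."""
        with self._lock:
            now = self._clock()
            # Credit the tokens earned since the last call, capped at capacity.
            earned = (now - self._last) * self.rate
            self._tokens = min(self.capacity, self._tokens + earned)
            self._last = now
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False


class PacedClient:
    """Wraps a hypothetical `send` callable with a token bucket and a concurrency cap."""

    def __init__(self, send, bucket, max_in_flight):
        self._send = send
        self._bucket = bucket
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def call(self, request):
        # Check the concurrency cap first so no token is spent on a call
        # that cannot start anyway.
        if not self._slots.acquire(blocking=False):
            raise BudgetExhausted("too many calls in flight")
        try:
            if not self._bucket.try_acquire():
                # Shed here; a caller could also catch this and queue the work.
                raise BudgetExhausted("no tokens left")
            return self._send(request)
        finally:
            self._slots.release()
```

This sketch sheds work by raising `BudgetExhausted`; a caller that prefers queuing could catch the exception and put the request on a local queue to retry once tokens have refilled. Whether to shed or queue depends on how stale the work is allowed to get.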

Why it pays off

  • It avoids wasted requests that the server would reject anyway.
  • It smooths load on shared dependencies, which makes the client a good neighbor to their other consumers.
  • It keeps the client predictable, since work is paced rather than bursting and stalling.
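
To make the first point concrete, here is a small toy simulation. It is only a sketch under stated assumptions: the server is assumed to enforce a fixed quota per one-second window, an unpaced client is assumed to resend every rejected request in the next second, and the function name `simulate` is hypothetical.

```python
def simulate(arrivals_per_second, quota_per_second, paced):
    """Return (requests_sent, requests_rejected, backlog_left).

    Assumes a fixed per-second server quota and, for the unpaced client,
    an immediate retry of every rejected request in the following second.
    """
    sent = 0
    rejected = 0
    backlog = 0
    for arrivals in arrivals_per_second:
        backlog += arrivals
        if paced:
            # Send only what the quota allows; the rest waits locally.
            allowed = min(backlog, quota_per_second)
            sent += allowed
            backlog -= allowed
        else:
            # Fire everything; the server accepts up to the quota and
            # rejects the remainder, which stays in the backlog for retry.
            sent += backlog
            accepted = min(backlog, quota_per_second)
            rejected += backlog - accepted
            backlog -= accepted
    return sent, rejected, backlog


burst = [10, 0, 0, 0]  # ten requests arrive at once, then nothing
print(simulate(burst, quota_per_second=3, paced=True))   # (10, 0, 0)
print(simulate(burst, quota_per_second=3, paced=False))  # (22, 12, 0)
```

Under these assumptions both clients finish the same ten units of work in four seconds, but the unpaced one sends 22 requests and has 12 of them rejected, while the paced one sends exactly 10 and sees no rejections.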

Client side limiting and server side limiting are complementary: the server defends itself, and the client respects the contract before the server has to enforce it.

Key idea

Client side rate limiting paces outbound calls at the source so you stay within a dependency quota and avoid wasted rejected requests.

Check yourself

1. Why limit requests on the client side?

2. How does a client commonly enforce its own rate?