Rate Limiting Per User and Per Tenant

Enforcing fair usage so no single user or tenant starves the others.

5 min read · advanced

Two levels of fairness

Rate limiting in a shared system needs two scopes. A per-user limit stops one account from flooding the service. A per-tenant limit caps a whole customer organization, because a tenant may have many users whose combined load could starve other tenants even when each user stays within their own limit.

A request must pass both checks. A user under their own limit can still be rejected if their tenant is over its cap.
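The two-scope check can be sketched as follows. This is a minimal illustration, not a production limiter: it uses plain in-memory counters with no window reset, and the names `USER_LIMIT`, `TENANT_LIMIT`, and `allow_request` are assumptions made for the example. User IDs are assumed to be unique across tenants.

```python
from collections import defaultdict

USER_LIMIT = 3    # hypothetical: requests allowed per user
TENANT_LIMIT = 5  # hypothetical: requests allowed per tenant

user_counts = defaultdict(int)
tenant_counts = defaultdict(int)


def allow_request(user_id, tenant_id):
    """Admit a request only if both the user and the tenant are under limit."""
    if user_counts[user_id] >= USER_LIMIT:
        return False, "user limit exceeded"
    if tenant_counts[tenant_id] >= TENANT_LIMIT:
        return False, "tenant limit exceeded"
    # Count the request against both scopes only once it is admitted,
    # so a rejected request does not use up quota in either scope.
    user_counts[user_id] += 1
    tenant_counts[tenant_id] += 1
    return True, "ok"
```

With these numbers, a third user in a tenant whose first two users have already made five requests between them is rejected on the tenant check, even though that user has sent nothing yet.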

The counting mechanism

A common tool is the token bucket: each key has a bucket that holds up to a fixed number of tokens, refills at a steady rate, and gives up one token per request. When the bucket is empty, requests are rejected or queued until tokens are available. The bucket key is the user ID for the per-user limit and the tenant ID for the per-tenant limit.
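A minimal single-process sketch of a token bucket is shown below. The class name `TokenBucket` and the parameters `rate` and `capacity` are assumptions for illustration; the optional `now` argument exists only so the behavior can be checked without waiting on a real clock.

```python
import time


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Spend one token if available; return whether the request passes."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill lazily on each call instead of running a background timer.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per key: user IDs for the user scope, tenant IDs for the tenant scope.
bucket = TokenBucket(rate=1, capacity=2, now=0.0)
print(bucket.allow(now=0.0), bucket.allow(now=0.0), bucket.allow(now=0.0))
```

The capacity sets how large a burst is tolerated, while the rate sets the sustained throughput. This sketch rejects on an empty bucket; queuing would be a separate design choice.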

Because many servers handle traffic, counters usually live in a shared store so that each limit applies to total traffic across the fleet rather than to each server separately.
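The effect of a shared store can be sketched with a stand-in object. `InMemoryStore` below is hypothetical and only imitates the one property that matters here, an atomic increment; a real deployment would typically use a networked store and would also expire old keys. For brevity the sketch counts requests per fixed time window rather than using a token bucket.

```python
import threading


class InMemoryStore:
    """Hypothetical stand-in for a shared counter store."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def incr(self, key):
        # The increment is atomic, as a shared store would need to guarantee.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + 1
            return self._counts[key]


def allow(store, scope, ident, limit, window_seconds, now):
    """Fixed-window check: one counter per scope, ID, and time window."""
    window = int(now // window_seconds)
    key = f"rl:{scope}:{ident}:{window}"
    return store.incr(key) <= limit


shared = InMemoryStore()
# Two "servers" handle requests for the same user against the same store.
server_a = [allow(shared, "user", "u1", 3, 60, now=0) for _ in range(2)]
server_b = [allow(shared, "user", "u1", 3, 60, now=1) for _ in range(2)]
print(server_a, server_b)
```

If each server kept its own counter instead, the user in this example would get the limit once per server, so the effective limit would grow with the size of the fleet.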

Design points

  • Return a clear retry-after signal (for example, HTTP 429 with a Retry-After header) so clients know how long to back off.
  • Give larger tenants higher tenant limits by plan tier.
  • Keep the limiter fast; it runs before real work.
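The first two design points can be sketched together. The plan names and numbers in `TENANT_LIMITS` are invented for illustration, and the helper names are assumptions; the only external facts relied on are the HTTP 429 status code and the `Retry-After` header, which accepts a delay in whole seconds.

```python
import math

# Hypothetical plan tiers: (refill rate in tokens per second, bucket capacity).
TENANT_LIMITS = {
    "free": (5, 10),
    "pro": (50, 100),
    "enterprise": (500, 1000),
}


def tenant_limit(plan):
    """Look up tenant bucket parameters, defaulting to the smallest tier."""
    return TENANT_LIMITS.get(plan, TENANT_LIMITS["free"])


def retry_after_seconds(tokens, rate):
    """Whole seconds until a bucket holding `tokens` has at least one token."""
    if tokens >= 1:
        return 0
    return math.ceil((1 - tokens) / rate)


def rejection(tokens, rate):
    """Build an HTTP-style rejection: status 429 plus a Retry-After header."""
    wait = max(1, retry_after_seconds(tokens, rate))
    return 429, {"Retry-After": str(wait)}


print(tenant_limit("pro"))
print(rejection(tokens=0, rate=0.5))
```

Deriving the wait from the bucket state gives clients a concrete delay instead of leaving them to guess, which helps avoid immediate retries that add load without succeeding.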

Key idea

Per-user and per-tenant rate limits both gate each request, so neither a single account nor a single customer can starve the shared system.

Check yourself

1. Why enforce both a per-user and a per-tenant limit?

2. In the token bucket mechanism, what happens when the bucket is empty?

3. Why do counters usually live in a shared store?