Two levels of fairness
Rate limiting in a shared system needs two scopes. A per-user limit stops one account from flooding the service. A per-tenant limit caps a whole customer organization, since a tenant may have many users whose combined load could starve other tenants.
A request must pass both checks. A user under their own limit can still be rejected if their tenant is over its cap.
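A minimal sketch of the two-scope check, independent of how the allowance is counted. The function name `check_request` and the two dictionaries of remaining allowance are hypothetical, introduced only for illustration. One design choice worth noting: both scopes are checked before either is charged, so a request rejected at the tenant level does not use up the user's allowance.

```python
def check_request(user_id, tenant_id, user_remaining, tenant_remaining):
    """Admit a request only if both scopes still have allowance.

    user_remaining and tenant_remaining are hypothetical dicts mapping a key
    to the number of requests it may still make in the current period.
    """
    if user_remaining.get(user_id, 0) < 1:
        return False, "user limit exceeded"
    if tenant_remaining.get(tenant_id, 0) < 1:
        return False, "tenant limit exceeded"
    # Both checks passed, so charge both scopes together.
    user_remaining[user_id] -= 1
    tenant_remaining[tenant_id] -= 1
    return True, "ok"


# Two users share one tenant whose cap is nearly exhausted.
users = {"alice": 5, "bob": 5}
tenants = {"acme": 1}
first = check_request("alice", "acme", users, tenants)
second = check_request("bob", "acme", users, tenants)
```

Here `second` is a rejection even though bob has not touched his own limit, because alice's request used the tenant's last slot.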
The counting mechanism
A common tool is the token bucket: each key holds a bucket that refills at a steady rate, up to a fixed capacity, and drains one token per request. The capacity bounds how large a burst the key can send at once. When the bucket is empty, requests are rejected or queued. The bucket key is the user ID for the user limit and the tenant ID for the tenant limit.
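The token bucket described above can be sketched as follows. This is an illustrative single-process version, not a definitive implementation: the class name `TokenBucket`, the parameters `capacity` and `refill_rate`, and the optional `now` argument (which lets a caller supply the clock) are all assumptions. Tokens are topped up lazily on each call rather than by a background timer.

```python
import time


class TokenBucket:
    """One bucket for one key (a user ID or a tenant ID)."""

    def __init__(self, capacity, refill_rate, now=None):
        self.capacity = capacity          # maximum tokens, i.e. burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)     # a new bucket starts full
        self.last_refill = time.monotonic() if now is None else now

    def _refill(self, now):
        # Add the tokens earned since the last call, capped at capacity.
        if now > self.last_refill:
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
            self.last_refill = now

    def try_acquire(self, now=None):
        """Take one token if available; return False when the bucket is empty."""
        now = time.monotonic() if now is None else now
        self._refill(now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Buckets are looked up by key: "user:<id>" or "tenant:<id>" (hypothetical naming).
buckets = {}


def allow(key, capacity, refill_rate, now=None):
    bucket = buckets.get(key)
    if bucket is None:
        bucket = buckets[key] = TokenBucket(capacity, refill_rate, now)
    return bucket.try_acquire(now)
```

This sketch only rejects on an empty bucket; queueing, the other option mentioned above, is left out.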
Because many servers handle traffic, counters usually live in a shared store so limits hold across the fleet.
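To illustrate why the counters are shared, the sketch below has two server objects consult one store. The `SharedCounterStore` class is an in-process stand-in for a networked store, and every name in it is hypothetical. It also uses a plain counter with no reset or refill, purely to keep the example short. The point it shows is that the check and the increment happen as one atomic step, so two servers cannot both claim the last slot.

```python
import threading


class SharedCounterStore:
    """In-process stand-in for a shared store (an assumption for illustration)."""

    def __init__(self):
        self._counts = {}
        self._lock = threading.Lock()

    def incr_if_below(self, key, limit):
        # Check and increment under one lock so the operation is atomic.
        with self._lock:
            current = self._counts.get(key, 0)
            if current >= limit:
                return False
            self._counts[key] = current + 1
            return True


class Server:
    """One member of the fleet; it keeps no counters of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, tenant_id, tenant_limit):
        return self.store.incr_if_below(f"tenant:{tenant_id}", tenant_limit)


store = SharedCounterStore()
server_a = Server("a", store)
server_b = Server("b", store)
```

With a tenant limit of 2, one request through `server_a` and one through `server_b` exhaust the allowance, and a third request is rejected whichever server receives it.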
Design points
- Return a clear retry-after signal, such as HTTP 429 with a Retry-After header, so clients back off.
- Give larger tenants higher tenant limits by plan tier.
- Keep the limiter fast; it runs before real work.
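The first two design points can be sketched as below, assuming a token-bucket limiter. The plan names, the numbers in `TENANT_LIMITS_BY_PLAN`, and the function names are all hypothetical. The retry delay is the time the bucket needs to refill to one whole token, rounded up to whole seconds because the HTTP Retry-After header takes an integer number of seconds.

```python
import math

# Hypothetical plan tiers: (bucket capacity, refill rate in tokens per second).
TENANT_LIMITS_BY_PLAN = {
    "free": (20, 1.0),
    "team": (200, 10.0),
    "enterprise": (2000, 100.0),
}


def tenant_limit(plan):
    # Unknown plans fall back to the smallest tier (an assumption).
    return TENANT_LIMITS_BY_PLAN.get(plan, TENANT_LIMITS_BY_PLAN["free"])


def retry_after_seconds(tokens, refill_rate):
    """Whole seconds until the bucket holds at least one token."""
    if tokens >= 1:
        return 0
    return math.ceil((1 - tokens) / refill_rate)


def rejection_response(tokens, refill_rate):
    # Status 429 Too Many Requests plus a Retry-After header in seconds.
    wait = retry_after_seconds(tokens, refill_rate)
    return 429, {"Retry-After": str(wait)}
```

For example, an empty bucket that refills at 2 tokens per second needs half a second to earn a token, so `rejection_response(0.0, 2.0)` reports a one-second wait after rounding up.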
Key idea
Per-user and per-tenant rate limits both gate each request, so neither a single account nor a single customer can starve the shared system.