Guarding Capacity
Rate limiting caps how many requests a service accepts in a given time window, protecting it from overload, abuse, and noisy neighbors. The mesh enforces limits at the proxy, before traffic reaches your app.
Local vs Global
- Local rate limiting runs entirely in each proxy. It is fast and needs no coordination, but each proxy counts independently, so the total admitted across the fleet grows with the number of replicas.
- Global rate limiting consults a shared service so a limit applies across every replica together.
Local limits suit per-instance protection. Global limits suit a true fleet-wide quota, such as 1,000 requests per second per customer no matter which pod serves them.
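The difference can be seen in a small simulation. This is only a sketch: the `FixedWindowCounter` class and the replica, limit, and request numbers are hypothetical, a single window is modeled with no clock, and the "shared service" is a plain in-process object rather than a real network call.

```python
class FixedWindowCounter:
    """Hypothetical counter for one window; time handling is omitted."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self):
        # Admit the request only while the window's budget is not used up.
        if self.count < self.limit:
            self.count += 1
            return True
        return False


REPLICAS = 3   # assumed number of proxies
LIMIT = 10     # assumed limit per window
REQUESTS = 60  # requests arriving within one window, spread round-robin

# Local: every proxy keeps its own counter and knows nothing of the others.
local = [FixedWindowCounter(LIMIT) for _ in range(REPLICAS)]
local_admitted = sum(local[i % REPLICAS].allow() for i in range(REQUESTS))

# Global: every proxy consults the same shared counter.
shared = FixedWindowCounter(LIMIT)
global_admitted = sum(shared.allow() for _ in range(REQUESTS))

print(local_admitted, global_admitted)  # prints "30 10"
```

With local counters the fleet admits the limit once per replica, while the shared counter holds the whole fleet to a single limit.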
The Algorithm
Most proxies use a token bucket. Tokens refill at a steady rate up to a fixed bucket capacity, each request spends one token, and when the bucket is empty the request is rejected with an HTTP 429 (Too Many Requests) status. This allows short bursts, up to the bucket's capacity, while bounding the sustained rate at the refill rate.
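A minimal token bucket might look like the following. It is an illustrative sketch, not any particular proxy's implementation: the `TokenBucket` and `FakeClock` names, the `rate` and `capacity` parameters, and the choice to refill lazily on each request are all assumptions, and the fake clock exists only to make the example deterministic.

```python
import time


class TokenBucket:
    """Illustrative token bucket; names and parameters are hypothetical."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum tokens held, i.e. the burst size
        self.tokens = capacity    # start with a full bucket
        self.clock = clock
        self.last = clock()

    def allow(self):
        # Refill lazily: add tokens for the time elapsed since the last call,
        # never exceeding the bucket's capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 200  # request admitted
        return 429      # Too Many Requests


class FakeClock:
    """Manually advanced clock so the example does not depend on real time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def advance(self, seconds):
        self.now += seconds


clock = FakeClock()
bucket = TokenBucket(rate=2, capacity=3, clock=clock)

burst = [bucket.allow() for _ in range(4)]
clock.advance(1.0)  # one second passes, so two tokens are refilled
after_refill = [bucket.allow() for _ in range(3)]

print(burst)         # [200, 200, 200, 429]
print(after_refill)  # [200, 200, 429]
```

The first three requests drain the full bucket as a burst, and after that admissions are paced by the refill rate of two tokens per second.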
Why at the Mesh
Putting limits in the proxy means rejected traffic never touches application threads, and the policy is uniform across services. The app stays simple while the platform enforces fairness.
Key idea
The mesh enforces rate limits at the proxy using token buckets, with local limits per instance and global limits coordinated across replicas, so overload is rejected before it reaches your app.