Two thresholds
A hard limit is an absolute ceiling: cross it and requests are rejected outright, typically with HTTP status 429 (Too Many Requests). It protects the system from overload regardless of how clients behave.
A soft limit is a lower warning threshold. Crossing it does not block the caller; instead the system degrades gracefully or signals the client. It buys time to react before the hard wall is hit.
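To make the two thresholds concrete, here is a minimal sketch of a fixed-window counter that classifies each request as allowed, warned, or rejected. The class name, the `Decision` values, and the fixed-window strategy are illustrative assumptions, not a prescribed design; other counting schemes (sliding window, token bucket) can carry the same two thresholds.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"    # below the soft limit: normal service
    WARN = "warn"      # past the soft limit: serve, but degrade or signal
    REJECT = "reject"  # at the hard limit: block (e.g. HTTP 429)


@dataclass
class TwoThresholdLimiter:
    # Hypothetical parameters: requests allowed per fixed window.
    soft_limit: int
    hard_limit: int
    window_seconds: float = 60.0
    _window_start: float = field(default=0.0, init=False)
    _count: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        if not 0 < self.soft_limit < self.hard_limit:
            raise ValueError("expected 0 < soft_limit < hard_limit")

    def check(self, now: float) -> Decision:
        # Start a fresh window once the current one has expired.
        if now - self._window_start >= self.window_seconds:
            self._window_start = now
            self._count = 0
        # Hard limit: reject without counting, so rejections do not pile up.
        if self._count >= self.hard_limit:
            return Decision.REJECT
        self._count += 1
        # Soft limit: the request is still served, but flagged.
        if self._count > self.soft_limit:
            return Decision.WARN
        return Decision.ALLOW
```

With `soft_limit=3` and `hard_limit=5`, the first three requests in a window are allowed, the next two are served with a warning, and the sixth is rejected until the window resets.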
What a soft limit can trigger
- Emit a warning header or log so the client and operators see pressure building.
- Shed non-essential work, such as background prefetching or analytics.
- Lower the caller's priority rather than dropping their requests.
- Notify the account owner that they are approaching their quota.
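The reactions listed above can be combined in a single decision step. The sketch below assumes the request is already known to be under the hard limit; the header name `X-RateLimit-Warning`, the priority scale, and the "notify once on first crossing" rule are hypothetical choices for illustration, not a standard.

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger(__name__)


@dataclass
class SoftLimitResponse:
    headers: dict[str, str] = field(default_factory=dict)
    run_background_work: bool = True  # prefetching, analytics, etc.
    priority: int = 0                 # assumption: higher number = served later
    notify_owner: bool = False


def apply_soft_limit(used: int, soft_limit: int, hard_limit: int) -> SoftLimitResponse:
    """Decide how to degrade a request that is still under the hard limit.

    `used` counts requests in the current window, including this one.
    Assumes the caller has already rejected anything over hard_limit.
    """
    resp = SoftLimitResponse()
    if used <= soft_limit:
        return resp  # normal service, nothing to signal
    remaining = hard_limit - used
    # 1. Signal pressure to the client (hypothetical header) and operators (log).
    resp.headers["X-RateLimit-Warning"] = f"approaching limit; {remaining} left"
    log.warning("soft limit exceeded: used=%d soft=%d hard=%d", used, soft_limit, hard_limit)
    # 2. Shed non-essential work.
    resp.run_background_work = False
    # 3. Lower the caller's priority instead of dropping the request.
    resp.priority = 1
    # 4. Notify the account owner once, on the first request past the soft limit.
    resp.notify_owner = used == soft_limit + 1
    return resp
```

Notifying only on the first crossing is one way to avoid flooding the owner with a message per request; a real system might instead rate-limit the notifications themselves.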
Why use both
The hard limit guarantees protection. The soft limit creates a buffer zone where the system bends before it breaks, giving clients a chance to slow down voluntarily and avoid an abrupt failure.
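On the client side, the buffer zone only helps if clients actually slow down when they see a warning. One common way to do that, swapped in here as an illustration rather than something the text prescribes, is additive-increase/multiplicative-decrease (AIMD) pacing. The `Signal` names and the decrease factors (0.5 on a warning, 0.25 on a rejection) are arbitrary assumptions.

```python
from enum import Enum


class Signal(Enum):
    OK = "ok"
    SOFT_WARNING = "soft_warning"  # e.g. a warning header was present
    REJECTED = "rejected"          # e.g. HTTP 429 from the hard limit


def adjust_rate(rate: float, signal: Signal, *, increase: float = 1.0,
                min_rate: float = 1.0, max_rate: float = 100.0) -> float:
    """AIMD pacing: return the new send rate in requests per second."""
    if signal is Signal.OK:
        new_rate = rate + increase  # probe upward slowly
    elif signal is Signal.SOFT_WARNING:
        new_rate = rate * 0.5       # slow down voluntarily before the wall
    else:
        new_rate = rate * 0.25      # hard rejection: back off sharply
    # Keep the rate within configured bounds.
    return max(min_rate, min(max_rate, new_rate))
```

Because a warning already halves the rate, a well-behaved client usually settles below the hard limit and rarely sees a rejection at all.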
Key idea
A soft limit warns and degrades to create a buffer, while a hard limit is the absolute ceiling that blocks requests.