When the model cannot answer
Model services fail. A GPU runs out of memory, a request times out, or a model returns garbage. Graceful degradation means the system still returns a useful response instead of an error.
Fallback options
- A simpler model that is cheaper and more reliable as a backup.
- A cached or default answer when fresh inference is unavailable.
- A safe rule based response that is always correct if dull.
Timeouts and circuit breakers
A timeout caps how long a request waits before giving up on the model. A circuit breaker notices repeated failures and stops calling the failing model for a while, sending all traffic to the fallback so the system does not pile up stuck requests.
Designing the ladder
- Try the primary model first within a tight timeout.
- On failure or timeout, drop to the fallback.
- Make the degraded path clearly acceptable, not silently wrong.
Key idea
Graceful degradation keeps an ML service useful when the primary model fails by falling back to a simpler model, a cached answer, or a safe rule. Timeouts and circuit breakers trigger the fallback fast so failures never become stuck or cascading.