System Design

The Alert Fatigue Problem

Why too many alerts make people ignore the one that finally matters.

When alerts stop working

Alert fatigue sets in when responders receive so many alerts, especially false or low-value ones, that they stop trusting and acting on them. A noisy pager is worse than a quiet one, because the real incident hides in the noise.

What drives it

  • Non-actionable alerts that no human can do anything about.
  • Flapping, where a metric crosses and recrosses a threshold in quick succession, paging each time.
  • Duplicate pages for the same root cause across many components.
  • Thresholds set too tight, firing on normal variation.
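To see how flapping and a too-tight threshold combine, here is a minimal Python sketch of a naive rule that pages every time a sample rises above a fixed threshold. The function `naive_alerts` and the sample data are hypothetical, chosen only for illustration.

```python
def naive_alerts(samples, threshold):
    """Return the sample indices at which a naive threshold rule pages:
    every time the value moves from at-or-below the threshold to above it."""
    pages = []
    firing = False
    for i, value in enumerate(samples):
        above = value > threshold
        if above and not firing:
            pages.append(i)  # a fresh crossing sends a new page
        firing = above       # dropping back below silently resolves the alert
    return pages


# Hypothetical CPU-utilization samples (%) hovering around normal load.
cpu = [78, 81, 79, 82, 80, 83, 79, 81, 78, 80]
print(naive_alerts(cpu, threshold=80))  # [1, 3, 5, 7]
```

That is four pages in ten samples for what is only normal variation: the threshold sits inside the normal range, so every recrossing pages again.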

How to fix it

  • Make every alert actionable, with a clear playbook step. If there is no action, it should not page.
  • Tie alerts to symptoms and SLOs so they fire on real user impact.
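  • Add hysteresis (a lower threshold to clear than to fire) and "for" durations (the condition must hold for a minimum time before firing) so a brief blip does not page.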
  • Group and deduplicate related alerts into one notification.
  • Route by severity, sending low-priority issues to a queue and paging only for urgent ones.
  • Review regularly and delete alerts that never lead to action.
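Several of these fixes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular monitoring system implements them: samples arrive at a fixed interval, so a "for" duration becomes a count of consecutive samples; `clear_below` is assumed to be lower than `fire_above`; each alert carries a hypothetical `cause` key identifying its root cause; and severity is a simple hypothetical "page"/"ticket" label. All function and field names are invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


def hysteresis_alerts(samples, fire_above, clear_below, for_samples):
    """Return the sample indices at which a page is sent.

    Fires only after the value stays above fire_above for for_samples
    consecutive samples (a "for" duration), and re-arms only once the
    value drops below the lower clear_below threshold (hysteresis).
    """
    pages = []
    firing = False
    streak = 0  # consecutive samples above fire_above while not firing
    for i, value in enumerate(samples):
        if firing:
            if value < clear_below:
                firing = False
                streak = 0
        else:
            streak = streak + 1 if value > fire_above else 0
            if streak >= for_samples:
                firing = True
                pages.append(i)
    return pages


@dataclass
class Alert:
    name: str
    component: str
    cause: str     # hypothetical root-cause key used for grouping
    severity: str  # hypothetical labels: "page" or "ticket"


def group_and_route(alerts):
    """Collapse alerts that share a cause into one notification, then page
    only if at least one member is urgent; otherwise send it to a queue."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert.cause].append(alert)
    pages, tickets = [], []
    for cause, members in groups.items():
        summary = f"{cause}: {len(members)} related alert(s)"
        if any(a.severity == "page" for a in members):
            pages.append(summary)
        else:
            tickets.append(summary)
    return pages, tickets


# A hypothetical noisy series hovering around 80.
noisy = [78, 81, 79, 82, 80, 83, 79, 81, 78, 80]
# A hypothetical sustained incident that recovers at the end.
incident = [78, 81, 79, 88, 91, 93, 95, 90, 74, 79]
print(hysteresis_alerts(noisy, fire_above=80, clear_below=75, for_samples=3))     # []
print(hysteresis_alerts(incident, fire_above=80, clear_below=75, for_samples=3))  # [5]

alerts = [
    Alert("HighLatency", "api", "db-primary-down", "page"),
    Alert("ErrorRate", "checkout", "db-primary-down", "page"),
    Alert("QueueBacklog", "worker", "db-primary-down", "ticket"),
    Alert("DiskUsage", "logs", "log-disk-filling", "ticket"),
]
print(group_and_route(alerts))
# (['db-primary-down: 3 related alert(s)'], ['log-disk-filling: 1 related alert(s)'])
```

The noisy series, which a plain "value > 80" rule would page on four times, produces no page, while the sustained incident pages exactly once. The three alerts caused by the database outage collapse into a single page, and the disk alert goes to the queue instead of waking someone up.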

Key idea

Alert fatigue erodes trust when alerts are noisy or non-actionable, so page only on actionable, symptom-based, deduplicated signals.

Check yourself

1. Why is a noisy pager dangerous?

2. What is a key rule to reduce alert fatigue?