← Lessons

quiz vs the machine

Gold1350

Machine Learning

The Alerting Thresholds Ml

Setting alarm levels that catch real regressions without drowning in false pages.

4 min read · core · beat Gold to climb

The alerting balance

An alert threshold decides when a metric breach pages a human. Set it too tight and the team drowns in false alarms until they ignore them. Set it too loose and real failures slip through. Good thresholds sit between noise and blindness.

Ways to set a threshold

  • Static, a fixed line such as accuracy below ninety percent.
  • Relative, a drop of more than a set percent from baseline.
  • Statistical, alert when a metric leaves a band of several standard deviations.

Reducing false pages

  • Require a breach to persist over a window before alerting.
  • Add severity tiers, a warning to a dashboard versus a page for on call.
  • Account for seasonality so normal weekend dips do not fire alarms.

Alert fatigue

The biggest risk is fatigue, where too many low value alerts train people to dismiss them, so a real one gets ignored. Every alert should be actionable.

Key idea

Alerting thresholds must balance false alarms against missed failures using static, relative, or statistical bands, with persistence and severity tiers to fight alert fatigue.

Check yourself

Answer to earn rating on the learn ladder.

1. What is the danger of setting alert thresholds too tight?

2. How does requiring a breach to persist help?