System Design

Anomaly Detection in Metrics

Catching unusual behavior automatically when fixed thresholds cannot keep up.

Beyond fixed thresholds

A static threshold works when normal is constant, but many metrics have daily and weekly patterns. Traffic is high at noon and low at night, so a single threshold either misses night problems or screams every afternoon. Anomaly detection learns what normal looks like and flags departures from it.
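The trade-off can be made concrete with a small sketch. The traffic curve, the 03:00 incident, and both threshold values below are made-up numbers chosen only for illustration.

```python
import math

def expected_traffic(hour):
    # Hypothetical daily curve: about 100 req/s at midnight, about 1000 req/s at noon.
    return 550 - 450 * math.cos(2 * math.pi * hour / 24)

def fixed_threshold_alerts(series, threshold):
    """Return the hours at which the value exceeds a single static threshold."""
    return [hour for hour, value in series if value > threshold]

normal_day = [(h, expected_traffic(h)) for h in range(24)]

# Inject a night-time problem: traffic at 03:00 jumps to 4x its usual level.
bad_day = [(h, v * 4 if h == 3 else v) for h, v in normal_day]

# A threshold set above the noon peak never sees the night incident.
high_alerts = fixed_threshold_alerts(bad_day, threshold=1100)

# A threshold low enough to catch it also fires on a perfectly normal day.
low_alerts_bad_day = fixed_threshold_alerts(bad_day, threshold=300)
low_alerts_normal_day = fixed_threshold_alerts(normal_day, threshold=300)

print(high_alerts, low_alerts_bad_day, low_alerts_normal_day)
```

With these numbers the high threshold stays silent through the incident, while the low threshold catches it but also fires for most of every ordinary day.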

Common approaches

  • Statistical bands model the expected value with a moving average and standard deviation, then flag points outside a band. Simple and explainable, but weak with strong seasonality.
  • Seasonal decomposition separates a series into trend, repeating seasonal cycles, and residual, then alerts when the residual is unusual. This handles daily and weekly rhythms well.
  • Forecasting models predict the next value and flag a large gap between the predicted and actual values.
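As a minimal sketch of the first approach, the function below builds a band from the mean and standard deviation of a trailing window. The function name, the window size, the multiplier `k`, and the sample latencies are assumptions made for this example, not a recommended configuration.

```python
from statistics import mean, stdev

def band_anomalies(values, window=12, k=3.0):
    """Return indices whose value lies outside mean +/- k * std of the previous `window` points."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        # Floor sigma so a perfectly flat window does not flag tiny noise.
        sigma = max(stdev(history), 1e-9)
        if abs(values[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged

# Hypothetical latency samples in milliseconds with one spike at index 13.
latency_ms = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 102, 100, 180, 101, 99]
flagged = band_anomalies(latency_ms)
print(flagged)  # [13]
```

The explanation for each alert is easy to state (the value left the band), which is the main appeal. Note that the spike then sits inside the window for the next few points and widens the band, and that nothing here knows about time of day, which is why this approach struggles with seasonal metrics.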

Hard problems

  • Seasonality must be captured or every rush hour looks anomalous.
  • False positives are the main risk, since an overly sensitive detector recreates alert fatigue.
  • Drift means that normal slowly changes, so models must be retrained or must adapt on their own.
  • Explainability matters because a responder must understand why something fired.
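One way to address several of these at once is a per-slot baseline that adapts slowly and reports its reasoning. The sketch below is a simplified stand-in for full seasonal decomposition, and the class name, the 24-slot period, the smoothing factor, and the tolerance rule are all assumptions for illustration.

```python
import math
from dataclasses import dataclass, field

@dataclass
class SeasonalBaseline:
    """Expected value and spread per slot of the cycle, updated by exponential smoothing."""
    period: int = 24      # slots per cycle (hourly slots in a daily cycle, an assumption)
    alpha: float = 0.2    # adaptation rate; higher follows drift faster
    k: float = 4.0        # band width as a multiple of the typical deviation
    level: dict = field(default_factory=dict)
    spread: dict = field(default_factory=dict)

    def observe(self, t, value):
        """Return an explanation string if the point is anomalous, else None."""
        slot = t % self.period
        if slot not in self.level:
            # First sighting of this slot: learn it, do not judge it.
            self.level[slot] = value
            self.spread[slot] = 0.0
            return None
        expected = self.level[slot]
        residual = value - expected
        # Floor the spread at 5% of the expected value to limit false positives.
        tolerance = self.k * max(self.spread[slot], 0.05 * abs(expected))
        if abs(residual) > tolerance:
            return (f"slot {slot}: observed {value:.0f}, "
                    f"expected about {expected:.0f} (tolerance {tolerance:.0f})")
        # Adapt only on normal points so an incident does not become the new normal.
        self.level[slot] = expected + self.alpha * residual
        self.spread[slot] = (1 - self.alpha) * self.spread[slot] + self.alpha * abs(residual)
        return None

def hourly_traffic(day, hour):
    # Hypothetical metric: daily cycle plus 1% growth per day (slow drift).
    return (550 - 450 * math.cos(2 * math.pi * hour / 24)) * 1.01 ** day

detector = SeasonalBaseline()
alerts = []
for t in range(14 * 24):
    day, hour = divmod(t, 24)
    message = detector.observe(t, hourly_traffic(day, hour))
    if message:
        alerts.append(message)

# Traffic at 03:00 on day 14 triples.
incident = detector.observe(14 * 24 + 3, 3 * hourly_traffic(14, 3))
print(alerts, incident)
```

Comparing each point with the same hour on earlier days keeps rush hour from looking anomalous, the smoothing absorbs the gradual growth, and the returned string gives a responder the observed and expected values. Skipping updates on anomalous points is a design choice with a cost: a genuine, permanent level shift would keep firing until the baseline is reset or retrained.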

Because of these risks, anomaly detection often feeds dashboards and triage rather than paging directly, or it is paired with a symptom-based guardrail.
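The pairing can be sketched as a small routing rule. The function name, the destination labels, and the idea of passing the guardrail state in as a boolean are hypothetical, and real alerting systems express this in their own rule syntax.

```python
def route_anomaly(anomaly_detected, symptom_breached):
    """Decide where an anomaly signal goes.

    `symptom_breached` stands for a separate, user-facing guardrail
    (for example an error-rate threshold), assumed to be evaluated elsewhere.
    """
    if not anomaly_detected:
        # Nothing to route; the guardrail has its own alert rule.
        return "ignore"
    if symptom_breached:
        # The anomaly coincides with visible impact, so it is worth waking someone.
        return "page"
    # An anomaly without visible impact is context for dashboards and triage.
    return "triage"

print(route_anomaly(True, False))
print(route_anomaly(True, True))
```

Under this rule a false positive from the detector costs a dashboard annotation rather than a page.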

Key idea

Anomaly detection learns a metric's normal pattern, including seasonality, and flags departures from it, but false positives and drift mean it usually informs triage rather than paging blindly.

Check yourself

1. Why can a fixed threshold fail on many real metrics?

2. What does seasonal decomposition do?

3. Why is anomaly detection often used for triage rather than direct paging?