The challenge
Anomaly detection finds rare, unexpected points such as fraud, failures, or intrusions. Anomalies are scarce and often unlabeled, so we usually learn what normal looks like and flag deviations.
Families of methods
- Statistical approaches flag points far from a fitted distribution.
- Distance and density methods, such as local outlier factor, flag points in sparse regions.
- Isolation methods, such as isolation forest, flag points that a few random splits are enough to separate from the rest.
- Reconstruction methods, such as autoencoders, flag points the model rebuilds poorly.
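As a rough illustration of the statistical family, the sketch below scores each value by its distance from the median, measured in units of the median absolute deviation (MAD). The function name, the sample readings, and the 3.5 cutoff are assumptions chosen for this example; 3.5 is a commonly quoted rule of thumb, not a value taken from these notes.

```python
import statistics

def robust_z_scores(values):
    """Score each value by its distance from the median, in units of MAD."""
    median = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; there is no spread to measure against")
    # 0.6745 rescales MAD so the scores are comparable to ordinary z-scores
    # when the normal data is roughly Gaussian.
    return [0.6745 * abs(v - median) / mad for v in values]

# Hypothetical sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
scores = robust_z_scores(readings)

# Assumed cutoff for this example; in practice it is tuned to the data.
flagged = [v for v, s in zip(readings, scores) if s > 3.5]
print(flagged)
```

The median and MAD are used here instead of the mean and standard deviation because a single extreme value can inflate the standard deviation enough to hide itself.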
Threshold and tradeoffs
You score each point, then pick a threshold to declare anomalies. The threshold trades two costs.
- A loose threshold flags more points, so it catches more anomalies but raises more false alarms.
- A tight threshold flags fewer points, so it reduces noise but misses more real events.
- The right balance depends on the cost of each error.
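The tradeoff can be sketched as a small threshold sweep over labeled scores. Everything here is hypothetical: the scores, the labels, the candidate thresholds, and the two cost values are made up for illustration, and the sketch assumes that a higher score means more anomalous.

```python
def confusion_at(scores, labels, threshold):
    """Count outcomes when every score >= threshold is declared an anomaly."""
    flagged = [s >= threshold for s in scores]
    caught = sum(f and y for f, y in zip(flagged, labels))
    false_alarms = sum(f and not y for f, y in zip(flagged, labels))
    missed = sum((not f) and y for f, y in zip(flagged, labels))
    return caught, false_alarms, missed

def cheapest_threshold(scores, labels, candidates, alarm_cost, miss_cost):
    """Pick the candidate threshold with the lowest total error cost."""
    def total_cost(t):
        _, false_alarms, missed = confusion_at(scores, labels, t)
        return false_alarms * alarm_cost + missed * miss_cost
    return min(candidates, key=total_cost)

# Hypothetical anomaly scores and ground-truth labels (True = real anomaly).
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.70, 0.90]
labels = [False, False, False, False, True, False, True, True]
candidates = [0.3, 0.5, 0.6]

for t in candidates:
    caught, false_alarms, missed = confusion_at(scores, labels, t)
    print(f"threshold={t}: caught={caught}, false_alarms={false_alarms}, missed={missed}")

# When a miss is expensive, the loose threshold wins; when alarms are
# expensive, the tight one does.
miss_is_costly = cheapest_threshold(scores, labels, candidates, alarm_cost=1, miss_cost=10)
alarm_is_costly = cheapest_threshold(scores, labels, candidates, alarm_cost=10, miss_cost=1)
```

With these made-up numbers, the same scores lead to different thresholds depending only on which error is priced higher, which is the point of the third bullet.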
Evaluation
Because anomalies are rare, accuracy is misleading: a detector that flags nothing is still right almost all the time. Prefer precision and recall on the rare class, or precision among the top-scored points (precision at k).
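A minimal sketch of these metrics, using a hypothetical set of 100 points with 2 true anomalies. The helper names, the scores, and the 0.5 threshold are assumptions made for this example.

```python
def accuracy(flagged, labels):
    """Fraction of points where the decision matches the label."""
    return sum(f == y for f, y in zip(flagged, labels)) / len(labels)

def precision_recall(flagged, labels):
    """Precision and recall on the anomaly (True) class."""
    true_positives = sum(f and y for f, y in zip(flagged, labels))
    n_flagged = sum(flagged)
    n_anomalies = sum(labels)
    precision = true_positives / n_flagged if n_flagged else 0.0
    recall = true_positives / n_anomalies if n_anomalies else 0.0
    return precision, recall

def precision_at_k(scores, labels, k):
    """Fraction of true anomalies among the k highest-scored points."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:k]) / k

# Hypothetical data: 98 normal points followed by 2 anomalies.
labels = [False] * 98 + [True] * 2
scores = [0.1] * 97 + [0.8] + [0.9, 0.6]  # one normal point scores high

# A detector that never flags anything looks accurate but catches nothing.
flag_nothing = [False] * 100
lazy_accuracy = accuracy(flag_nothing, labels)
lazy_precision, lazy_recall = precision_recall(flag_nothing, labels)

# A detector that flags every score >= 0.5 (assumed threshold).
flagged = [s >= 0.5 for s in scores]
precision, recall = precision_recall(flagged, labels)
top2 = precision_at_k(scores, labels, k=2)

print(lazy_accuracy, lazy_recall, precision, recall, top2)
```

Precision at k is useful when analysts can only review a fixed number of alerts, since it ignores the threshold and asks only how many of the top-ranked points are real.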
Key idea
Anomaly detection models normal behavior and flags deviations, with a threshold that balances false alarms against missed events, evaluated by precision and recall on the rare class.