Why outliers matter at serving time
A model is only trustworthy on inputs that resemble its training data. An outlier far from that region triggers extrapolation, where predictions are unreliable yet still confident looking. Catching outliers lets the system route them to a safe fallback.
Detection approaches
- Distance based, flag points far from training clusters.
- Density based, methods like isolation forest or local outlier factor isolate rare points.
- Reconstruction based, an autoencoder that reconstructs in distribution data well will reconstruct outliers poorly.
Handling a flagged input
Design choices
- Set the threshold to balance missed outliers against false flags.
- Log flagged inputs, since they often reveal emerging drift or new segments.
- An outlier detector is itself a model and can drift, so monitor it too.
Outliers versus drift
A burst of outliers can be the leading edge of drift before aggregate tests react, making outlier rate a fast complementary signal.
Key idea
Outlier detection flags out of distribution inputs at serving time so the system can fall back safely, and a rising outlier rate is an early signal of drift.