The Outlier Detection In Production

Flagging inputs unlike anything the model saw in training before it guesses badly.

Why outliers matter at serving time

A model is only trustworthy on inputs that resemble its training data. An outlier far from that region triggers extrapolation, where predictions are unreliable yet still confident looking. Catching outliers lets the system route them to a safe fallback.

Detection approaches

Distance based, flag points far from training clusters.
Density based, methods like isolation forest or local outlier factor isolate rare points.
Reconstruction based, an autoencoder that reconstructs in distribution data well will reconstruct outliers poorly.

Handling a flagged input

Design choices

Set the threshold to balance missed outliers against false flags.
Log flagged inputs, since they often reveal emerging drift or new segments.
An outlier detector is itself a model and can drift, so monitor it too.

Outliers versus drift

A burst of outliers can be the leading edge of drift before aggregate tests react, making outlier rate a fast complementary signal.

Key idea

Outlier detection flags out of distribution inputs at serving time so the system can fall back safely, and a rising outlier rate is an early signal of drift.