Machine Learning

Monitoring Data Drift

Detecting when the input distribution moves away from training data.

6 min read · advanced

What data drift is

Data drift, also called covariate shift, is a change in the distribution of model inputs over time relative to the training data. The relationship between inputs and labels may stay the same, but the model now sees inputs that differ from what it was trained on, which can quietly erode performance.

How to detect it

  • Compare the current input distribution to a reference window, feature by feature.
  • Use a distance or test statistic such as the population stability index (PSI), Kullback-Leibler divergence, or a Kolmogorov-Smirnov test.
  • Alert when a feature's drift score crosses a tuned threshold.
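
The steps above can be sketched in plain Python for a single numeric feature. This is a minimal illustration, not a production monitor: the function names `psi` and `ks_statistic`, the ten equal-width bins, the `eps` floor for empty bins, and the 0.2 alert threshold are all assumptions made for the example, and the threshold in particular should be tuned per feature.

```python
import math
import random


def psi(reference, current, bins=10, eps=1e-6):
    """Population stability index for one numeric feature (hypothetical helper)."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Live values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    i = j = 0
    gap = 0.0
    while i < len(ref) and j < len(cur):
        x = min(ref[i], cur[j])
        # Advance both samples past x so ties are counted in both CDFs.
        while i < len(ref) and ref[i] <= x:
            i += 1
        while j < len(cur) and cur[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(ref) - j / len(cur)))
    return gap


# Synthetic data standing in for a reference window and two live windows.
random.seed(0)
train = [random.gauss(0.0, 1.0) for _ in range(5000)]
live_same = [random.gauss(0.0, 1.0) for _ in range(5000)]
live_shifted = [random.gauss(1.0, 1.0) for _ in range(5000)]

PSI_THRESHOLD = 0.2  # assumed rule of thumb, not a universal constant

for name, live in [("same", live_same), ("shifted", live_shifted)]:
    score = psi(train, live)
    print(name, round(score, 3), round(ks_statistic(train, live), 3), score > PSI_THRESHOLD)
```

The live window drawn from the same distribution should score close to zero on both measures, while the window with a shifted mean should clearly exceed the threshold. In practice the same loop would run once per feature, with the bin edges fixed from the reference window so that scores stay comparable over time.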

Why labels are not required

Data drift detection watches inputs only, so it works immediately without waiting for ground truth labels, which often arrive late or never. This often makes it the earliest available warning signal.

Reading the signal

Drift is a warning, not a verdict. Some drift is harmless and some breaks the model. Pair drift alerts with performance monitoring to decide whether retraining is warranted, and watch for seasonal patterns that look like drift but are expected.
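
One way to encode this reading of the signal is a small triage function. The sketch below is hypothetical throughout: the function name `triage_drift_alert`, its parameters, the returned labels, and the 0.02 tolerance on metric drop are assumptions chosen for illustration. It assumes a drift score against the training reference, an optional score against the same period of a previous cycle (to catch expected seasonality), and an optional performance drop measured once labels arrive.

```python
from typing import Optional


def triage_drift_alert(
    drift_score: float,
    threshold: float,
    seasonal_score: Optional[float] = None,
    metric_drop: Optional[float] = None,
    max_metric_drop: float = 0.02,  # assumed tolerance, tune per model
) -> str:
    """Turn a drift score into a suggested next step (illustrative labels only)."""
    if drift_score <= threshold:
        return "ok"
    # Drifted against training data, but matches the same season of a past cycle.
    if seasonal_score is not None and seasonal_score <= threshold:
        return "expected_seasonal"
    # No labels yet, so drift alone cannot say whether the model is harmed.
    if metric_drop is None:
        return "investigate"
    # Drift together with a measured performance drop suggests retraining.
    if metric_drop > max_metric_drop:
        return "retrain_candidate"
    # Drift without a performance drop: harmless so far, keep watching.
    return "monitor"


print(triage_drift_alert(0.35, 0.2))
print(triage_drift_alert(0.35, 0.2, seasonal_score=0.05))
print(triage_drift_alert(0.35, 0.2, metric_drop=0.08))
```

The ordering of the checks reflects the idea that a drift alert only becomes a retraining decision once performance monitoring confirms harm, and that a seasonal match is checked before raising concern.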

Key idea

Data drift detection compares live input distributions to a training reference using statistical distances, giving an early, label-free warning that inputs have shifted.

Check yourself

1. What does data drift specifically measure?

2. Why can data drift be detected without ground truth labels?

3. How should a drift alert be interpreted?