Why models decay
A model is trained on a snapshot of the world. Over time the live world changes and the model grows stale. Two distinct kinds of change cause this decay.
Data drift
Data drift, also called covariate shift, is a change in the distribution of the inputs. The relationship between inputs and outputs stays the same, but the inputs you now see differ from training.
- A fraud model trained before a new payment method now sees unfamiliar transactions
- The features still predict fraud the same way, but their distribution has shifted
Concept drift
Concept drift is a change in the relationship between inputs and the target itself. The same inputs now map to a different correct answer.
- During a recession, the same income and credit score imply a higher default risk than before
- The inputs may look the same, but the meaning of the label has changed
Detecting and responding
You detect data drift by comparing input distributions over time, often with a statistic such as the population stability index. Concept drift is harder because you need labels to see that predictions have become wrong. The usual response is to retrain on fresh data, sometimes on a regular schedule.
Key idea
Data drift changes the inputs while the rule holds; concept drift changes the rule itself, and both demand retraining.