What it is
Training-serving skew is a difference between how features are produced during training and how they are produced during serving. The model learns from one version of a feature but sees a slightly different version in production, so it performs worse live than offline evaluation suggested.
Where it comes from
- Code skew: training computes a feature with one library or query, and serving computes it with a different one
- Data skew: training reads a clean batch table, but serving reads a noisier live source
- Time skew: a training feature accidentally uses information that is not available at serving time
That last case is a hidden form of leakage: the model learns to rely on information it will not have in production, so offline scores look great and online results disappoint.
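As a rough illustration of time skew, the sketch below computes a "total spend" feature two ways. The purchase log, the function names, and the prediction time are all made up for this example. The leaky version sums every row in the table, while the point-in-time version only counts events that happened before the moment the prediction would have been made.

```python
from datetime import datetime

# Hypothetical purchase log for one user: (timestamp, amount).
purchases = [
    (datetime(2024, 1, 1), 20.0),
    (datetime(2024, 1, 5), 35.0),
    (datetime(2024, 1, 9), 50.0),
]


def total_spend_leaky(events):
    """Sum every event in the table, including ones after the prediction time."""
    return sum(amount for _, amount in events)


def total_spend_point_in_time(events, as_of):
    """Sum only events strictly before the prediction time `as_of`."""
    return sum(amount for ts, amount in events if ts < as_of)


# Assumed moment at which the model would have been asked for a prediction.
prediction_time = datetime(2024, 1, 6)

leaky = total_spend_leaky(purchases)
correct = total_spend_point_in_time(purchases, prediction_time)

print(leaky, correct)  # 105.0 55.0
```

The serving system can only ever produce the second value, so a model trained on the first one is learning from a feature it will never see live.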
Why it is dangerous
The skew is silent. Offline metrics still look fine because they are computed on features from the training pipeline. The gap only shows up as worse-than-expected live performance, which is hard to trace back to a feature mismatch.
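One way to make the skew visible is to push the same raw records through both paths and compare the outputs. The sketch below assumes two hypothetical functions, `training_features` and `serving_features`, that were written separately and disagree on the base of a log transform. It is an illustration of the idea, not a complete monitoring setup.

```python
import math


def training_features(raw):
    # Hypothetical batch path: natural log of (1 + amount).
    return {"log_amount": math.log1p(raw["amount"])}


def serving_features(raw):
    # Hypothetical online path, reimplemented separately: base-10 log.
    return {"log_amount": math.log10(1 + raw["amount"])}


def find_skew(records, tol=1e-9):
    """Return (record index, feature name, training value, serving value)
    for every feature whose two values differ by more than `tol`."""
    mismatches = []
    for i, raw in enumerate(records):
        t = training_features(raw)
        s = serving_features(raw)
        for name in t:
            if not math.isclose(t[name], s[name], rel_tol=tol, abs_tol=tol):
                mismatches.append((i, name, t[name], s[name]))
    return mismatches


records = [{"amount": 0.0}, {"amount": 9.0}, {"amount": 99.0}]
mismatches = find_skew(records)

for index, name, train_value, serve_value in mismatches:
    print(f"record {index}: {name} training={train_value:.3f} serving={serve_value:.3f}")
```

The first record passes because both transforms map it to zero, which shows why a check on a single sample can miss the problem.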
How to prevent it
The strongest fix is to share one feature computation between training and serving, which is what a feature store is designed to provide. Logging the features actually served and reusing them as training data also keeps the two paths aligned.
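Both ideas can be sketched in a few lines without a real feature store. Everything here is hypothetical: `compute_features` stands in for the single shared definition, `feature_log` is an in-memory list standing in for a durable log, and `toy_model` is a placeholder for a trained model.

```python
import json


def compute_features(raw):
    """Single feature definition, used by both the training and serving paths."""
    return {
        "amount_bucket": min(int(raw["amount"] // 10), 9),
        "is_weekend": raw["day_of_week"] in (5, 6),
    }


feature_log = []  # stand-in for a durable log or table


def serve(raw, model):
    """Compute features, record exactly what the model saw, then predict."""
    features = compute_features(raw)
    feature_log.append(json.dumps(features, sort_keys=True))
    return model(features)


def training_rows_from_log():
    """Replay the logged features as training inputs."""
    return [json.loads(line) for line in feature_log]


def toy_model(features):
    # Placeholder model: predicts 1 on weekends, 0 otherwise.
    return 1 if features["is_weekend"] else 0


request = {"amount": 47.5, "day_of_week": 6}
prediction = serve(request, toy_model)
replayed = training_rows_from_log()
```

Because training rows are read back from the log, they match what the model saw at serving time by construction, and any offline recomputation goes through the same `compute_features` function.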
Key idea
Training-serving skew comes from computing features two ways; sharing one feature path is the durable fix.