The training-serving gap
A model trains on features computed in batch but predicts on features computed live. If the two computations differ even slightly (for example, in how a time window is bounded or how missing values are filled), the model sees different inputs in production than it learned from. This bug is called training-serving skew. A feature store exists to make both paths use one definition.
Two stores, one definition
- An offline store holds large historical feature tables for training, built in batch.
- An online store holds the latest feature values in a fast key-value store for low-latency lookup at prediction time.
- A single feature definition feeds both, so training and serving read values produced by the same logic.
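As a rough illustration of the list above, the sketch below applies one feature function to a stream of events and writes the result to both stores. Everything here is hypothetical and chosen for the example: the event data, the avg_purchase feature, and the materialize helper are not taken from any particular feature store product, and plain Python containers stand in for the real stores.

```python
from collections import defaultdict

# Hypothetical raw events: (user_id, timestamp, purchase_amount)
EVENTS = [
    ("u1", 1, 10.0),
    ("u1", 3, 30.0),
    ("u2", 2, 5.0),
    ("u1", 5, 20.0),
]

def avg_purchase(amounts):
    """The single feature definition: mean purchase amount so far."""
    return sum(amounts) / len(amounts)

def materialize(events):
    """Compute the feature once per event and write it to both stores."""
    offline = []                 # full history of (user_id, timestamp, value) rows
    online = {}                  # latest value per user, keyed for fast lookup
    seen = defaultdict(list)     # amounts observed so far for each user
    for user_id, ts, amount in sorted(events, key=lambda e: e[1]):
        seen[user_id].append(amount)
        value = avg_purchase(seen[user_id])
        offline.append((user_id, ts, value))   # training reads this history
        online[user_id] = value                # serving reads only the latest
    return offline, online

offline_store, online_store = materialize(EVENTS)
```

Because both stores are filled by the same avg_purchase call, the latest offline row for a user and that user's online value cannot disagree. A production system would typically write to a data warehouse and a key-value database instead, but the principle is the same.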
Point-in-time correctness
When building a training set, the store must join each label to the feature values as they were at the label's timestamp, never using data from after it. This point-in-time join prevents leakage, where the model accidentally learns from information that would not be available at prediction time.
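A point-in-time join can be sketched in a few lines. The version below is a minimal illustration, not how any specific feature store implements it: the feature history, the labels, and the point_in_time_join helper are all made up for the example. It assumes each entity's history is already sorted by timestamp and that a feature value written at exactly the label's timestamp counts as available. Some systems use a strict "before" rule instead.

```python
from bisect import bisect_right

# Hypothetical feature history per entity, sorted by timestamp: (timestamp, value)
FEATURE_HISTORY = {
    "u1": [(1, 10.0), (3, 20.0), (5, 17.5)],
    "u2": [(2, 5.0)],
}

# Hypothetical labels: (entity_id, label_timestamp, label)
LABELS = [
    ("u1", 4, 1),
    ("u1", 0, 0),
    ("u2", 2, 1),
]

def point_in_time_join(labels, history):
    """Attach to each label the latest feature value with timestamp <= label time."""
    rows = []
    for entity_id, label_ts, label in labels:
        series = history.get(entity_id, [])
        timestamps = [ts for ts, _ in series]
        # Index of the first feature row strictly after the label timestamp.
        idx = bisect_right(timestamps, label_ts)
        # No earlier row means the feature did not exist yet, so leave it empty.
        value = series[idx - 1][1] if idx > 0 else None
        rows.append((entity_id, label_ts, value, label))
    return rows

training_rows = point_in_time_join(LABELS, FEATURE_HISTORY)
```

The label for u1 at time 4 is paired with the value written at time 3, not the later value from time 5, which is the leakage the join is meant to rule out. The label at time 0 gets no value at all, since no feature had been computed by then.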
Key idea
A feature store computes each feature from one definition into an offline store for training and an online store for serving, which prevents training-serving skew, and it builds training sets with point-in-time joins, which prevent leakage.