The skew problem
One of the most common production ML bugs is training-serving skew: a feature is computed one way in the training pipeline and a different way at serving time. The model then sees inputs that differ from the ones it was trained on.
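As a minimal illustration, assume a hypothetical "average order value" feature that was implemented twice: once in a training pipeline that skips missing orders, and once in a serving path that counts them as zero. The function names and missing-value rules are illustrative assumptions, not taken from any real system. The point is that the same raw input produces two different feature values.

```python
from typing import List, Optional

# Hypothetical feature: a user's average order value.

def avg_order_value_training(orders: List[Optional[float]]) -> float:
    # Training pipeline: missing orders are skipped before averaging.
    values = [o for o in orders if o is not None]
    return sum(values) / len(values) if values else 0.0

def avg_order_value_serving(orders: List[Optional[float]]) -> float:
    # Serving path, reimplemented separately: missing orders count as 0.0.
    values = [o if o is not None else 0.0 for o in orders]
    return sum(values) / len(values) if values else 0.0

orders = [40.0, None, 20.0, None]
print(avg_order_value_training(orders))  # 30.0
print(avg_order_value_serving(orders))   # 15.0
```

The model learned that this user looks like a 30.0 spender, but at serving time it is told 15.0.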
The feature store idea
A feature store computes each feature once and serves it to both paths, training and serving.
- Offline store: large volumes of historical feature values, used for training
- Online store: low-latency lookups of current values, used for serving
- Shared definitions: the same code or logic produces both
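A minimal in-memory sketch of the shared-definition idea, assuming a hypothetical purchase-event schema and a hypothetical `total_spend` feature. Plain Python lists and dicts stand in for the offline and online stores; this is not the API of any particular feature store product. Both stores call the same `total_spend` function, so the latest offline value for a user matches the online value by construction.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical raw event schema: (user_id, timestamp, purchase amount).
Event = Tuple[str, datetime, float]

def total_spend(amounts: List[float]) -> float:
    """Shared feature definition: the single piece of logic both paths call."""
    return round(sum(amounts), 2)

def build_offline_rows(events: List[Event]) -> List[Tuple[str, datetime, float]]:
    """Offline store: one historical row per event, with the feature value as of that event."""
    history: Dict[str, List[float]] = defaultdict(list)
    rows = []
    for user, ts, amount in sorted(events, key=lambda e: e[1]):
        history[user].append(amount)
        rows.append((user, ts, total_spend(history[user])))
    return rows

def materialize_online(events: List[Event]) -> Dict[str, float]:
    """Online store: only the latest value per user, keyed for fast lookup."""
    history: Dict[str, List[float]] = defaultdict(list)
    for user, _, amount in events:
        history[user].append(amount)
    return {user: total_spend(amounts) for user, amounts in history.items()}

events = [
    ("u1", datetime(2024, 1, 1), 10.0),
    ("u1", datetime(2024, 1, 5), 5.5),
    ("u2", datetime(2024, 1, 3), 7.0),
]
offline = build_offline_rows(events)
online = materialize_online(events)
print(online)  # {'u1': 15.5, 'u2': 7.0}
```

Because neither path has its own copy of the feature logic, a change to `total_spend` reaches training and serving together.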
Point-in-time correctness
When building training data, each feature value must reflect only what was known at the moment of the labeled event. Joining current values onto past events leaks the future: the model trains on information it will not have at prediction time.
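Point-in-time correctness can be sketched as an as-of lookup: for each labeled event, take the most recent feature value whose timestamp is at or before the event time. The data layout (per-entity histories sorted by timestamp) and the function names are assumptions for illustration. This version treats a value stamped at exactly the event time as already known; some pipelines require strictly earlier timestamps instead.

```python
import bisect
from datetime import datetime
from typing import Dict, List, Optional, Tuple

def point_in_time_lookup(
    feature_history: List[Tuple[datetime, float]], as_of: datetime
) -> Optional[float]:
    """Return the latest value with timestamp <= as_of, or None if none existed yet.

    feature_history must be sorted by timestamp.
    """
    timestamps = [ts for ts, _ in feature_history]
    idx = bisect.bisect_right(timestamps, as_of)
    return feature_history[idx - 1][1] if idx > 0 else None

def build_training_rows(
    labels: List[Tuple[str, datetime, int]],
    feature_histories: Dict[str, List[Tuple[datetime, float]]],
) -> List[Tuple[str, datetime, Optional[float], int]]:
    """Attach to each (entity, event_time, label) the feature value known at event_time."""
    rows = []
    for entity, event_time, label in labels:
        history = feature_histories.get(entity, [])
        rows.append((entity, event_time, point_in_time_lookup(history, event_time), label))
    return rows

histories = {"u1": [(datetime(2024, 1, 1), 10.0), (datetime(2024, 1, 10), 50.0)]}
labels = [
    ("u1", datetime(2023, 12, 31), 0),  # before any feature value existed
    ("u1", datetime(2024, 1, 5), 0),
    ("u1", datetime(2024, 1, 12), 1),
]
for row in build_training_rows(labels, histories):
    print(row[2])  # None, then 10.0, then 50.0
```

A naive join on the current value would give the January 5 row 50.0, a value that only existed from January 10 onward. The as-of lookup gives 10.0, and the event that precedes all history gets None instead of a leaked value.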
Streaming versus batch features
- Batch features: computed periodically, such as last-30-day spend
- Streaming features: updated in near real time, such as clicks in the last minute
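The two update styles can be contrasted in a small sketch. The batch function recomputes 30-day spend from stored purchases whenever a scheduled job runs; the streaming class keeps a sliding one-minute window of click timestamps and updates it per event. Both are hypothetical illustrations that assume in-order events and in-memory state, not the API of a specific streaming framework.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, List, Tuple

def batch_30_day_spend(purchases: List[Tuple[datetime, float]], run_time: datetime) -> float:
    """Batch feature: recomputed from stored data each time a scheduled job runs."""
    window_start = run_time - timedelta(days=30)
    return sum(amount for ts, amount in purchases if window_start < ts <= run_time)

class ClicksLastMinute:
    """Streaming feature: updated incrementally as each click event arrives (in order)."""

    def __init__(self, window: timedelta = timedelta(minutes=1)) -> None:
        self.window = window
        self.clicks: Deque[datetime] = deque()

    def _evict(self, now: datetime) -> None:
        # Drop clicks that have fallen out of the sliding window ending at `now`.
        while self.clicks and self.clicks[0] <= now - self.window:
            self.clicks.popleft()

    def on_click(self, ts: datetime) -> None:
        self.clicks.append(ts)
        self._evict(ts)

    def value(self, now: datetime) -> int:
        # Assumes `now` never moves backwards, since eviction is permanent.
        self._evict(now)
        return len(self.clicks)

purchases = [
    (datetime(2024, 1, 1), 100.0),
    (datetime(2024, 1, 20), 40.0),
    (datetime(2024, 2, 5), 25.0),
]
print(batch_30_day_spend(purchases, run_time=datetime(2024, 2, 10)))  # 65.0

t0 = datetime(2024, 1, 1, 12, 0, 0)
counter = ClicksLastMinute()
for seconds in (0, 20, 50):
    counter.on_click(t0 + timedelta(seconds=seconds))
print(counter.value(t0 + timedelta(seconds=55)))  # 3
print(counter.value(t0 + timedelta(seconds=75)))  # 2
```

The batch value is only as fresh as the last job run, while the streaming counter reflects each click as it arrives, at the cost of keeping per-entity state in memory.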
Key idea
Define each feature once and serve it to training and inference from the same logic, with point-in-time correctness to prevent skew and leakage.