The leakage trap
When you build a training table, each row pairs a label with the feature values as they stood when the labeled event happened. If you accidentally attach a feature value computed after the event, you leak the future into training. This is a point-in-time correctness failure.
A concrete example
- On day thirty, you predict whether a customer will churn.
- A feature is total lifetime spend.
- If you compute lifetime spend over the full history, it includes purchases made after day thirty, which the model could never see at prediction time.
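The difference between the leaky and the correct feature can be seen in a small sketch. It assumes pandas and a made-up purchase log; the column names (customer_id, day, amount) and the PREDICTION_DAY constant are illustrative and not taken from any particular system.

```python
import pandas as pd

# Hypothetical purchase log for one customer; "day" counts days since signup.
purchases = pd.DataFrame({
    "customer_id": [1, 1, 1, 1],
    "day": [5, 20, 40, 55],
    "amount": [10.0, 15.0, 100.0, 50.0],
})

PREDICTION_DAY = 30  # assumed cutoff: the day the prediction is made

# Leaky: sums the full history, including the purchases on days 40 and 55.
leaky_spend = purchases.groupby("customer_id")["amount"].sum()

# Correct: keep only purchases visible on or before the prediction day.
visible = purchases[purchases["day"] <= PREDICTION_DAY]
correct_spend = visible.groupby("customer_id")["amount"].sum()

print(leaky_spend.loc[1])    # 175.0
print(correct_spend.loc[1])  # 25.0
```

The leaky value is seven times the value the model would actually see on day thirty, which is the kind of gap that makes a feature look far more predictive offline than it really is.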
The correct join
- For every label timestamp, look up each feature's value as of that timestamp or earlier.
- This is called a point-in-time join or an as-of join. It walks each feature back to its latest value at or before the event.
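One way to sketch such a join is with pandas.merge_asof, which matches each left row to the nearest right row in a chosen direction. The tables, column names, and dates below are made up for illustration; a real pipeline would likely have many features and may use a different engine entirely.

```python
import pandas as pd

# Hypothetical label table: one row per customer and event timestamp.
labels = pd.DataFrame({
    "customer_id": [1, 2],
    "event_time": pd.to_datetime(["2024-01-31", "2024-01-31"]),
    "churned": [1, 0],
})

# Hypothetical feature log: each row is a feature value and when it became known.
features = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "feature_time": pd.to_datetime(
        ["2024-01-10", "2024-01-25", "2024-02-05", "2024-01-31", "2024-02-10"]
    ),
    "lifetime_spend": [10.0, 25.0, 125.0, 40.0, 90.0],
})

# merge_asof requires both frames to be sorted by their time keys.
labels = labels.sort_values("event_time")
features = features.sort_values("feature_time")

training = pd.merge_asof(
    labels,
    features,
    left_on="event_time",
    right_on="feature_time",
    by="customer_id",        # match within the same customer only
    direction="backward",    # latest feature row at or before event_time
)

spend = training.set_index("customer_id")["lifetime_spend"]
print(spend.loc[1])  # 25.0 (the 2024-02-05 value of 125.0 is in the future)
print(spend.loc[2])  # 40.0 (an exact timestamp match is allowed)
```

If a feature value stamped exactly at the event time should not count as known, merge_asof accepts allow_exact_matches=False to require strictly earlier values. Which choice is right depends on how the feature timestamps are produced.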
Why it matters
- Without it, offline metrics look far better than the model's real performance, and the gap shows up in production, where the future values are not available.
- It is one of the most common and damaging data bugs in real ML systems.
How feature stores help
- They store every feature value with a timestamp and typically provide as-of joins for building training sets, so training rows respect the timeline without hand-written join logic.
Key idea
Point-in-time correctness means each training row uses only feature values known at or before its event timestamp, enforced with as-of joins to avoid leaking the future.