Point-in-Time Correctness
When you build a training set, you join features to labeled events. Point-in-time correctness means each feature value reflects only what was knowable at the moment of that event, not anything learned later.
The leakage trap
Imagine predicting whether a loan defaults. If you join the customer's account balance as it is today, you leak future information, because today's balance was shaped by the very default you are trying to predict. The model looks brilliant in testing and fails in production.
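To make the trap concrete, here is a toy sketch of the naive join. All of the data is made up for illustration: the customer ID, balances, dates, and field names are hypothetical. The lookup returns the newest balance and ignores when the event happened.

```python
from datetime import date

# Hypothetical balance history: (customer_id, observed_on, balance).
balance_history = [
    ("c1", date(2023, 1, 1), 5000.0),
    ("c1", date(2023, 6, 1), 4200.0),
    ("c1", date(2023, 12, 1), -300.0),  # recorded long after the loan was issued
]

# Hypothetical labeled event: a loan issued on 2023-03-01 that later defaulted.
events = [{"customer_id": "c1", "event_date": date(2023, 3, 1), "defaulted": True}]


def latest_balance(customer_id):
    """Naive lookup: return the newest balance row, ignoring the event date."""
    rows = [r for r in balance_history if r[0] == customer_id]
    return max(rows, key=lambda r: r[1])


leaky_rows = []
for event in events:
    _, observed_on, balance = latest_balance(event["customer_id"])
    leaky_rows.append({**event, "balance": balance, "balance_observed_on": observed_on})

# Rows whose feature was observed at or after the event carry future information.
leaked = [r for r in leaky_rows if r["balance_observed_on"] >= r["event_date"]]
```

Here the training row pairs a March event with a December balance, so the feature already reflects the outcome the label describes.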
Doing the join right
A correct join uses as-of logic:
- For each labeled event with a timestamp, pull the feature value as it stood just before that timestamp.
- Never let a feature reflect data that arrived after the event.
This is harder than a normal join because it requires the full history of each feature, not just its current value. Feature stores and time-aware joins exist largely to make this correct by construction.
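The as-of lookup can be sketched with the standard library alone. This is a minimal illustration, not a production implementation: the helper names `build_history` and `as_of` and the sample data are assumptions made for the example. Libraries such as pandas offer a ready-made `merge_asof` for the same purpose. The sketch keeps a sorted history per key and uses binary search to find the last value observed strictly before the event.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import date


def build_history(rows):
    """Group (key, observed_on, value) rows per key, sorted by observation date."""
    history = defaultdict(list)
    for key, observed_on, value in rows:
        history[key].append((observed_on, value))
    for key in history:
        history[key].sort(key=lambda pair: pair[0])
    return history


def as_of(history, key, event_time):
    """Return the latest value observed strictly before event_time, or None."""
    series = history.get(key, [])
    dates = [observed_on for observed_on, _ in series]
    # bisect_left returns the index of the first date >= event_time,
    # so the entry just before it is the last one strictly earlier.
    idx = bisect_left(dates, event_time)
    return series[idx - 1][1] if idx > 0 else None


# Hypothetical feature history and labeled events.
history = build_history([
    ("c1", date(2023, 1, 1), 5000.0),
    ("c1", date(2023, 6, 1), 4200.0),
    ("c1", date(2023, 12, 1), -300.0),
])
events = [
    ("c1", date(2023, 3, 1)),
    ("c1", date(2023, 6, 1)),  # same day as an observation: that value is excluded
    ("c2", date(2023, 3, 1)),  # no history at all for this customer
]

training_rows = [(cid, t, as_of(history, cid, t)) for cid, t in events]
```

Using a strict "before" comparison is a deliberate choice here: a value stamped at the same instant as the event may or may not have been available when the prediction was made, so the sketch excludes it. A missing value (`None`) is returned rather than a later one when no earlier observation exists.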
Why it is sneaky
Leakage from time travel does not throw an error. It silently inflates offline metrics, so disciplined point-in-time joins are a core defense.
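Because nothing fails on its own, an explicit check can turn silent leakage into a visible failure. The sketch below assumes each training row records both the event time and the time its joined feature was observed. The field names `event_time` and `feature_observed_at` and the helper `find_leaks` are hypothetical.

```python
from datetime import datetime


def find_leaks(rows):
    """Return rows whose feature was observed at or after the event."""
    return [
        row for row in rows
        if row["feature_observed_at"] is not None
        and row["feature_observed_at"] >= row["event_time"]
    ]


# Hypothetical training rows: the second one uses a feature from the future.
rows = [
    {"id": 1, "event_time": datetime(2023, 3, 1), "feature_observed_at": datetime(2023, 1, 1)},
    {"id": 2, "event_time": datetime(2023, 3, 1), "feature_observed_at": datetime(2023, 12, 1)},
]

leaks = find_leaks(rows)
# A pipeline could fail fast here instead of only collecting the offenders, e.g.:
# assert not leaks, f"{len(leaks)} rows use features from the future"
```

A check like this only works if the feature's observation time is carried through the join, which is one more reason to keep the full history rather than the current value alone.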
Key idea
Point-in-time correctness means joining each feature as it existed just before the event, which prevents future information from leaking into training.