Offline and Online Evaluation

Two evaluation worlds

Offline evaluation uses logged data. Online evaluation uses live traffic. They answer different questions.

Offline: does the model predict well on held-out data?
Online: does the model improve the real business metric with real users?

Offline pitfalls

Temporal leakage: training on the future and testing on the past.
Distribution shift: logged data differs from live traffic.
Proxy mismatch: the offline metric does not track the business goal.

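When the system predicts the future, always split by time: train on earlier data and test on later data. A random split mixes later examples into the training set, which leaks information the model would not have in production.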
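A time-based split can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the row layout, the "ts" field, the helper names time_split and leaks_future, and the January 2024 log are all assumptions made up for the example.

```python
from datetime import datetime
import random


def time_split(rows, cutoff, ts_key="ts"):
    """Train on rows strictly before `cutoff`, test on rows at or after it."""
    train = [r for r in rows if r[ts_key] < cutoff]
    test = [r for r in rows if r[ts_key] >= cutoff]
    return train, test


def leaks_future(train, test, ts_key="ts"):
    """True if any training row is later than the earliest test row."""
    return max(r[ts_key] for r in train) > min(r[ts_key] for r in test)


# Hypothetical log: one event per day, 1 to 30 January 2024.
rows = [{"ts": datetime(2024, 1, d), "label": d % 2} for d in range(1, 31)]

# Time split: everything before 25 January trains, the rest tests.
train, test = time_split(rows, cutoff=datetime(2024, 1, 25))
print("time split leaks:", leaks_future(train, test))

# Random split of the same size, for contrast.
rng = random.Random(0)
shuffled = rows[:]
rng.shuffle(shuffled)
rand_train, rand_test = shuffled[:24], shuffled[24:]
print("random split leaks:", leaks_future(rand_train, rand_test))
```

The time split cannot leak by construction, because the cutoff separates the two sets. A random split of the same rows will almost always place some later days in training, which is the leak described above.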

Bridge the gap

A model that wins offline can still lose online because of latency, feedback effects, or a metric proxy that did not hold. Use offline as a cheap filter and online as the final judge.
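The filter-then-judge workflow might look like the sketch below. It assumes the offline metric is AUC and the online metric is a conversion rate compared with a two-sided two-proportion z-test; the candidate names, the scores, and the traffic counts are invented for illustration, and a real experiment would also need a pre-planned sample size and stopping rule.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Step 1: offline filter. Hypothetical AUC scores on held-out data.
offline_auc = {"baseline": 0.71, "cand_a": 0.74, "cand_b": 0.69}
shortlist = [
    name
    for name, auc in offline_auc.items()
    if name != "baseline" and auc > offline_auc["baseline"]
]
print("shortlist:", shortlist)

# Step 2: online judge. Hypothetical A/B counts for baseline (A) vs cand_a (B).
z, p = two_proportion_z(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts the offline winner shows a higher conversion rate online, but the p-value lands just above 0.05, so the online test does not confirm the win at that threshold. That is the gap the offline score alone cannot reveal.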

Key idea

Offline evaluation screens candidates cheaply; online evaluation on live traffic is the only verdict that counts.
