Machine Learning

Offline and Online Evaluation

Why a strong offline score is necessary but never sufficient before shipping.

Two evaluation worlds

Offline evaluation uses logged data. Online evaluation uses live traffic. They answer different questions.

  • Offline: does the model predict well on held-out data?
  • Online: does the model improve the real business metric with real users?

Offline pitfalls

  • Temporal leakage: training on the future and testing on the past.
  • Distribution shift: logged data differs from live traffic.
  • Proxy mismatch: the offline metric does not track the business goal.

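When the system's job is to predict the future, always split by time: train on earlier data and test on later data. A random split scatters later records into the training set, so the model sees information it would not have at prediction time and the offline score comes out too optimistic.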
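The time-based split described above can be sketched in a few lines. This is a minimal illustration and not a prescribed implementation: the `time_split` helper, the `day` and `clicked` fields, and the cutoff date are all hypothetical names chosen for the example.

```python
from datetime import date


def time_split(rows, cutoff):
    """Return (train, test): train is strictly before cutoff, test is on or after it."""
    train = [r for r in rows if r["day"] < cutoff]
    test = [r for r in rows if r["day"] >= cutoff]
    return train, test


# Hypothetical logged events, one per day for ten days.
rows = [{"day": date(2024, 1, d), "clicked": d % 2} for d in range(1, 11)]

train, test = time_split(rows, cutoff=date(2024, 1, 8))

# Every training row precedes every test row, so nothing from the
# test period can leak into training.
latest_train = max(r["day"] for r in train)
earliest_test = min(r["day"] for r in test)
print(len(train), len(test), latest_train < earliest_test)  # 7 3 True
```

A random shuffle of the same rows would not give that guarantee: some rows from the last three days would land in the training set.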

Bridge the gap

A model that wins offline can still lose online because of latency, feedback effects (the model's own outputs change the data it later sees), or an offline proxy metric that does not track the business goal. Use offline as a cheap filter and online as the final judge.
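The lesson does not name a method for the online verdict. One common choice, assumed here, is a randomized A/B test that compares a business metric such as conversion rate between the current model and the candidate. The sketch below applies a two-proportion z-test; the function name and the traffic counts are hypothetical, invented only to show the arithmetic.

```python
import math


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Compare conversion rates of control (a) and treatment (b).

    Returns (lift, z, two_sided_p) under a pooled-variance normal approximation.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


# Hypothetical counts: 20,000 users per arm.
lift, z, p_value = two_proportion_z(1000, 20000, 1100, 20000)
print(f"lift={lift:.4f} z={z:.2f} p={p_value:.3f}")
```

A candidate that passed the offline filter would ship only if a comparison like this shows the business metric improving on live traffic, alongside checks on latency and other guardrails.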

Key idea

Offline evaluation screens candidates cheaply; online evaluation on live traffic is the only verdict that counts.

Check yourself

1. Why split time-series-style data by time rather than randomly?

2. Why can a model that wins offline still lose online?