Machine Learning

Offline vs Online Evaluation

Why a model that scores well on a test set can still fail in production.

Two kinds of evaluation

Offline evaluation measures a model on a fixed dataset before deployment. You compute metrics such as accuracy or area under the ROC curve (AUC) on a held-out test set. It is fast, cheap, and repeatable.
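
As a minimal sketch of an offline check, the snippet below computes accuracy and AUC in plain Python. The labels, scores, and the 0.5 threshold are made up for illustration, and the function names are hypothetical, not from any library.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples where the thresholded score matches the label."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This is quadratic in the data size,
    which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out test set: true labels and model scores.
test_labels = [1, 0, 1, 1, 0, 0, 1, 0]
test_scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]

print("accuracy:", accuracy(test_labels, test_scores))
print("auc:", roc_auc(test_labels, test_scores))
```

Because the test set is fixed, running this twice gives the same numbers, which is what makes offline evaluation repeatable.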

Online evaluation measures the model on live traffic after deployment. You watch business metrics such as click-through rate or revenue while real users interact with it.
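
A rough sketch of one online metric: click-through rate computed from a log of user events. The event format (a dict with a "type" field) is an assumption made for this example, not a real logging schema.

```python
def click_through_rate(events):
    """Clicks divided by impressions; 0.0 when nothing was shown."""
    impressions = sum(1 for e in events if e["type"] == "impression")
    clicks = sum(1 for e in events if e["type"] == "click")
    return clicks / impressions if impressions else 0.0


# Hypothetical slice of live traffic: 5 items shown, 2 clicked.
events = [
    {"type": "impression"}, {"type": "click"},
    {"type": "impression"},
    {"type": "impression"}, {"type": "click"},
    {"type": "impression"},
    {"type": "impression"},
]

print("click-through rate:", click_through_rate(events))
```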

Why offline is not enough

A great offline score does not guarantee a good online result.

  • Offline data can differ from live traffic, so the offline score can be optimistic
  • Offline metrics such as accuracy may not match the business goal, which might be revenue or retention
  • A deployed model can change user behavior, and data collected before deployment cannot capture that
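
The first point can be sketched with a toy model. Assume a rule that predicts positive when a feature is at least 5, tuned on offline data where that boundary holds. In the hypothetical live sample the true boundary has moved to 7, so the same rule scores worse. All numbers are invented for illustration.

```python
def predict(x, threshold=5.0):
    """Toy model: positive when the feature reaches the threshold."""
    return 1 if x >= threshold else 0


def accuracy(examples):
    """Share of (feature, label) pairs the toy model gets right."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


# Hypothetical offline test set: labels flip at x = 5.
offline = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]

# Hypothetical live traffic: labels now flip at x = 7.
live = [(4, 0), (5, 0), (5.5, 0), (6, 0), (6.5, 0), (7, 1), (8, 1), (9, 1)]

offline_acc = accuracy(offline)
live_acc = accuracy(live)
print("offline accuracy:", offline_acc)
print("live accuracy:", live_acc)
```

The model did not change between the two runs. Only the data did, and the offline score gave no warning.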

How they work together

Offline evaluation is a filter. It rejects bad candidates cheaply so only promising models reach live traffic. Online evaluation is the final judge because it measures what actually matters on real users, usually through a controlled experiment such as an A/B test.
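
The two stages might look like the sketch below. It assumes the offline filter is a simple AUC cutoff and the online experiment compares click counts with a two-proportion z-test. The model names, the 0.80 cutoff, and the traffic numbers are all hypothetical.

```python
import math


def offline_filter(candidates, min_auc):
    """Keep only candidates whose offline AUC reaches the cutoff."""
    return [name for name, auc in candidates.items() if auc >= min_auc]


def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """z statistic for the difference in click rate, B minus A."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def p_value_two_sided(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Stage 1: cheap offline filter over hypothetical candidates.
candidates = {"model_a": 0.71, "model_b": 0.83, "model_c": 0.86}
shortlisted = offline_filter(candidates, min_auc=0.80)
print("shortlisted:", shortlisted)

# Stage 2: online experiment, current model (A) vs one shortlisted model (B).
z = two_proportion_z(clicks_a=500, n_a=10_000, clicks_b=580, n_b=10_000)
p = p_value_two_sided(z)
print("z:", round(z, 2), "p-value:", round(p, 4))
```

The filter costs nothing in user traffic, so it can be run on every candidate. The experiment spends real traffic, so it is reserved for the few models that pass.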

Key idea

Offline evaluation cheaply filters candidates, but only online evaluation on real traffic proves business value.

Check yourself

1. What does online evaluation measure that offline cannot?

2. What role does offline evaluation play in the pipeline?