Offline vs Online Evaluation

Two kinds of evaluation

Offline evaluation measures a model on a fixed dataset before deployment. You compute metrics such as accuracy or area under the ROC curve (AUC) on a held-out test set. It is fast, cheap, and repeatable.
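
A minimal sketch of offline evaluation in Python, assuming a small hypothetical held-out test set of binary labels and model scores. AUC is computed here as the probability that a randomly chosen positive outscores a randomly chosen negative, which is equivalent to the area under the ROC curve. Because the data is fixed, rerunning it always gives the same numbers.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not labels or len(labels) != len(preds):
        raise ValueError("labels and preds must be non-empty and equal length")
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Chance a random positive scores above a random negative (ties count 0.5).
    Pairwise O(P*N) version: simple and fine for small test sets."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out test set: true labels and model scores.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.5, 0.1]
# Turn scores into hard predictions with an assumed 0.5 threshold.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(labels, preds))  # 4 of 6 correct, about 0.667
print(roc_auc(labels, scores))  # 8 of 9 positive/negative pairs ranked correctly, about 0.889
```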

Online evaluation measures the model on live traffic after deployment. You watch business metrics such as click rate or revenue while real users interact with it.
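
Online metrics come from logs of what real users did. The sketch below assumes a hypothetical log format in which each impression records whether it was clicked and how much revenue it produced, and it aggregates those events into click rate and revenue per impression.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Impression:
    """Hypothetical logged event: the model showed an item to a live user."""
    clicked: bool
    revenue: float  # revenue attributed to this impression (0.0 if none)


def online_metrics(events: Iterable[Impression]) -> dict:
    """Aggregate logged live traffic into click rate and revenue per impression."""
    n = 0
    clicks = 0
    revenue = 0.0
    for e in events:
        n += 1
        clicks += int(e.clicked)
        revenue += e.revenue
    if n == 0:
        raise ValueError("no impressions logged")
    return {
        "impressions": n,
        "click_rate": clicks / n,
        "revenue_per_impression": revenue / n,
    }


# Hypothetical slice of a live event log.
log = [
    Impression(True, 2.5),
    Impression(False, 0.0),
    Impression(True, 0.0),
    Impression(False, 0.0),
]
print(online_metrics(log))  # 2 clicks in 4 impressions; 2.5 revenue over 4 impressions
```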

Why offline is not enough

A great offline score does not guarantee a good online result.

Offline data is a historical snapshot, and live traffic can drift away from it, so the offline score is often optimistic
Offline metrics such as accuracy may not reflect the real goal, which might be revenue or retention
A deployed model can change how users behave, and data logged before deployment cannot capture that effect
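
A toy illustration of the first point, using entirely hypothetical data: a fixed threshold rule that is perfect on the offline set loses accuracy once one feature drifts upward in live traffic. The size of the shift (0.3) and the threshold (0.5) are assumptions chosen only to make the effect visible.

```python
def predict(x: float, threshold: float = 0.5) -> int:
    """Hypothetical model: a fixed threshold chosen using offline data."""
    return int(x > threshold)


def accuracy_of_rule(rows: list[tuple[float, int]]) -> float:
    """Accuracy of `predict` on (feature, label) pairs."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


# Hypothetical offline (feature, label) pairs the threshold was tuned on.
offline = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
# Assumed drift: in live traffic the same feature reads 0.3 higher,
# while the true labels stay the same.
live = [(x + 0.3, y) for x, y in offline]

print(accuracy_of_rule(offline))  # perfect offline: 1.0
print(accuracy_of_rule(live))     # two negatives now cross the threshold: about 0.667
```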

How they work together

Offline evaluation is a filter. It rejects bad candidates cheaply, so only promising models reach live traffic. Online evaluation is the final judge because it measures what actually matters on real users, usually through a controlled experiment such as an A/B test, which compares the new model against the current one on randomly split traffic.
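
The two stages can be sketched as follows. The candidate names, AUC values, cutoff, and traffic counts are all hypothetical. For the online stage, the sketch uses a two-proportion z-test on click rates, which is one common way to analyze an A/B test but not the only one.

```python
import math


def offline_filter(candidates: dict[str, float], min_auc: float) -> list[str]:
    """Stage 1 (cheap): keep only candidates whose offline AUC clears the bar."""
    return [name for name, auc in candidates.items() if auc >= min_auc]


def two_proportion_z_test(clicks_a: int, n_a: int,
                          clicks_b: int, n_b: int) -> tuple[float, float]:
    """Stage 2 (final judge): compare click rates of control A and treatment B.
    Returns (z, two-sided p-value) using the pooled normal approximation."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in clicks; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical offline AUCs for three candidate models.
shortlist = offline_filter({"model_v1": 0.71, "model_v2": 0.78, "model_v3": 0.64},
                           min_auc=0.70)
print(shortlist)  # model_v3 is rejected before it ever sees users

# Hypothetical A/B result: control vs. a shortlisted model, 20,000 users each.
z, p = two_proportion_z_test(clicks_a=1000, n_a=20000, clicks_b=1100, n_b=20000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A p-value below a level fixed before the experiment (often 0.05) is usually read as evidence that the treatment's click rate really differs from the control's. The filter only decides which models are worth that experiment.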

Key idea

Offline evaluation cheaply filters candidates, but only online evaluation on real traffic proves business value.
