Two kinds of metrics
Every ML project has two metric families that must stay aligned.
- Model metrics like F1, AUC, or RMSE, measured offline on held-out data
- Business metrics like revenue, retention, churn, or cost saved, measured in production
A jump in offline AUC is worthless if it does not move a business metric.
The alignment gap
Model metrics and business metrics can diverge for several reasons.
- The label is a proxy that imperfectly reflects the real goal
- A better ranker may improve clicks but not purchases
- Gains concentrate on segments that do not matter commercially
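To make the last point concrete, here is a minimal sketch with invented numbers. The segment names, traffic shares, revenue shares, and accuracies are all hypothetical, and weighting accuracy change by revenue share is only a rough stand-in for commercial impact, not an established metric.

```python
# Hypothetical per-segment numbers, invented purely for illustration.
# Columns: name, share of traffic, share of revenue, accuracy before, accuracy after
segments = [
    ("casual browsers", 0.80, 0.10, 0.70, 0.80),
    ("repeat buyers",   0.20, 0.90, 0.85, 0.83),
]

def weighted_gain(rows, weight_index):
    """Sum of per-segment accuracy change, weighted by the chosen share column."""
    return sum(row[weight_index] * (row[4] - row[3]) for row in rows)

# What an offline evaluation over all traffic would report.
traffic_weighted = weighted_gain(segments, 1)

# The same change, weighted by where the revenue actually comes from.
revenue_weighted = weighted_gain(segments, 2)

print(f"traffic-weighted gain: {traffic_weighted:+.3f}")
print(f"revenue-weighted gain: {revenue_weighted:+.3f}")
```

Under these assumed numbers the traffic-weighted gain is about +0.076 while the revenue-weighted gain is about -0.008: the headline metric improves even though the segment that carries most of the revenue got slightly worse.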
Bridging the gap
- Trace a clear causal story from model score to business outcome
- Validate offline gains with an online A/B test on the business metric
- Set guardrail metrics so an improvement does not quietly hurt latency, cost, or fairness
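The last two steps can be sketched as a simple ship/no-ship check. This is one possible approach under stated assumptions, not a prescribed procedure: the business metric is assumed to be purchase conversion, the test is a pooled two-proportion z-test, the only guardrail shown is p95 latency, and the function names, dictionary keys, thresholds, and sample counts are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    # Assumes 0 < pooled < 1; otherwise the standard error is zero.
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def ship_decision(control, treatment, alpha=0.05, max_latency_ratio=1.10):
    """Ship only if the business metric improves significantly
    and the latency guardrail holds (thresholds are assumptions)."""
    p_value = two_proportion_p_value(
        control["purchases"], control["users"],
        treatment["purchases"], treatment["users"],
    )
    lift = (treatment["purchases"] / treatment["users"]
            - control["purchases"] / control["users"])
    business_win = lift > 0 and p_value < alpha
    guardrail_ok = (treatment["p95_latency_ms"]
                    <= max_latency_ratio * control["p95_latency_ms"])
    return business_win and guardrail_ok

# Invented experiment results, for illustration only.
control = {"users": 10_000, "purchases": 500, "p95_latency_ms": 120}
treatment = {"users": 10_000, "purchases": 600, "p95_latency_ms": 125}

decision = ship_decision(control, treatment)
print("ship" if decision else "hold")
```

Note that the decision is made on purchases, not on the offline model metric, and that a significant lift is still rejected if the guardrail is breached. Cost and fairness guardrails would follow the same pattern as the latency check.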
Why it matters
Optimizing a model metric in isolation can win the leaderboard and lose the product. The model metric is a means; the business metric is the end.
Key idea
Offline model metrics only matter if they move a real business metric. Connect them with a causal argument and confirm with an online test before trusting the gain.