← Lessons

quiz vs the machine

Gold1460

Machine Learning

AB Testing ML Models

Proving a new model actually improves the business with a controlled live experiment.

6 min read · core · beat Gold to climb

The only real proof

Offline gains are a hypothesis. An AB test splits live traffic between the current model and the candidate to measure the true causal effect.

How it works

  • Control users keep the current model
  • Treatment users get the new model
  • Randomize assignment so the groups are comparable
  • Measure the primary business metric over a fixed window

Statistical care

  • Pick the primary metric and sample size before you start
  • Run long enough for significance, avoiding early peeking
  • Watch guardrail metrics so a win on one metric does not hide harm elsewhere

Beware interference

In marketplaces or social graphs, treatment users can affect control users, breaking independence. Use cluster or geo based splits when that risk is real.

Key idea

An AB test is the causal verdict: randomize traffic, pre register the metric and sample size, and guard against peeking and interference.

Check yourself

Answer to earn rating on the learn ladder.

1. Why pick the sample size and primary metric before the test starts?

2. When might you use cluster or geo based splits?