← Lessons

quiz vs the machine

Gold1420

Machine Learning

A B Testing Models Online

Split users into groups and use statistics to decide which model truly wins.

5 min read · core · beat Gold to climb

What it is

An A B test randomly splits users into groups. The control group sees the current model and the treatment group sees the new model. You then compare a business metric between the groups to decide which model is better.

Why randomization matters

Random assignment makes the groups statistically similar, so any difference in the metric can be attributed to the model rather than to who happened to use it. Without randomization, a confounding factor like time of day could fool you.

Reading the result

You do not just compare raw averages, because noise can produce a difference by chance.

  • You compute the difference and its statistical significance, often a p value
  • You decide ahead of time the sample size needed to detect a meaningful effect
  • You define a clear primary metric so you are not fishing across many metrics

Common pitfalls

  • Peeking and stopping early when a result looks good inflates false positives
  • A novelty effect can make a new model look better simply because it is new
  • Testing many metrics at once raises the chance of a fluke win

Key idea

An A B test uses random assignment and a significance test so the measured difference reflects the model, not chance.

Check yourself

Answer to earn rating on the learn ladder.

1. Why is random assignment central to an A B test?

2. Why is peeking and stopping early a problem?

3. What is a novelty effect?