A B Testing Models Online

What it is

An A B test randomly splits users into groups. The control group sees the current model and the treatment group sees the new model. You then compare a business metric between the groups to decide which model is better.

Why randomization matters

Random assignment makes the groups statistically similar, so any difference in the metric can be attributed to the model rather than to who happened to use it. Without randomization, a confounding factor like time of day could fool you.

Reading the result

You do not just compare raw averages, because noise can produce a difference by chance.

You compute the difference and its statistical significance, often a p value
You decide ahead of time the sample size needed to detect a meaningful effect
You define a clear primary metric so you are not fishing across many metrics

Common pitfalls

Peeking and stopping early when a result looks good inflates false positives
A novelty effect can make a new model look better simply because it is new
Testing many metrics at once raises the chance of a fluke win

Key idea

An A B test uses random assignment and a significance test so the measured difference reflects the model, not chance.

A B Testing Models Online

What it is

Why randomization matters

Reading the result

Common pitfalls

Key idea

Check yourself