Machine Learning

Statistical Significance in AB Tests

Telling a real effect apart from random noise.

The question

An AB test splits users between a control and a variant, then compares a metric like conversion. Statistical significance asks whether the observed difference is likely real or just noise from random assignment.

The p value

The p value is the probability of seeing a difference at least this large if the variant truly had no effect. A small p value means such a result would be unlikely under the no effect assumption, so you reject that assumption. A common threshold is 0.05.

  • Below the threshold you call the result significant.
  • Above it you cannot conclude an effect exists, which is not the same as showing there is no effect.
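
As a rough illustration of how a p value for a conversion difference can be computed, here is a minimal sketch using a pooled two proportion z test. The lesson does not name a specific test, so the choice of test, the function name, and the example counts are all assumptions made for illustration.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p value for a difference in conversion rates.

    Assumes a pooled two proportion z test, which is one common choice
    for conversion metrics (an assumption, not something the lesson names).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under "no effect" both arms share a single conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail of the standard normal: P(|Z| >= |z|).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: 5.0% control conversion vs 5.6% variant conversion.
p = two_proportion_p_value(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"p value = {p:.4f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

With these made-up counts the p value lands just above 0.05, so the observed lift would not be called significant at that threshold.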

What it does not say

A p value is not the probability the variant works, and significance is not the same as importance. A tiny, useless lift can be significant with enough users, so always report the effect size alongside it.
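
To make the gap between significance and importance concrete, the sketch below reports the lift and an approximate 95% interval next to the p value. The counts are invented, and the z test and normal approximation interval are assumed choices rather than anything prescribed here.

```python
import math


def lift_report(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Return effect size and p value together (normal approximation)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    diff = rate_b - rate_a

    # p value from a pooled two proportion z test (assumed test choice).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    p_value = math.erfc(abs(diff / se_pooled) / math.sqrt(2))

    # Approximate 95% interval for the absolute difference (unpooled SE).
    se_unpooled = math.sqrt(
        rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b
    )
    ci95 = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)

    return {
        "abs_lift": diff,           # difference in conversion rate
        "rel_lift": diff / rate_a,  # lift relative to control
        "p_value": p_value,
        "ci95": ci95,
    }


# Hypothetical experiment: 10.0% vs 10.1% conversion, 2,000,000 users per arm.
report = lift_report(200_000, 2_000_000, 202_000, 2_000_000)
print(f"absolute lift: {report['abs_lift']:.4%}")
print(f"relative lift: {report['rel_lift']:.2%}")
print(f"p value:       {report['p_value']:.5f}")
print(f"95% interval:  {report['ci95'][0]:.4%} to {report['ci95'][1]:.4%}")
```

In this example the p value is well below 0.05 even though the variant adds only about a tenth of a percentage point, which is why the lift and its interval belong in the report.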

Common traps

  • Peeking at results repeatedly, and stopping as soon as the p value dips below the threshold, inflates false positives unless corrected.
  • Underpowered tests with too few users miss real effects.
  • Multiple metrics raise the chance one looks significant by luck.
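
The multiple metrics trap can be put in numbers. The sketch below assumes the metrics are independent and that the variant has no real effect on any of them, which is a simplification, and it shows the Bonferroni correction as one possible guard. The lesson does not prescribe a correction, and the function names are illustrative.

```python
def family_wise_error_rate(alpha, num_metrics):
    """Chance that at least one metric looks significant by luck alone.

    Assumes independent metrics and no true effect on any of them.
    """
    return 1 - (1 - alpha) ** num_metrics


def bonferroni_threshold(alpha, num_metrics):
    """Per-metric threshold that keeps the overall error rate near alpha."""
    return alpha / num_metrics


for k in (1, 5, 10, 20):
    fwer = family_wise_error_rate(0.05, k)
    per_metric = bonferroni_threshold(0.05, k)
    print(f"{k:>2} metrics: chance of a false positive = {fwer:.1%}, "
          f"Bonferroni threshold = {per_metric:.4f}")
```

Under these assumptions, testing ten metrics at 0.05 each gives roughly a 40% chance that at least one crosses the threshold by luck.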

Set the sample size and stopping rule before you start.
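
One way to fix the sample size in advance is a standard normal approximation for comparing two proportions. This is a sketch under assumptions: the 80% power default, the baseline rate, and the minimum lift are example values, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(baseline, min_lift, alpha=0.05, power=0.80):
    """Approximate users needed in each arm for a two-sided test.

    baseline: expected control conversion rate, e.g. 0.05
    min_lift: smallest absolute lift worth detecting, e.g. 0.01
    """
    p1 = baseline
    p2 = baseline + min_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


# Example values: 5% baseline, detect a lift of 1 percentage point.
n = sample_size_per_arm(baseline=0.05, min_lift=0.01)
print(f"users needed per arm: {n}")
```

Halving the lift you want to detect roughly quadruples the required users, so it helps to agree on the smallest lift that matters before the test starts.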

Key idea

Significance uses the p value to judge whether an AB test difference exceeds random noise, but it must be paired with effect size and guarded against peeking and underpowered designs.

Check yourself

1. What does a p value represent?

2. Why report effect size alongside significance?