The question
An AB test splits users between a control and a variant, then compares a metric like conversion. Statistical significance asks whether the observed difference is likely real or just noise from random assignment.
The p value
The p value is the probability of seeing a difference at least this large if the variant truly had no effect. A small p value means such a result would be unlikely under no effect, so you reject the no-effect assumption (the null hypothesis). A common threshold is 0.05.
- Below the threshold, you call the result statistically significant.
- Above it, you cannot conclude that an effect exists, though this does not prove there is no effect.
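For a conversion metric, one common way to get the p value is a pooled two-proportion z-test. The sketch below is one such test under a normal approximation, which assumes each arm has a reasonable number of conversions; other tests (for example chi-squared) are also used. The function name `two_proportion_p_value` and the conversion counts are hypothetical.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p value for the no-effect hypothesis that both arms share one conversion rate.

    Pooled two-proportion z-test with a normal approximation (an assumed choice of test).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under no effect both arms share one rate, so estimate it from the pooled data.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability that a standard normal is at least |z| away from zero.
    return math.erfc(abs(z) / math.sqrt(2))

ALPHA = 0.05
# Hypothetical counts: control 200/5000 (4.0%), variant 250/5000 (5.0%).
p = two_proportion_p_value(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"p = {p:.4f}:", "significant" if p < ALPHA else "not significant")
```

With these made-up counts the p value comes out around 0.016, below the 0.05 threshold.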
What it does not say
A p value is not the probability that the variant works, and significance is not the same as importance. A tiny, useless lift can be significant with enough users, so always report the effect size (how big the lift is) alongside it.
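To show effect size next to significance, a sketch like the following reports the absolute lift, the relative lift, and an approximate 95% confidence interval for the difference. It assumes a normal-approximation (Wald) interval, one of several interval methods. The helper name `lift_summary` and the numbers are hypothetical; they describe a very large test where a 0.05 percentage point lift is significant but may be too small to matter.

```python
import math

def lift_summary(conv_a, n_a, conv_b, n_b, z_crit=1.96):
    """Absolute lift, relative lift, and an approximate 95% CI for the absolute lift."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a
    # Unpooled standard error, the usual choice for an interval on a difference in rates.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return {
        "absolute_lift": diff,
        "relative_lift": diff / p_a,
        "ci_95": (diff - z_crit * se, diff + z_crit * se),
    }

# Hypothetical test: 10.00% vs 10.05% conversion with 5 million users per arm.
summary = lift_summary(500_000, 5_000_000, 502_500, 5_000_000)
low, high = summary["ci_95"]
print(f"absolute lift {summary['absolute_lift']:.4%}, relative lift {summary['relative_lift']:.2%}")
print(f"95% CI for the absolute lift: {low:.4%} to {high:.4%}")
```

Here the interval excludes zero, so the result is significant, yet the relative lift is only about half a percent. Whether that is worth shipping is a business question the p value cannot answer.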
Common traps
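- Peeking at results repeatedly and stopping as soon as the p value dips below the threshold inflates the false positive rate, unless the analysis corrects for the repeated looks.
- Underpowered tests, with too few users, often miss real effects.
- Testing many metrics raises the chance that at least one looks significant by luck.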
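The peeking trap is easy to see in simulation. The sketch below runs hypothetical A/A tests, where both arms share the same true conversion rate so every significant result is a false positive. It compares checking a pooled two-proportion z-test after every batch (and declaring a winner at the first p below 0.05) with a single look at the end. The batch size, number of looks, 10% rate, and function names are arbitrary assumptions, and no correction for repeated looks is applied.

```python
import math
import random

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled two-proportion z-test (normal approximation)."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    if p_pool in (0.0, 1.0):
        return 1.0  # no variation in the data yet, so no evidence either way
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def aa_false_positive_rates(n_sims=500, looks=20, batch=100, rate=0.1, alpha=0.05, seed=0):
    """Share of A/A tests declared significant by a peeker vs by a single final look."""
    rng = random.Random(seed)
    peek_hits = final_hits = 0
    for _sim in range(n_sims):
        conv_a = conv_b = n = 0
        flagged = False
        for _look in range(looks):
            # Add a batch of users to each arm; both arms have the same true rate.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if p_value(conv_a, n, conv_b, n) < alpha:
                flagged = True  # a peeker would stop here and declare a winner
        peek_hits += flagged
        final_hits += p_value(conv_a, n, conv_b, n) < alpha
    return peek_hits / n_sims, final_hits / n_sims

peeking_rate, single_look_rate = aa_false_positive_rates()
print(f"false positives with peeking: {peeking_rate:.1%}, single final look: {single_look_rate:.1%}")
```

The single-look rate should land near the nominal 5%, while the peeking rate comes out substantially higher; the exact figures depend on the seed and the number of looks.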
Set the sample size and stopping rule before you start.
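Setting the sample size up front usually means a power calculation. A minimal sketch, assuming a two-sided two-proportion z-test, the standard normal-approximation formula, and Python 3.8+ for `statistics.NormalDist`: given a baseline rate, the smallest absolute lift worth detecting, the significance threshold, and the desired power, it returns the users needed per arm. The function name and example inputs are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_base, min_lift, alpha=0.05, power=0.8):
    """Users per arm to detect an absolute lift `min_lift` over `p_base`
    with a two-sided two-proportion z-test at level `alpha` and the given power."""
    p_var = p_base + min_lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    p_bar = (p_base + p_var) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_power * math.sqrt(p_base * (1 - p_base) + p_var * (1 - p_var))
    ) ** 2
    return math.ceil(numerator / min_lift ** 2)

# Hypothetical plan: 4% baseline conversion, detect a lift to 5% (one percentage point).
n_per_arm = sample_size_per_arm(p_base=0.04, min_lift=0.01)
print(f"run until each arm has {n_per_arm} users, then analyze once")
```

For these inputs it asks for roughly 6,700 users per arm. Halving the detectable lift roughly quadruples the requirement, which is one reason tests aimed at small effects are so often underpowered.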
Key idea
Significance uses the p value to judge whether an AB test difference exceeds random noise, but it must be paired with effect size and guarded against peeking and underpowered designs.