Statistical Significance in A/B Tests

The question

An A/B test splits users at random between a control and a variant, then compares a metric such as conversion rate. Statistical significance asks whether the observed difference reflects a real effect or could plausibly be noise from the random assignment.

The p-value

The p-value is the probability of seeing a difference at least this large if the variant truly had no effect. A small p-value means such a result would be unlikely under the no-effect assumption (the null hypothesis), so you reject that assumption. A common threshold is 0.05.

Below the threshold you call the result significant.
Above it you cannot conclude an effect exists.
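
As a rough illustration of how such a p-value can be computed for conversion rates, here is a minimal sketch of a pooled two-proportion z-test. The function name, the example counts, and the choice of a z-test are assumptions made for illustration. Other tests are equally valid, and the normal approximation only holds when each group has a reasonable number of conversions.

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under "no effect", both groups share one underlying rate, so pool them.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # P(|Z| >= |z|) for a standard normal equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical counts: 200 of 4000 convert in control, 250 of 4000 in the variant.
p = two_proportion_p_value(200, 4000, 250, 4000)
print(f"p-value: {p:.4f}", "significant" if p < 0.05 else "not significant")
```

With these made-up counts the p-value comes out at roughly 0.015, below the 0.05 threshold, so the result would be called significant.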

What it does not say

A p-value is not the probability that the variant works, and significance is not the same as importance. A tiny, useless lift can be significant with enough users, so always report the effect size alongside the p-value.
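
To make the gap between significance and importance concrete, the sketch below reports the lift and a confidence interval next to the p-value. The function name, the returned keys, and the traffic numbers are hypothetical. It assumes the same normal approximation as a two-proportion z-test.

```python
import math
from statistics import NormalDist

def summarize_effect(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Return the p-value together with effect-size measures for two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    lift = p_b - p_a

    # Significance: pooled standard error under the no-effect assumption.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    p_value = math.erfc(abs(lift / se_pooled) / math.sqrt(2))

    # Effect size: unpooled standard error for an interval around the lift.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return {
        "p_value": p_value,
        "absolute_lift": lift,
        "relative_lift": lift / p_a,
        "ci_low": lift - z * se,
        "ci_high": lift + z * se,
    }

# Hypothetical test with ten million users per arm: 5.00% versus 5.05% conversion.
result = summarize_effect(500_000, 10_000_000, 505_000, 10_000_000)
for name, value in result.items():
    print(f"{name}: {value:.6g}")
```

Here the p-value is far below 0.05, yet the absolute lift is only about 0.05 percentage points. Whether that is worth shipping is a business question the p-value cannot answer.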

Common traps

Peeking at results repeatedly inflates false positives unless corrected.
Underpowered tests with too few users miss real effects.
Multiple metrics raise the chance one looks significant by luck.
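
The peeking and multiple-metrics traps can both be checked numerically. The sketch below simulates tests where control and variant are identical, so every significant result is a false positive, and compares checking at every look with checking once at the end. It also computes the chance of at least one false positive across several metrics. All names and parameter values are illustrative assumptions, and the multiple-metrics formula assumes the metrics are independent, which real metrics often are not.

```python
import math
import random

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # no conversions (or all conversions) in both arms
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_peeking(n_tests=500, looks=10, users_per_look=100,
                     rate=0.1, alpha=0.05, seed=0):
    """Simulate no-effect tests; return false-positive rates (any look, final look only)."""
    rng = random.Random(seed)
    flagged_any = 0
    flagged_final = 0
    for _ in range(n_tests):
        conv_a = conv_b = n = 0
        hit_at_some_look = False
        p = 1.0
        for _look in range(looks):
            # Both arms draw from the same true rate, so there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            p = p_value(conv_a, n, conv_b, n)
            if p < alpha:
                hit_at_some_look = True
        flagged_any += 1 if hit_at_some_look else 0
        flagged_final += 1 if p < alpha else 0  # p now holds the final look
    return flagged_any / n_tests, flagged_final / n_tests

def family_wise_error_rate(alpha, n_metrics):
    """Chance that at least one of n independent no-effect metrics looks significant."""
    return 1 - (1 - alpha) ** n_metrics

any_look, final_only = simulate_peeking()
print(f"false positives when stopping at any significant look: {any_look:.1%}")
print(f"false positives when testing once at the end: {final_only:.1%}")
print(f"chance of a lucky metric among 10: {family_wise_error_rate(0.05, 10):.1%}")
print(f"Bonferroni threshold for 10 metrics: {0.05 / 10}")
```

The exact simulated rates depend on the seed and settings, but the any-look rate should sit well above the nominal 5%. Dividing the threshold by the number of metrics (the Bonferroni correction) is one simple, conservative fix for the multiple-metrics case.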

Set the sample size and stopping rule before you start.
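
One way to fix the sample size in advance is a standard power calculation for comparing two proportions. The sketch below is a minimal version under stated assumptions: a two-sided test, equal group sizes, and the normal approximation. The function name, the baseline rate, and the target rate are hypothetical inputs you would replace with your own.

```python
import math
from statistics import NormalDist

def users_per_arm(baseline_rate, target_rate, alpha=0.05, power=0.8):
    """Approximate users needed in each group to detect baseline_rate vs target_rate."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired chance of detecting the effect
    mean_rate = (baseline_rate + target_rate) / 2
    # Spread under "no effect" (both arms at the mean rate) and under the real effect.
    null_term = z_alpha * math.sqrt(2 * mean_rate * (1 - mean_rate))
    alt_term = z_power * math.sqrt(
        baseline_rate * (1 - baseline_rate) + target_rate * (1 - target_rate)
    )
    n = (null_term + alt_term) ** 2 / (target_rate - baseline_rate) ** 2
    return math.ceil(n)

# Hypothetical plan: detect a move from 5% to 6% conversion with 80% power.
print(users_per_arm(0.05, 0.06))
```

For these inputs the answer is on the order of eight thousand users per arm. Halving the lift you want to detect roughly quadruples the requirement, which is why underpowered tests are so common.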

Key idea