A/B Testing Statistics

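A/B testing is a randomized controlled experiment applied to products. Users are randomly split between a control group, which sees the current experience, and a variant (treatment) group, which sees the change. A statistical test then decides whether the difference in a metric between the groups is larger than chance alone would plausibly produce.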
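The split can be implemented in several ways. The sketch below assumes each user has a stable string ID and uses hash-based bucketing, a common substitute for drawing a fresh random number on every visit. It hashes the experiment name together with the user ID, so a returning user always lands in the same arm and different experiments split users independently. It behaves like random assignment only as long as the hash output is unrelated to anything about the users. The function name `assign_variant` and the experiment name `checkout-button` are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment' (hypothetical helper).

    Hashing "experiment:user_id" gives each user a stable, effectively random
    position in [0, 1); users below `treatment_share` go to treatment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Use the first 15 hex digits (60 bits) to build a number in [0, 1).
    bucket = int(digest[:15], 16) / 16**15
    return "treatment" if bucket < treatment_share else "control"


if __name__ == "__main__":
    arms = [assign_variant(f"user-{i}", "checkout-button") for i in range(10_000)]
    # The treatment share should land close to 0.5.
    print(arms.count("treatment") / len(arms))
```

Because the assignment is a pure function of the ID, no lookup table of who saw what is needed.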

The setup

Randomly assign users to control or treatment so the groups differ only by the change.
Choose a single primary metric, such as conversion rate, before the test starts.
State the null hypothesis of no difference between the groups, and fix the significance level (alpha) and the target power in advance.
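For a conversion-rate metric, one standard choice of test is the two-sided two-proportion z-test, sketched below. Under the null of no difference both arms share one conversion rate, so the standard error is computed from the pooled rate, and the resulting p-value is compared with the alpha fixed in advance. The function name and the conversion counts in the usage lines are illustrative assumptions, not data.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates (arm A = control, B = variant).

    Returns (z, p_value). Uses the normal approximation, which is reasonable
    when each arm has enough conversions and non-conversions.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under H0 the arms share one rate, estimated by pooling both arms.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: probability of a |z| at least this large under H0.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    alpha = 0.05  # fixed before the experiment started
    # Hypothetical counts: 200/2000 conversions in control, 250/2000 in the variant.
    z, p = two_proportion_z_test(200, 2000, 250, 2000)
    print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {p < alpha}")
```

With these made-up counts (10% versus 12.5%), z comes out at roughly 2.5 and the p-value at roughly 0.012, so the null would be rejected at alpha = 0.05.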

Power and sample size

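The power of a test is the probability that it rejects the null hypothesis when a real effect of a given size is present. Small effects require large samples: the required sample size grows roughly with the inverse square of the effect, so halving the minimum effect roughly quadruples the number of users needed. Compute the required sample size before launching from the baseline rate, the minimum effect worth detecting, alpha, and the desired power.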
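A minimal sketch of that calculation, assuming the primary metric is a conversion rate analyzed with a two-sided two-proportion z-test, uses the common normal-approximation formula without continuity correction. Here `mde` is the minimum detectable effect expressed as an absolute lift, and the function name is hypothetical. Other formulas (with continuity correction, or for means rather than proportions) give somewhat different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(baseline: float, mde: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed in EACH arm to detect an absolute lift `mde` over `baseline`.

    Normal approximation for a two-sided two-proportion z-test,
    no continuity correction. `mde` must be non-zero.
    """
    p1 = baseline
    p2 = baseline + mde
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / mde ** 2)


if __name__ == "__main__":
    # Hypothetical plan: 10% baseline, detect a lift to 12% (or to 11%).
    print(sample_size_per_group(0.10, 0.02))
    print(sample_size_per_group(0.10, 0.01))
```

With a 10% baseline, alpha = 0.05, and 80% power, detecting a lift to 12% needs about 3,800 users per arm. Halving the effect to a lift to 11% needs nearly four times as many, in line with the inverse-square rule above.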

Common pitfalls

Peeking and stopping the moment a result looks significant inflates false positives. Fix the sample size or use sequential methods built for early stopping.
Testing many metrics invites false positives, so correct for multiple comparisons.
A statistically significant but tiny effect may not justify shipping.
Beware novelty effects and segments where the change helps some users but hurts others.
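The text does not prescribe a particular multiple-comparison correction. One standard option is Holm's step-down procedure, which controls the family-wise error rate at alpha and is never more conservative than plain Bonferroni (comparing every p-value to alpha divided by the number of tests). The sketch below is a minimal, hypothetical helper; the p-values in the usage lines are invented for illustration.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Returns a list of booleans, True where the corresponding hypothesis is
    rejected, keeping the family-wise error rate at most `alpha`.
    """
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            # Once one test fails, every larger p-value is retained too.
            break
    return reject


if __name__ == "__main__":
    # Hypothetical p-values for four metrics in one experiment.
    print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

All four invented p-values are below 0.05, so testing each metric alone would flag all of them. After the correction, only the first and last remain significant, which shows how checking several metrics raises the bar for each one.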

Key idea

A/B testing randomizes users between control and variant, sizes the sample up front for adequate power, and avoids the peeking and multiple-comparison traps. This way, only effects that are both statistically significant and large enough to matter get shipped.
