Machine Learning

A/B Testing Statistics

Running controlled online experiments that you can trust.

6 min read · advanced

A/B testing is a randomized experiment for products. Users are split between a control and a variant, and a statistical test assesses whether the observed difference in a metric is larger than chance alone would plausibly produce.
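
As a minimal sketch of what such a statistical test can look like, the snippet below runs a two-sided, pooled two-proportion z-test on conversion counts. The lesson does not name a specific test, so the choice of a z-test, the function name two_proportion_z_test, and the counts are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_a / n_a: conversions and users in control.
    conv_b / n_b: conversions and users in the variant.
    Returns (z, p_value), using the pooled rate under the null of no difference.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 1,000 of 10,000 convert in control, 1,100 of 10,000 in the variant.
z, p = two_proportion_z_test(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A p-value below the alpha chosen in advance would lead to rejecting the null of no difference. The normal approximation used here assumes reasonably large groups with some conversions in each.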

The setup

  • Randomly assign users to control or treatment so the groups differ only by the change.
  • Pick a single primary metric decided before the test, such as conversion rate.
  • State the null hypothesis of no difference, and choose the significance level (alpha) and the power in advance.
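
One common way to implement the random assignment step is to hash the user ID together with an experiment name, so each user lands in the same group on every visit. The sketch below illustrates that approach. It is an assumption, not something the lesson prescribes, and assign_variant, the experiment name, and the user ID format are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'.

    Hashing experiment + user_id gives a stable, roughly uniform bucket in [0, 1),
    so a user always sees the same arm and different experiments split independently.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


# Hypothetical experiment name and user IDs.
counts = {"control": 0, "treatment": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "checkout-button-test")] += 1
print(counts)
```

Because the assignment depends only on the ID and the experiment name, it needs no stored state, and the split should come out close to the requested share.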

Power and sample size

The power of a test is its probability of detecting a real effect of a given size. Detecting small effects requires large samples: halving the minimum effect you want to catch roughly quadruples the required sample size. You compute the required sample size before launching, based on the baseline rate, the minimum effect worth catching, alpha, and the desired power.
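
The calculation described above can be sketched with the standard normal-approximation formula for comparing two proportions. This assumes a two-sided test, equal group sizes, and an absolute minimum effect. The function name sample_size_per_group and the example inputs are illustrative, not taken from the lesson.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(baseline, min_effect, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion z-test.

    baseline:   conversion rate in control, e.g. 0.10
    min_effect: smallest absolute lift worth detecting, e.g. 0.01
    """
    p1 = baseline
    p2 = baseline + min_effect
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / min_effect**2)


# Hypothetical inputs: 10% baseline, detect a 1-point or a 0.5-point absolute lift.
print(sample_size_per_group(0.10, 0.01))
print(sample_size_per_group(0.10, 0.005))
```

With these inputs the first call comes out on the order of 15,000 users per group, and the second is close to four times larger, which is the inverse-square relationship between effect size and sample size.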

Common pitfalls

  • Peeking and stopping the moment a result looks significant inflates false positives. Fix the sample size or use sequential methods built for early stopping.
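  • Testing many metrics raises the chance that at least one looks significant by luck, so correct for multiple comparisons (for example, with a Bonferroni or Benjamini-Hochberg adjustment).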
  • A statistically significant but tiny effect may not justify shipping.
  • Beware novelty effects, where an early lift fades as users get used to the change, and averages that hide segments where the change helps some users but hurts others.
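
The peeking pitfall is easy to see in a simulation. The sketch below runs many A/A tests, where both arms share the same true conversion rate, so every "significant" result is a false positive. It compares stopping at the first significant look against testing once at the planned sample size. All parameters (5 looks, 200 users per arm per look, a 10% rate) are arbitrary assumptions chosen to keep the run short.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test p-value for a difference in proportions."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        return 1.0  # no variation observed, so no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(n_experiments=1000, looks=5, users_per_look=200,
                      rate=0.10, alpha=0.05, seed=0):
    """Return (false-positive rate with peeking, false-positive rate at fixed n)."""
    rng = random.Random(seed)
    peeking_hits = 0
    fixed_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        for _ in range(looks):
            # Both arms use the same true rate: any "win" here is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            p = p_value(conv_a, n, conv_b, n)
            if p < alpha:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look  # would have stopped at some look
        fixed_hits += p < alpha                  # only the final look counts
    return peeking_hits / n_experiments, fixed_hits / n_experiments


peeking_rate, fixed_rate = simulate_aa_tests()
print(f"false positives with peeking: {peeking_rate:.3f}")
print(f"false positives at fixed n:   {fixed_rate:.3f}")
```

The fixed-sample rate should land near the nominal alpha of 0.05, while the peeking rate should be noticeably higher, since each extra look is another chance for noise to cross the threshold.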

Key idea

A/B testing randomizes users between control and variant, sizes the sample for adequate power up front, and avoids the peeking and multiple-comparison traps, so that what ships is a real and meaningful effect.

Check yourself

1. Why is peeking and stopping early when results look significant dangerous?

2. What does the power of a test measure?

3. Why compute the required sample size before launching the test?