The Chi Squared Test
The chi squared test works with categorical data, comparing the counts you actually observe against the counts you would expect under a hypothesis.
The core idea
For each category, the test measures how far the observed count is from the expected count, squares that gap, and divides by the expected count. Summing these terms over all categories gives the chi squared statistic. Large values mean the observations stray far from expectation, which is evidence against the null hypothesis.
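The calculation described above, the sum of (observed - expected)^2 / expected over all categories, can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name chi_squared_statistic and the die-roll counts are made up for the example.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, counts for faces 1 through 6.
observed_rolls = [8, 12, 9, 11, 10, 10]
# Under the hypothesis of a fair die, each face is expected 60 / 6 = 10 times.
expected_rolls = [10] * 6

statistic = chi_squared_statistic(observed_rolls, expected_rolls)
print(statistic)
```

Dividing each squared gap by the expected count puts the categories on a common scale, so a gap of 2 counts matters more when only 10 were expected than when 1000 were.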
Two common uses
- The goodness of fit test checks whether one categorical variable matches a claimed distribution, such as whether a die is fair.
- The test of independence uses a contingency table to check whether two categorical variables, such as treatment and outcome, are related.
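For the test of independence, the expected counts are not given in advance; they are conventionally derived from the table's margins as (row total × column total) / grand total, and the statistic is compared against (rows - 1) × (columns - 1) degrees of freedom. The sketch below assumes that standard construction. The function names and the 2 × 2 treatment-by-outcome table are hypothetical.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_statistic(table):
    """Return (chi squared statistic, degrees of freedom) for a contingency table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts. Rows: treatment, control. Columns: improved, not improved.
table = [
    [30, 10],
    [20, 40],
]

print(expected_counts(table))
print(independence_statistic(table))
```

A goodness of fit test uses the same sum, but its expected counts come from the claimed distribution rather than from the table margins.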
Conditions to respect
- The data must be counts, not percentages or averages.
- Expected counts in each cell should be reasonably large, commonly at least five; otherwise the chi squared approximation to the statistic's distribution becomes unreliable.
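Both conditions can be checked mechanically before running the test. The helpers below are an illustrative sketch: the names looks_like_counts and small_expected_cells are invented here, and the default threshold of five is the rule of thumb mentioned above, not a hard limit.

```python
def looks_like_counts(table):
    """True if every cell is a non-negative whole number (not a percentage or average)."""
    return all(isinstance(v, int) and v >= 0 for row in table for v in row)


def small_expected_cells(expected, minimum=5.0):
    """List (row index, column index, value) for expected counts below the minimum."""
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]


# Hypothetical inputs for illustration.
observed = [[30, 10], [2, 8]]
expected = [[25.6, 14.4], [6.4, 3.6]]

print(looks_like_counts(observed))         # whole-number counts
print(looks_like_counts([[0.6, 0.4]]))     # proportions, not counts
print(small_expected_cells(expected))      # cells that fall under the threshold
```

When cells are flagged, common responses are to merge sparse categories or to use an exact test instead.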
Interpreting it
A chi squared statistic that is large relative to its degrees of freedom produces a small p value, signaling that the observed pattern would be unlikely if the null hypothesis held, that is, if the variables were truly independent or the distribution were as claimed.
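To make the link between statistic and p value concrete, the sketch below computes the upper-tail probability of the chi squared distribution using only the standard library, via the closed-form series for integer degrees of freedom. The function name chi2_sf is an assumption for this example; in practice a statistics library would normally be used instead.

```python
import math


def chi2_sf(statistic, df):
    """Upper-tail probability P(X >= statistic) for chi squared with integer df >= 1."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    if statistic == 0:
        return 1.0
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = 1.0
        total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2 * phi(sqrt(x)) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    term = math.sqrt(statistic)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= statistic / (2 * r - 1)
        total += term
    normal_pdf = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(math.sqrt(half)) + 2.0 * normal_pdf * total


# Hypothetical example: a statistic of 1.0 from a six-category goodness of fit
# test, which has 6 - 1 = 5 degrees of freedom.
p_value = chi2_sf(1.0, 5)
print(round(p_value, 4))
```

With a statistic of 1.0 on 5 degrees of freedom, the p value is about 0.96, so such data give no reason to doubt the claimed distribution. The same statistic on 1 degree of freedom would give a noticeably smaller p value, which is why the degrees of freedom always accompany the statistic.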
Key idea
The chi squared test sums squared gaps between observed and expected counts, each scaled by the expected count, to test goodness of fit or independence for categorical data.