The Chi-Squared Test

Testing relationships and fit for categorical counts.

The chi-squared test works with categorical data. It compares the counts you actually observe in each category with the counts you would expect if the null hypothesis were true.

The core idea

For each category, the test takes the gap between the observed count O and the expected count E, squares it, and divides by E. Summing these terms over all categories gives the chi-squared statistic, χ² = Σ (O - E)² / E. Large values mean the observations stray far from expectation, which is evidence against the null hypothesis.
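As a minimal sketch in Python, the function below computes the statistic exactly as described, from two equal-length lists of counts. The function name and the die counts are hypothetical, chosen only for illustration; with SciPy installed, `scipy.stats.chisquare(f_obs, f_exp)` performs the same calculation and also returns a p-value.

```python
from typing import Sequence


def chi_squared_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        # Squared gap between observed and expected, scaled by the expected count.
        total += (o - e) ** 2 / e
    return total


# Hypothetical example: a die rolled 60 times; a fair die expects 10 per face.
observed_rolls = [5, 8, 9, 8, 10, 20]
expected_rolls = [10.0] * 6
stat = chi_squared_statistic(observed_rolls, expected_rolls)
print(f"chi-squared = {stat:.2f}")
```

For these hypothetical rolls the six terms are 2.5, 0.4, 0.1, 0.4, 0 and 10, so the statistic is 13.4. The single face that came up 20 times contributes most of it, which is typical: one badly mismatched category can dominate the sum.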

Two common uses

  • The goodness of fit test checks whether one categorical variable matches a claimed distribution, like whether a die is fair.
  • The test of independence uses a contingency table to check whether two categorical variables, such as treatment and outcome, are related.
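For the test of independence, the expected count for each cell of the contingency table comes from the table's margins: row total × column total / grand total. The sketch below assumes the table is a list of rows of counts; the function names and the treatment/outcome numbers are hypothetical. It also returns the degrees of freedom, (r - 1)(c - 1) for r rows and c columns.

```python
from typing import List, Tuple

Table = List[List[float]]


def expected_counts(table: Table) -> Table:
    """Expected count for each cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_test(table: Table) -> Tuple[float, int]:
    """Return the chi-squared statistic and degrees of freedom for an r x c table of counts."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts. Rows: treatment, control. Columns: improved, not improved.
observed = [[30, 10],
            [20, 40]]
print(expected_counts(observed))   # [[20.0, 20.0], [30.0, 30.0]]
stat, dof = independence_test(observed)
print(f"chi-squared = {stat:.2f}, dof = {dof}")  # chi-squared = 16.67, dof = 1
```

Here the statistic is 5 + 5 + 10/3 + 10/3 ≈ 16.67 with 1 degree of freedom. If you compare against `scipy.stats.chi2_contingency`, note that it applies Yates' continuity correction to tables with 1 degree of freedom by default, so it reports a smaller statistic unless called with `correction=False`.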

Conditions to respect

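  • The data must be raw counts (frequencies), not percentages, proportions, or averages.
  • Each observation should be independent of the others and fall into exactly one category.
  • Expected counts should be reasonably large; a common rule of thumb is at least five in every cell. With smaller expected counts, the chi-squared distribution becomes a poor approximation to the statistic's true distribution.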
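Both conditions can be checked numerically. The sketch below uses hypothetical function names and die counts. It shows that feeding percentages instead of counts changes the statistic, because the statistic scales with the total count (rescaling to 100 inflates or deflates it), and it flags expected counts that fall below the rule-of-thumb threshold of five.

```python
from typing import List, Sequence


def chi_squared_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def low_expected_cells(expected: Sequence[float], threshold: float = 5.0) -> List[int]:
    """Indices of categories whose expected count is below the threshold (default 5)."""
    return [i for i, e in enumerate(expected) if e < threshold]


# Hypothetical die data: 60 rolls, so a fair die expects 10 per face.
counts = [5, 8, 9, 8, 10, 20]
expected = [10.0] * 6
print(chi_squared_statistic(counts, expected))  # about 13.4

# The same data rescaled to percentages (as if n were 100) gives a larger
# statistic, because every term is multiplied by 100 / 60.
percents = [100 * c / 60 for c in counts]
expected_pct = [100 / 6] * 6
print(chi_squared_statistic(percents, expected_pct))  # about 22.33

# Only 18 rolls: each face expects 3, so every cell is flagged as too small.
print(low_expected_cells([18 / 6] * 6))  # [0, 1, 2, 3, 4, 5]
```

The percentage version would make the evidence look stronger than it is; with fewer than 100 observations it overstates the statistic, and with more it understates it.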

Interpreting it

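A large chi-squared statistic produces a small p-value, signaling that the observed pattern would be unlikely if the null hypothesis held, that is, if the variables were truly independent or the distribution were as claimed. The p-value is the upper-tail area of the chi-squared distribution with the appropriate degrees of freedom: k - 1 for a goodness-of-fit test with k categories (when no parameters are estimated from the data), and (r - 1)(c - 1) for an r × c contingency table.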
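To turn a statistic into a p-value you need the upper tail of the chi-squared distribution. With SciPy installed, `scipy.stats.chi2.sf(statistic, dof)` does this directly. As a dependency-free sketch, the code below computes the same tail probability with the standard library, using the fact that a chi-squared variable with k degrees of freedom has upper tail Q(k/2, x/2), where Q is the regularized upper incomplete gamma function. The helper names are hypothetical; the split between a power series and a continued fraction follows the classic Numerical Recipes approach.

```python
import math

_EPS = 1e-15       # relative convergence tolerance
_FPMIN = 1e-300    # guard against division by zero in the continued fraction
_MAX_ITER = 10_000


def _lower_series(a: float, z: float) -> float:
    """Regularized lower incomplete gamma P(a, z) via its power series (used when z < a + 1)."""
    term = 1.0 / a
    total = term
    for n in range(1, _MAX_ITER):
        term *= z / (a + n)
        total += term
        if term < total * _EPS:
            break
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def _upper_cont_frac(a: float, z: float) -> float:
    """Regularized upper incomplete gamma Q(a, z) via a continued fraction (used when z >= a + 1)."""
    b = z + 1.0 - a
    c = 1.0 / _FPMIN
    d = 1.0 / b
    h = d
    for i in range(1, _MAX_ITER):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < _FPMIN:
            d = _FPMIN
        c = b + an / c
        if abs(c) < _FPMIN:
            c = _FPMIN
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    return math.exp(-z + a * math.log(z) - math.lgamma(a)) * h


def chi2_p_value(statistic: float, dof: int) -> float:
    """P(X >= statistic) for X ~ chi-squared(dof), computed as Q(dof / 2, statistic / 2)."""
    if dof < 1:
        raise ValueError("dof must be at least 1")
    if statistic <= 0:
        return 1.0
    a, z = dof / 2.0, statistic / 2.0
    if z < a + 1.0:
        return 1.0 - _lower_series(a, z)
    return _upper_cont_frac(a, z)


# Hypothetical die counts: statistic 13.4 with 6 - 1 = 5 degrees of freedom.
print(chi2_p_value(13.4, 5))    # between 0.01 and 0.025
# Hypothetical 2 x 2 table: statistic 50/3 with (2 - 1) * (2 - 1) = 1 degree of freedom.
print(chi2_p_value(50 / 3, 1))  # far below 0.05
```

As a sanity check, for 1 degree of freedom the result should match `math.erfc(math.sqrt(x / 2))`, and for 2 degrees of freedom it reduces to `math.exp(-x / 2)`.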

Key idea

The chi-squared test sums the squared gaps between observed and expected counts, each divided by the expected count, to test goodness of fit or independence for categorical data.

Check yourself

1. What type of data does the chi-squared test use?

2. Which chi-squared test checks whether two categorical variables are related?