Machine Learning

Correlation vs Causation

Why a strong relationship does not prove one thing causes another.

Correlation measures how two variables move together. Causation means changing one variable actually changes the other. Confusing them is among the most common analytical mistakes.

Measuring correlation

The correlation coefficient (most commonly Pearson's r) ranges from minus one to plus one. Values near plus one mean the variables rise together, values near minus one mean one rises as the other falls, and values near zero mean little or no linear relationship. It captures only linear association, so two variables can be strongly related along a curve and still have a coefficient near zero.
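As a small illustration of the three cases above, here is a minimal sketch that computes the coefficient by hand. It assumes the lesson means the Pearson coefficient; the helper name pearson_r and the toy data are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of x and y around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy data (assumption): x values symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]

r_rising = pearson_r(xs, [2 * x + 1 for x in xs])    # 1.0: y rises with x
r_falling = pearson_r(xs, [-2 * x + 1 for x in xs])  # -1.0: y falls as x rises
r_curve = pearson_r(xs, [x ** 2 for x in xs])        # 0.0: y depends on x, but not linearly

print(r_rising, r_falling, r_curve)
```

The last case is the one to remember: y is completely determined by x, yet the coefficient is zero because the relationship is a symmetric curve rather than a line.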

Why correlation is not causation

  • A hidden confounder can drive both variables. Ice cream sales and drownings both rise in summer because heat drives each.
  • Reverse causation can flip the assumed direction: the variable taken to be the effect may actually be the cause.
  • Pure coincidence can produce strong correlations, especially when scanning many variables.
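The three failure modes above can be sketched with simulated data. Everything here is invented for illustration: the temperature range, the coefficients, the noise levels, and the sample sizes are assumptions, not real sales or drowning figures.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# 1. Confounder: temperature drives both series; neither affects the other.
temps = [rng.uniform(0, 35) for _ in range(5000)]
ice_cream = [10 * t + rng.gauss(0, 40) for t in temps]
drownings = [0.2 * t + rng.gauss(0, 1.5) for t in temps]
r_raw = pearson_r(ice_cream, drownings)  # strongly positive

# Hold the confounder roughly fixed: keep only days between 20 and 22 degrees.
band = [(i, d) for t, i, d in zip(temps, ice_cream, drownings) if 20 <= t < 22]
r_band = pearson_r([i for i, _ in band], [d for _, d in band])  # near zero

# 2. Direction: the coefficient is symmetric, so it cannot say which way
# any causal arrow points.
r_swapped = pearson_r(drownings, ice_cream)

# 3. Coincidence: scan 200 pure-noise series against a pure-noise target
# and keep the strongest correlation found.
target = [rng.gauss(0, 1) for _ in range(20)]
best_noise_r = max(
    abs(pearson_r(target, [rng.gauss(0, 1) for _ in range(20)]))
    for _ in range(200)
)

print(f"raw: {r_raw:.2f}, within temperature band: {r_band:.2f}")
print(f"swapped arguments: {r_swapped:.2f}")
print(f"best of 200 noise series: {best_noise_r:.2f}")
```

In this setup the raw correlation is strong, but it mostly disappears once temperature is held nearly constant. The noise scan shows that checking enough unrelated variables tends to turn up at least one that looks related.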

Establishing causation

The gold standard is a randomized controlled experiment. Random assignment makes the treatment independent of every confounder, measured or not, so a difference in outcomes larger than chance would produce can be attributed to the treatment itself. When experiments are impossible, careful causal inference methods try to approximate this.
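To see why random assignment matters, here is a hedged simulation sketch. The hidden "health" confounder, the true effect of 2.0, and the opt-in rule are all assumptions chosen for illustration.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0  # assumed true causal effect of the treatment

def outcome(health, treated, rng):
    """Outcome depends on hidden health, on the treatment, and on noise."""
    return 5.0 * health + (TRUE_EFFECT if treated else 0.0) + rng.gauss(0, 1)

def diff_in_means(rows):
    """Mean outcome of treated units minus mean outcome of untreated units."""
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return mean(treated) - mean(control)

rng = random.Random(1)  # fixed seed so the sketch is repeatable
health = [rng.random() for _ in range(20000)]  # hidden confounder in [0, 1)

# Observational data: healthier units are more likely to choose treatment,
# so treated and untreated groups differ before treatment even starts.
observational = []
for h in health:
    treated = rng.random() < h
    observational.append((treated, outcome(h, treated, rng)))

# Randomized experiment: a fair coin decides, independent of health.
randomized = []
for h in health:
    treated = rng.random() < 0.5
    randomized.append((treated, outcome(h, treated, rng)))

naive_estimate = diff_in_means(observational)  # inflated by the confounder
rct_estimate = diff_in_means(randomized)       # close to TRUE_EFFECT

print(f"true effect: {TRUE_EFFECT}")
print(f"observational estimate: {naive_estimate:.2f}")
print(f"randomized estimate: {rct_estimate:.2f}")
```

The observational comparison overstates the effect because healthier units both opt in more often and do better anyway. The coin flip balances health across the two groups, so the simple difference in means recovers the assumed effect.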

Key idea

Correlation shows that two variables move together, but a confounder, reversed direction, or chance can explain it. A randomized controlled experiment is the most reliable way to establish causation.

Check yourself

1. Why can a strong correlation fail to prove causation?

2. What is the gold standard for establishing causation?

3. What does a correlation coefficient near zero indicate?