Correlation vs Causation
Correlation measures how two variables move together. Causation means changing one variable actually changes the other. Confusing them is among the most common analytical mistakes.
Measuring correlation
The Pearson correlation coefficient, usually written r, ranges from -1 to +1. Values near +1 mean the variables rise together, values near -1 mean one rises as the other falls, and values near 0 mean there is little or no linear relationship. It captures only linear association, so a strong curved relationship (a U-shape, for example) can still produce a coefficient near zero.
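To make the definition concrete, here is a minimal Python sketch that computes the coefficient from scratch using only the standard library. The function name `pearson_r` and the toy data are illustrative assumptions, not part of the original notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Toy data (made up for illustration), symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
rising = [2 * x + 1 for x in xs]   # straight line going up
falling = [-x for x in xs]         # straight line going down
curved = [x * x for x in xs]       # U-shape: y depends entirely on x

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)

print("rising: ", r_rising)
print("falling:", r_falling)
print("curved: ", r_curved)
```

The U-shaped series is fully determined by x, yet its coefficient comes out at zero, because the rising and falling halves cancel. A near-zero coefficient therefore rules out a linear relationship, not every relationship.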
Why correlation is not causation
- A hidden confounder can drive both variables. Ice cream sales and drownings both rise in summer because heat drives each.
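- Reverse causation can flip the assumed direction: the variable taken to be the effect may actually be the cause. If people who exercise more are healthier, for instance, part of the link may be that healthy people find exercise easier.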
- Pure coincidence can produce strong correlations, especially when scanning many variables.
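Two of these mechanisms are easy to reproduce in a small simulation. The sketch below is illustrative only: every number is made up, and the names (`heat`, `ice_cream`, `drownings`, `residuals`) are assumptions chosen to mirror the example above. Part 1 generates two series that each depend on temperature but not on each other, then correlates what is left of them after a fitted straight-line effect of temperature is subtracted. Part 2 scans many unrelated random series against a short target series and keeps the strongest correlation found.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [y - mean_y - slope * (x - mean_x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Part 1. Confounder: temperature drives both series; neither affects the other.
n_days = 2000
heat = [rng.gauss(20, 8) for _ in range(n_days)]
ice_cream = [5.0 * h + rng.gauss(0, 20) for h in heat]
drownings = [0.3 * h + rng.gauss(0, 1.5) for h in heat]

r_raw = pearson_r(ice_cream, drownings)
# Correlation after removing the linear effect of heat from both series.
r_given_heat = pearson_r(residuals(ice_cream, heat), residuals(drownings, heat))

# Part 2. Coincidence: compare one short random series with many unrelated ones.
n_points, n_candidates = 20, 200
target = [rng.gauss(0, 1) for _ in range(n_points)]
best_abs_r = max(
    abs(pearson_r(target, [rng.gauss(0, 1) for _ in range(n_points)]))
    for _ in range(n_candidates)
)

print("ice cream vs drownings, raw:              ", round(r_raw, 3))
print("ice cream vs drownings, heat removed:     ", round(r_given_heat, 3))
print("strongest |r| among unrelated candidates: ", round(best_abs_r, 3))
```

Under these assumptions the raw correlation should be strongly positive and should shrink toward zero once heat is accounted for. The scan should turn up a sizeable correlation even though every candidate is pure noise. Removing the confounder works here only because it is known and measured, which is rarely guaranteed in real data.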
Establishing causation
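The gold standard is a randomized controlled experiment. Because treatment is assigned at random, it cannot be systematically related to any confounder, measured or not, so a difference in average outcomes between groups (beyond what chance would produce) can be attributed to the treatment itself. When experiments are impossible, causal inference methods try to approximate this comparison from observational data, at the cost of additional assumptions.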
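The effect of randomization can be sketched with a simulation in which the true treatment effect is known in advance. Everything here is a made-up assumption for illustration: a hidden `health` variable raises the outcome, and in the observational setting it also makes units more likely to take the treatment. The same naive difference in group means is then computed under self-selection and under coin-flip assignment.

```python
import random

rng = random.Random(42)   # fixed seed so the run is repeatable
TRUE_EFFECT = 2.0         # the causal effect we hope to recover (assumed)
N = 20000                 # units per study

def outcome(health, treated):
    """Outcome depends on hidden health, on treatment, and on noise."""
    return 3.0 * health + TRUE_EFFECT * treated + rng.gauss(0, 1)

def mean(values):
    return sum(values) / len(values)

def diff_in_means(rows):
    """Naive estimate: average outcome of treated minus average of controls."""
    treated = [y for t, y in rows if t]
    control = [y for t, y in rows if not t]
    return mean(treated) - mean(control)

observational, randomized = [], []
for _ in range(N):
    health = rng.gauss(0, 1)  # hidden confounder, never seen by the analyst

    # Observational study: healthier units tend to opt in to treatment.
    chose = health + rng.gauss(0, 1) > 0
    observational.append((chose, outcome(health, chose)))

    # Randomized experiment: a coin flip decides, independent of health.
    assigned = rng.random() < 0.5
    randomized.append((assigned, outcome(health, assigned)))

obs_estimate = diff_in_means(observational)
rct_estimate = diff_in_means(randomized)

print("true effect:            ", TRUE_EFFECT)
print("observational estimate: ", round(obs_estimate, 2))
print("randomized estimate:    ", round(rct_estimate, 2))
```

In this setup the observational estimate should overshoot the true effect, because the treated group was healthier to begin with. The randomized estimate should land close to it, because the coin flip balances health across the two groups on average.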
Key idea
Correlation shows that variables move together, but a confounder, reversed direction, or chance can explain it. Randomized controlled experiments are the most reliable way to establish causation.