Descriptive Statistics Mean Median Mode
Before modeling anything, you need to summarize your data. The three classic measures of central tendency describe where the center of a distribution sits.
The three measures
- The mean is the arithmetic average: add every value and divide by the count.
- The median is the middle value when the data is sorted, splitting it into two equal halves.
- The mode is the value that appears most often.
When each one helps
The mean uses every value, so it is precise but sensitive to outliers. A single huge salary drags the mean income upward.
The median ignores magnitudes and only cares about rank, so it is robust to outliers. This is why incomes and house prices are usually reported as medians.
The mode is the only summary that works for categorical data, like the most common eye color, where averaging makes no sense.
A worked feel
For the values 2, 3, 3, 8, the mean is 4, the median is 3, and the mode is 3. The lone 8 pulls the mean above the rest of the cluster.
Key idea
Mean, median, and mode each describe a dataset center, and the median resists outliers that distort the mean.