Variance and Standard Deviation
The mean tells you the center, but two datasets with the same mean can look very different. Spread measures how far values stray from that center.
Defining the measures
- The variance is the average of the squared distances from the mean.
- The standard deviation is the square root of the variance.
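Written as formulas, the population definitions look like this. The notation is introduced here for illustration only: n values x_1 through x_n, mean μ, variance σ², and standard deviation σ.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2,
\qquad
\sigma = \sqrt{\sigma^2}
```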
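Squaring makes every deviation non-negative, so deviations above and below the mean cannot cancel out (raw deviations from the mean always sum to zero), and it weights large gaps more heavily than small ones. Squaring also squares the units, for example turning meters into square meters. Taking the square root returns the answer to the original units, which is why the standard deviation is easier to interpret than the variance.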
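Here is a minimal sketch of both definitions in Python. It uses two made-up datasets that share a mean of 5. The helper names `population_variance` and `population_std` are hypothetical. The standard library's `statistics.pvariance` and `statistics.pstdev` appear only as a cross-check. The sketch also prints the sum of the raw deviations to show that they cancel.

```python
import math
import statistics

def population_variance(values):
    """Average of the squared deviations from the mean (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def population_std(values):
    """Square root of the variance, back in the data's original units."""
    return math.sqrt(population_variance(values))

narrow = [4, 5, 6]   # hypothetical data, mean 5
wide = [0, 5, 10]    # hypothetical data, also mean 5

for name, data in [("narrow", narrow), ("wide", wide)]:
    mean = sum(data) / len(data)
    raw_deviations = [x - mean for x in data]
    print(name, "mean:", mean, "sum of raw deviations:", sum(raw_deviations))
    print(name, "variance:", population_variance(data), "std:", population_std(data))

# Cross-check against the standard library's population versions.
assert math.isclose(population_variance(wide), statistics.pvariance(wide))
assert math.isclose(population_std(wide), statistics.pstdev(wide))
```

Both datasets have mean 5, and in both the raw deviations sum to 0. However, the variance of `wide` is 25 times that of `narrow`, and its standard deviation is 5 times larger.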
Population versus sample
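When you have every value in the population, the variance divides the sum of squared deviations by n, the number of values. When you estimate the population's spread from a sample, you divide by n minus 1 instead. This Bessel correction compensates for measuring deviations from the sample mean rather than from the unknown true mean. The sample mean is computed from the same values, so it sits closer to them than the true mean does. As a result, dividing by n would underestimate the true variance on average.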
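The following sketch compares the two divisors. The function `variance` is hypothetical, and its `ddof` parameter borrows the name NumPy uses for this offset. The sketch draws many small samples from a standard normal distribution, whose variance is known to be 1, and averages each estimator. The seed, sample size, and trial count are arbitrary choices.

```python
import random
import statistics

def variance(values, ddof=0):
    """Sum of squared deviations from the sample mean, divided by n - ddof.

    ddof=0 gives the divide-by-n version; ddof=1 applies Bessel's correction.
    """
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - ddof)

random.seed(0)          # arbitrary seed for reproducibility
TRUE_VARIANCE = 1.0     # random.gauss(0, 1) has variance 1
SAMPLE_SIZE = 5
TRIALS = 20_000

biased, corrected = [], []
for _ in range(TRIALS):
    sample = [random.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    biased.append(variance(sample, ddof=0))
    corrected.append(variance(sample, ddof=1))

avg_biased = statistics.fmean(biased)
avg_corrected = statistics.fmean(corrected)
print(f"average with n:     {avg_biased:.3f}")    # expected near (n - 1) / n = 0.8
print(f"average with n - 1: {avg_corrected:.3f}")  # expected near 1.0

# The n - 1 version matches the standard library's sample variance.
check = [2.0, 4.0, 4.0, 5.0]
assert abs(variance(check, ddof=1) - statistics.variance(check)) < 1e-12
```

With samples of size 5, the divide-by-n average should come out near 0.8. That is the (n − 1)/n shrinkage the correction undoes. The divide-by-(n minus 1) average should come out near the true value of 1.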
Why it matters
Standard deviation appears throughout machine learning. Standardizing a feature subtracts its mean and divides by its standard deviation, so every feature ends up with mean 0 and standard deviation 1. Putting features on a comparable scale helps gradient-based optimizers converge.
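Here is a minimal sketch of standardization (z-scoring) on a small, made-up feature matrix stored as a list of rows. The function name `standardize` is hypothetical. Dividing by the population standard deviation (n rather than n minus 1) is an assumption, since some tools default to one and some to the other. A constant column would cause a division by zero, so the sketch only centers such columns and does not scale them.

```python
import statistics

def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1.

    Uses the population standard deviation (divide by n). A column with
    zero spread is only centered, to avoid dividing by zero.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else x - m for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Hypothetical features on very different scales: age in years, income in dollars.
features = [
    [25, 40_000],
    [35, 60_000],
    [45, 80_000],
]

for row in standardize(features):
    print([round(v, 3) for v in row])
```

Income was originally thousands of times larger than age. After standardizing, both columns have mean 0 and standard deviation 1, so neither dominates the gradient simply because of its units.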
Key idea
Variance averages squared deviations from the mean, and standard deviation is its square root in the original units.