The Bias-Variance Decomposition

The decomposition

For squared error, a model's expected test error splits cleanly into three parts.

Bias squared measures error from wrong assumptions: how far the average prediction, taken over training sets, sits from the truth.
Variance measures how much predictions wobble across different training sets.
Irreducible noise is the randomness in the data that no model can remove.
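
The three parts can be estimated by simulation. The sketch below is one possible setup, not a canonical one: the sine target, the noise level sigma, the test point x0, and the k-nearest-neighbour predictor are all assumptions chosen for illustration. It redraws the training set many times, measures each part at a single test point, and compares their sum with the average squared error on fresh noisy labels.

```python
import math
import random
import statistics


def true_f(x):
    # Assumed ground-truth function (illustrative only).
    return math.sin(2 * math.pi * x)


def knn_predict(xs, ys, x0, k):
    # Average the labels of the k training points closest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k


def decompose(x0=0.25, n=50, k=5, sigma=0.3, trials=2000, seed=0):
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(trials):
        # A fresh training set on every trial.
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0.0, sigma) for x in xs]
        pred = knn_predict(xs, ys, x0, k)
        # A fresh noisy test label at x0.
        y0 = true_f(x0) + rng.gauss(0.0, sigma)
        preds.append(pred)
        sq_errors.append((y0 - pred) ** 2)
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    noise = sigma ** 2  # known here only because we chose sigma ourselves
    return bias_sq, variance, noise, statistics.fmean(sq_errors)


bias_sq, variance, noise, test_error = decompose()
print(f"bias^2   = {bias_sq:.4f}")
print(f"variance = {variance:.4f}")
print(f"noise    = {noise:.4f}")
print(f"sum      = {bias_sq + variance + noise:.4f}")
print(f"measured expected squared error = {test_error:.4f}")
```

The sum and the measured error should agree up to Monte Carlo error, which shrinks as trials grows. On real data the noise term is not known and cannot be read off this way.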

Reading the parts

A model that is too simple has high bias and underfits.
A model that is too flexible has high variance and overfits.
The noise floor sets the best error any model could reach.

The tradeoff

Lowering one term often raises the other. Adding capacity cuts bias but raises variance. Adding regularization cuts variance but raises bias. The sweet spot minimizes their sum.
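
A small sweep makes the tradeoff visible. This is a sketch under assumed settings (sine target, Gaussian noise, k-nearest-neighbour regression, one test point); in it, a smaller k plays the role of more capacity and a larger k plays the role of a smoother, more constrained model.

```python
import math
import random
import statistics


def true_f(x):
    # Assumed ground-truth function (illustrative only).
    return math.sin(2 * math.pi * x)


def knn_predict(xs, ys, x0, k):
    # Average the labels of the k training points closest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k


def bias_variance(k, x0=0.25, n=50, sigma=0.3, trials=500, seed=0):
    # Estimate bias^2 and variance of the prediction at x0 over many training sets.
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0.0, sigma) for x in xs]
        preds.append(knn_predict(xs, ys, x0, k))
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    return bias_sq, statistics.pvariance(preds)


# k=1 is the most flexible setting; k=50 averages the whole training set.
results = {k: bias_variance(k) for k in (1, 5, 15, 50)}
for k, (b, v) in results.items():
    print(f"k={k:2d}  bias^2={b:.4f}  variance={v:.4f}  sum={b + v:.4f}")

best_k = min(results, key=lambda k: sum(results[k]))
print("k with the smallest bias^2 + variance:", best_k)
```

In this setup the two extremes lose for opposite reasons: k=1 is dominated by variance and k=50 by bias, so the smallest sum falls at an intermediate k.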

Levers in practice

Regularization and simpler models lower variance.
More features or capacity lower bias.
More data mainly lowers variance, letting you afford more capacity.
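
The effect of more data can be checked the same way. The sketch below assumes a deliberately too-simple model, a least-squares straight line fitted to a sine target, and repeats the experiment at several training set sizes. All settings (target, noise level, sizes, test point) are illustrative assumptions.

```python
import math
import random
import statistics


def true_f(x):
    # Assumed ground-truth function (illustrative only).
    return math.sin(2 * math.pi * x)


def fit_line(xs, ys):
    # Closed-form simple linear regression: returns (slope, intercept).
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def bias_variance(n, x0=0.25, sigma=0.3, trials=500, seed=0):
    # Estimate bias^2 and variance of the line's prediction at x0.
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [true_f(x) + rng.gauss(0.0, sigma) for x in xs]
        slope, intercept = fit_line(xs, ys)
        preds.append(slope * x0 + intercept)
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    return bias_sq, statistics.pvariance(preds)


by_n = {n: bias_variance(n) for n in (10, 40, 160)}
for n, (b, v) in by_n.items():
    print(f"n={n:3d}  bias^2={b:.4f}  variance={v:.4f}")
```

Variance falls as n grows, while the bias of the straight line comes from its shape and is not expected to vanish with more data. That leftover bias is what added capacity would address.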

Key idea

Expected error decomposes into bias squared, variance, and irreducible noise, and good models minimize the sum of the first two.
