The decomposition
For squared error, a model's expected test error splits cleanly into three parts:
- Bias squared measures error from wrong assumptions: how far the prediction, averaged over training sets, sits from the truth.
- Variance measures how much predictions wobble across different training sets.
- Irreducible noise is the randomness in the data that no model can remove.
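Written out, the three parts above take the following form. The notation is an assumption added for illustration: data are generated as y = f(x) + ε with zero-mean noise of variance σ², and \hat{f}_D denotes the model fitted on a random training set D.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat{f}_D(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The cross terms vanish because the test-point noise ε has zero mean and is independent of the training set D.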
Reading the parts
- A model that is too simple has high bias and underfits.
- A model that is too flexible has high variance and overfits.
- The noise floor sets the best error any model could reach.
The tradeoff
Lowering bias often raises variance, and vice versa. Adding capacity cuts bias but raises variance. Adding regularization cuts variance but raises bias. The sweet spot minimizes their sum.
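The tradeoff can be seen in a small simulation. The sketch below is illustrative only: the sine target, the noise level, the polynomial model family, and the function names `true_fn` and `bias_variance` are all assumptions chosen for the demonstration. It refits a polynomial on many fresh training sets and estimates bias squared and variance from the spread of the predictions.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def true_fn(x):
    # Assumed ground-truth function for the demonstration.
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_train=30, n_trials=300, noise_sd=0.3, seed=0):
    """Estimate bias^2 and variance of a degree-`degree` polynomial fit."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Draw a fresh noisy training set and refit the model.
        x = rng.uniform(0.0, 1.0, n_train)
        y = true_fn(x) + rng.normal(0.0, noise_sd, n_train)
        coefs = P.polyfit(x, y, degree)
        preds[t] = P.polyval(x_test, coefs)
    # Bias^2: squared gap between the average prediction and the truth.
    bias_sq = np.mean((preds.mean(axis=0) - true_fn(x_test)) ** 2)
    # Variance: spread of predictions across training sets.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


if __name__ == "__main__":
    for degree in (1, 3, 7):
        b, v = bias_variance(degree)
        print(f"degree={degree}  bias^2={b:.4f}  variance={v:.4f}  sum={b + v:.4f}")
```

Under these assumptions the straight line (degree 1) should show the largest bias squared and the degree-7 fit the largest variance, with the smallest sum somewhere in between.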
Levers in practice
- Regularization and simpler models lower variance.
- Adding features or capacity lowers bias.
- More data mainly lowers variance, letting you afford more capacity.
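Two of these levers, regularization and more data, can be checked with the same kind of simulation. This is a minimal sketch under stated assumptions: the sine target, the degree-5 polynomial features, and the name `ridge_bias_variance` are hypothetical choices, and the ridge penalty is applied to every coefficient (intercept included) purely for brevity.

```python
import numpy as np


def ridge_bias_variance(lam, n_train, degree=5, n_trials=300,
                        noise_sd=0.3, seed=0):
    """Estimate bias^2 and variance of ridge-regularized polynomial fits."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)
    f_test = np.sin(2 * np.pi * x_test)  # assumed true function
    X_test = np.vander(x_test, degree + 1, increasing=True)
    # Ridge as augmented least squares: appending sqrt(lam) * I rows to the
    # design matrix adds lam * ||w||^2 to the squared-error objective.
    penalty = np.sqrt(lam) * np.eye(degree + 1)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        x = rng.uniform(0.0, 1.0, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_sd, n_train)
        X = np.vander(x, degree + 1, increasing=True)
        A = np.vstack([X, penalty])
        b = np.concatenate([y, np.zeros(degree + 1)])
        w, *_ = np.linalg.lstsq(A, b, rcond=None)
        preds[t] = X_test @ w
    bias_sq = np.mean((preds.mean(axis=0) - f_test) ** 2)
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


if __name__ == "__main__":
    for lam, n in [(0.0, 30), (1.0, 30), (0.0, 300)]:
        b, v = ridge_bias_variance(lam, n)
        print(f"lam={lam}  n_train={n}  bias^2={b:.4f}  variance={v:.4f}")
```

Comparing the first two settings isolates the effect of the penalty at a fixed sample size; comparing the first and third isolates the effect of a tenfold increase in training data with no penalty.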
Key idea
Expected error decomposes into bias squared, variance, and irreducible noise. Because the noise term is fixed, a good model minimizes the sum of the first two.