The Learning Curve Diagnosis
A learning curve plots training and validation error as a function of the number of training examples. Its shape tells you whether gathering more data or changing the model is the right next move.
How to read it
- High bias shows both curves converging to a similarly high error. The model cannot fit even the training set well, so more data will not help much.
- High variance shows a large gap between low training error and higher validation error. The gap shrinks as data grows, so more data does help.
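The points of such a curve can be computed by refitting the model on growing prefixes of the training set and scoring it on both sets each time. The sketch below is only an illustration under stated assumptions: the names `learning_curve`, `fit_line`, and `mean_squared_error` are hypothetical, the model is a least-squares straight line, and error is measured as mean squared error. None of these choices comes from the text above.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, float]          # one (x, y) pair
Predictor = Callable[[float], float]   # maps x to a predicted y


def mean_squared_error(predict: Predictor, data: Sequence[Example]) -> float:
    """Average squared difference between predictions and targets."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


def fit_line(data: Sequence[Example]) -> Predictor:
    """Least-squares straight line (assumed model, for illustration only).

    Falls back to a constant predictor when x has no spread.
    """
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def learning_curve(
    fit: Callable[[Sequence[Example]], Predictor],
    train: Sequence[Example],
    val: Sequence[Example],
    sizes: Sequence[int],
) -> List[Tuple[int, float, float]]:
    """Return (n, training error, validation error) for each size n."""
    points = []
    for n in sizes:
        subset = train[:n]        # first n training examples
        predict = fit(subset)     # refit from scratch at every size
        points.append((
            n,
            mean_squared_error(predict, subset),  # error on data the model saw
            mean_squared_error(predict, val),     # error on held-out data
        ))
    return points
```

Calling `learning_curve(fit_line, train, val, [10, 50, 100])` would return one tuple per size. Plotting the second and third entries against the first gives the two curves described above. The validation set stays fixed across sizes so that its error values are comparable.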
Acting on the curve
- For high bias, add capacity or features rather than data.
- For high variance, add data, regularize, or simplify the model.
- For a healthy curve, where both errors converge to a low value with a small gap, neither change is needed.
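The rules above can be turned into a rough check on the last point of the curve. This is a minimal sketch, not a standard procedure: the function name `diagnose` and the two thresholds `target_error` and `gap_tolerance` are assumptions introduced here, and what counts as "high" error or a "large" gap depends on the problem.

```python
def diagnose(train_error: float, val_error: float,
             target_error: float, gap_tolerance: float) -> str:
    """Label the end of a learning curve using two assumed thresholds.

    target_error:  highest training error still considered acceptable.
    gap_tolerance: largest validation-minus-training gap considered small.
    """
    gap = val_error - train_error
    high_bias = train_error > target_error   # model underfits the training set
    high_variance = gap > gap_tolerance      # model generalizes poorly

    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"       # add capacity or features
    if high_variance:
        return "high variance"   # add data, regularize, or simplify
    return "healthy"
```

For example, `diagnose(0.30, 0.32, target_error=0.10, gap_tolerance=0.05)` returns `"high bias"`, because the two errors are close together but both sit well above the target. The combined label covers a case the list above does not name, where training error is high and the gap is also large.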
Learning curves prevent wasted effort. Collecting more data is expensive, so confirm it will actually help before investing.
Key idea
A learning curve plots error against training set size. Two curves converging to a high error signal bias, which calls for more capacity, while a large gap between them signals variance, which more data can close.