Overfitting and Underfitting
Every model lives somewhere between two failure modes. Underfitting means it has not learned the underlying pattern, while overfitting means it has learned the noise in the training data along with the pattern. The gap between training and validation scores, read together with how good those scores are, is your main clue.
Underfitting
- The model performs poorly on both training and validation data.
- Causes include too few features, too little capacity, or excessive regularization.
- Fixes: add features, use a richer model, or reduce regularization.
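To make "too little capacity" concrete, here is a minimal sketch in plain Python. It assumes hypothetical synthetic data with a linear trend (y = 2x plus a small fixed noise pattern) and compares a mean-only model, which ignores x entirely, with an ordinary least-squares line. The helper names fit_mean, fit_line and mse are illustrative, not from any library.

```python
# Minimal sketch: a mean-only model (too little capacity) vs. a straight-line fit
# on hypothetical synthetic data with a linear trend, y = 2x plus small fixed noise.

def fit_mean(xs, ys):
    """Predict the training mean for every input; ignores x entirely."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: training points at x = 0..9, validation points in between.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(train_x)]
val_x = [i + 0.5 for i in range(10)]
val_y = [2 * x + (-0.3 if i % 2 == 0 else 0.3) for i, x in enumerate(val_x)]

for name, fit in [("mean-only", fit_mean), ("linear", fit_line)]:
    model = fit(train_x, train_y)
    print(f"{name:>9}: train MSE {mse(model, train_x, train_y):.2f}, "
          f"val MSE {mse(model, val_x, val_y):.2f}")
```

On this data the mean-only model scores badly on both sets (both MSEs land above 30, with only a small gap between them), which is the underfitting signature. Giving it the capacity to use x, via the line fit, brings both errors below 1.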
Overfitting
- The model performs well on training data but poorly on validation data.
- It has memorized quirks and noise that do not generalize.
- Fixes: gather more data, simplify the model, add regularization, or use early stopping.
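Early stopping, the last fix listed, can be sketched as a rule over per-epoch validation losses: keep the epoch with the lowest validation loss, and stop once it has failed to improve for a set number of epochs. The sketch below assumes the losses are already available as a list; the function name and the patience and min_delta defaults are illustrative choices, not any particular library's API.

```python
def early_stopping(val_losses, patience=3, min_delta=0.0):
    """Simulate early stopping over a sequence of per-epoch validation losses.

    Returns (best_epoch, last_epoch): the epoch whose weights would be kept
    and the epoch at which training stopped. patience and min_delta are
    illustrative defaults, not values from any particular library.
    """
    best_epoch, best_loss = 0, float("inf")
    epochs_without_improvement = 0
    last_epoch = -1
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss - min_delta:
            # Validation loss improved: remember this epoch as the best so far.
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss stalled or rose for `patience` epochs
    return best_epoch, last_epoch

# Hypothetical curve: validation loss falls, bottoms out at epoch 3, then rises
# as the model starts fitting noise.
curve = [1.00, 0.80, 0.60, 0.55, 0.57, 0.60, 0.66, 0.70]
print(early_stopping(curve))  # (3, 6)
```

In a real training loop you would compute the validation loss after each epoch instead of passing a precomputed list, and checkpoint or restore the weights from best_epoch so the kept model is the one from before validation loss started to climb.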
Reading the gap
A small gap with high error signals underfitting. A large gap signals overfitting. The goal is a model that generalizes, where validation error is both low and close to training error.
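The reading rule above can be written as a small decision function. This is a sketch under stated assumptions: errors are "lower is better" numbers such as MSE, and acceptable_error and max_gap are hypothetical thresholds you would have to pick for your own problem, not standard constants.

```python
def diagnose(train_error, val_error, acceptable_error, max_gap):
    """Label a fit from training and validation error (lower is better).

    acceptable_error and max_gap are hypothetical, problem-specific thresholds
    you would have to choose yourself; they are not standard constants.
    """
    gap = val_error - train_error
    if gap > max_gap:
        return "overfitting"   # large train-to-validation gap
    if val_error > acceptable_error:
        return "underfitting"  # gap is small but validation error is still too high
    return "good fit"          # validation error is low and close to training error

print(diagnose(0.40, 0.42, acceptable_error=0.10, max_gap=0.05))  # underfitting
print(diagnose(0.02, 0.30, acceptable_error=0.10, max_gap=0.05))  # overfitting
print(diagnose(0.05, 0.07, acceptable_error=0.10, max_gap=0.05))  # good fit
```

The gap is checked first because a large gap points to overfitting regardless of how good the training score looks; only when the gap is small does a high error level point to underfitting.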
Key idea
Underfitting shows poor scores everywhere, while overfitting shows a large train-to-validation gap; the cure depends on which failure mode you diagnose.