Stopping at the right moment
Train too long and the model overfits, memorizing the training set while validation error climbs. Early stopping watches a validation metric and halts training when that metric stops improving, keeping the model from the epoch where it generalized best.
The patience knob
Validation curves are noisy, so stopping at the first uptick is too hasty. Patience is the number of consecutive epochs you wait without improvement before stopping. A larger patience tolerates noise but spends extra epochs training past the best one; a smaller one stops sooner but may quit during a temporary setback that the model would have recovered from.
The decision loop
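The loop can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `early_stopping_epoch` and its parameters `patience` and `min_delta` are hypothetical, the metric is assumed to be a loss (lower is better), and the list of validation losses stands in for values a real training run would produce one epoch at a time.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Epochs are 0-indexed and a lower loss is assumed to be better.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # Improvement: remember this epoch and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            # No improvement: count it, and stop once patience runs out.
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch

    # Patience never ran out, so training ended at the last epoch.
    return len(val_losses) - 1, best_epoch


# Made-up losses with a temporary setback at epoch 3.
losses = [0.90, 0.70, 0.60, 0.62, 0.58, 0.59, 0.61, 0.60, 0.63]

print(early_stopping_epoch(losses, patience=3))  # (7, 4)
print(early_stopping_epoch(losses, patience=1))  # (3, 2)
```

With a patience of 3 the loop rides out the setback at epoch 3 and finds the better loss at epoch 4. With a patience of 1 it quits at epoch 3 and settles for epoch 2.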
Getting it right
- Always restore the best checkpoint, not the final one, since the last epochs may be worse.
- Define improvement with a small minimum delta, so that changes smaller than the noise level do not reset the patience counter.
- Choose the metric that matters, often validation loss or a task metric such as accuracy.
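The minimum delta and the choice of metric both come down to how "improvement" is tested. The sketch below is one possible form of that test. The helper name `is_improvement` and the `mode` argument are assumptions made for illustration: `"min"` is meant for metrics such as loss, `"max"` for metrics such as accuracy.

```python
def is_improvement(current, best, min_delta=1e-3, mode="min"):
    """Return True if `current` beats `best` by more than `min_delta`."""
    if mode == "min":
        # Lower is better (e.g. validation loss).
        return current < best - min_delta
    if mode == "max":
        # Higher is better (e.g. accuracy).
        return current > best + min_delta
    raise ValueError("mode must be 'min' or 'max'")


# A loss change smaller than min_delta does not count as improvement.
small_change = is_improvement(0.4995, 0.5000)  # False
real_change = is_improvement(0.48, 0.50)       # True

# For accuracy, the comparison flips direction.
acc_change = is_improvement(0.91, 0.90, min_delta=0.005, mode="max")  # True
```

A reasonable starting point for the delta is the size of the epoch-to-epoch wobble seen once the validation curve flattens, though the right value depends on the task.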
Practical notes
- Pair early stopping with checkpointing so the best weights are never lost.
- Early stopping also acts as an implicit regularizer, because it limits the effective training time.
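Pairing early stopping with checkpointing can be as simple as keeping a copy of the weights whenever the validation loss reaches a new best. The sketch below assumes the weights fit in memory and uses a plain dictionary as a stand-in for a model's state. The class name `BestCheckpoint` is hypothetical, and a real framework would normally save to disk with its own utilities.

```python
import copy


class BestCheckpoint:
    """Keep an in-memory copy of the state with the lowest validation loss."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_state = None

    def update(self, val_loss, state):
        """Store a deep copy of `state` if `val_loss` is a new best."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            # Deep copy, because training keeps mutating `state` in place.
            self.best_state = copy.deepcopy(state)
            return True
        return False


weights = {"w": [0.0]}  # stand-in for model parameters
ckpt = BestCheckpoint()

for val_loss in [0.8, 0.5, 0.6, 0.7]:
    weights["w"][0] += 1.0  # stand-in for one epoch of training
    ckpt.update(val_loss, weights)

# Restore the best weights rather than keeping the final ones.
weights = ckpt.best_state
print(weights, ckpt.best_loss)  # {'w': [2.0]} 0.5
```

The deep copy matters here: storing a reference to the live state would let later epochs overwrite the saved checkpoint.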
Key idea
Early stopping halts when validation stops improving, with patience absorbing noise before quitting. Always restore the best checkpoint rather than the final, possibly overfit, weights.