The Train/Validation/Test Split
Reliable evaluation needs three distinct data roles: train, validation, and test. Mixing these roles is the fastest way to fool yourself with optimistic numbers.
The three roles
- Training set fits the model parameters such as weights.
- Validation set guides decisions like hyperparameters, model choice, and early stopping.
- Test set is used only once, at the end, to report an unbiased estimate of performance on unseen data.
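The three roles can be sketched end to end in a few lines. This is a minimal illustration, not a reference implementation: the synthetic data (`y = 2x + noise`), the one-weight ridge model, the helper names `make_data`, `fit_ridge`, and `mse`, and the candidate penalty values are all assumptions chosen to keep the example self-contained.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D regression data, y = 2x + noise (assumed for the demo)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 0.3)))
    return data

def fit_ridge(train, lam):
    """Training role: fit the single weight w of y = w * x with L2 penalty lam."""
    sxy = sum(x * y for x, y in train)
    sxx = sum(x * x for x, _ in train)
    return sxy / (sxx + lam)

def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
data = make_data(300, rng)
# The samples are drawn independently, so slicing gives a valid 70/20/10 split.
train, val, test = data[:210], data[210:270], data[270:]

# Validation role: choose the hyperparameter (the penalty strength).
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam = min(candidates, key=lambda lam: mse(fit_ridge(train, lam), val))

# Test role: a single final evaluation of the chosen model.
w = fit_ridge(train, best_lam)
test_mse = mse(w, test)
print(f"chosen lam={best_lam}, w={w:.3f}, test MSE={test_mse:.3f}")
```

Note that the test data appears in exactly one line, after every decision has already been made.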
Why three, not two
With only a train and a test set, every tuning decision has to be made on the test set. If you tune on the test set, you implicitly fit to it, and your reported score becomes optimistically biased. The validation set absorbs all the tuning decisions, so the test set stays uncontaminated.
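A small simulation can make the optimism concrete. The sketch below is an assumed toy setup, not a real tuning run: each "model" is a coin-flip guesser with no skill, so its true accuracy is 0.5, and the helper names and sizes (`run_trial`, 100 models, 100 examples, 30 trials) are hypothetical. Picking the best guesser by its test score and then reporting that same score looks much better than what the guesser achieves on fresh data.

```python
import random

def accuracy_of_guesser(rng, labels):
    """Accuracy of a skill-free model that guesses each label with a coin flip."""
    hits = sum(1 for y in labels if rng.randint(0, 1) == y)
    return hits / len(labels)

def run_trial(rng, n_models=100, n_examples=100):
    """Select the best of n_models on the test set, then score it on fresh data."""
    test_labels = [rng.randint(0, 1) for _ in range(n_examples)]
    fresh_labels = [rng.randint(0, 1) for _ in range(n_examples)]
    best_test_acc, best_fresh_acc = -1.0, None
    for _ in range(n_models):
        test_acc = accuracy_of_guesser(rng, test_labels)
        fresh_acc = accuracy_of_guesser(rng, fresh_labels)
        if test_acc > best_test_acc:
            best_test_acc, best_fresh_acc = test_acc, fresh_acc
    return best_test_acc, best_fresh_acc

rng = random.Random(0)
results = [run_trial(rng) for _ in range(30)]
mean_reported = sum(r[0] for r in results) / len(results)
mean_fresh = sum(r[1] for r in results) / len(results)
print(f"score reported after tuning on test: {mean_reported:.3f}")
print(f"score of the same models on fresh data: {mean_fresh:.3f}")
```

The gap between the two printed numbers is pure selection effect. If the selection is done on a validation set instead, the untouched test set plays the part of the fresh data here.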
Good practice
- Split before any preprocessing that learns from data (for example scaling statistics), and fit that preprocessing on the training set only, to avoid leakage.
- Keep the test set locked away until the very end.
- Use train/validation/test proportions like 70/20/10 or 80/10/10, adjusting for dataset size.
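The practices above can be combined in one short sketch: split first, then learn preprocessing statistics from the training portion only. The helper `split_indices`, the 80/10/10 default, the synthetic Gaussian feature, and the choice of standardization as the preprocessing step are all assumptions made for illustration.

```python
import random
import statistics

def split_indices(n, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * fractions[0])
    n_val = round(n * fractions[1])
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# Synthetic single feature (assumed data for the demo).
rng = random.Random(1)
values = [rng.gauss(50.0, 10.0) for _ in range(1000)]

# 1. Split before computing any statistics.
train_idx, val_idx, test_idx = split_indices(len(values))

# 2. Learn the scaling parameters from the training set only.
train_vals = [values[i] for i in train_idx]
mu = statistics.mean(train_vals)
sigma = statistics.pstdev(train_vals)

# 3. Apply the same training-derived parameters to every subset.
def standardize(idx):
    return [(values[i] - mu) / sigma for i in idx]

train_x = standardize(train_idx)
val_x = standardize(val_idx)
test_x = standardize(test_idx)
print(len(train_x), len(val_x), len(test_x))  # 800 100 100
```

The validation and test values will not have exactly zero mean and unit variance after this transform. That is expected: their statistics were never used, which is the point of avoiding leakage.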
Key idea
Train fits parameters, validation guides tuning, and the test set is touched once for an unbiased report. Keeping these roles separate prevents optimistic, contaminated results.