Machine Learning

Train Validation Test Split Revisited

Why three separate data slices keep your performance estimate honest.

4 min read · intro

Why split at all

A model that scores well on the rows it learned from tells you little, because it may simply have memorized them. We split data so we can measure how the model behaves on examples it has never seen.

The three roles

  • The training set fits the model parameters.
  • The validation set tunes choices you make as a human, such as learning rate, depth, or which features to keep.
  • The test set is touched once, at the very end, to estimate true future performance.
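
The three roles can be sketched in a few lines of plain Python. This is a minimal illustration, not a recommended pipeline: the synthetic data (y = 2x plus noise), the one-weight ridge model, the helper names `fit_ridge` and `mse`, and the candidate penalty values are all assumptions made for the example.

```python
import random

def fit_ridge(rows, lam):
    """Fit y ~ w * x with an L2 penalty lam; returns the single weight w."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + lam)

def mse(w, rows):
    """Mean squared error of the prediction w * x on the given rows."""
    return sum((w * x - y) ** 2 for x, y in rows) / len(rows)

# Synthetic data (assumed for illustration): y = 2x plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.uniform(-1, 1)
    data.append((x, 2 * x + rng.gauss(0, 0.5)))

# Rows are generated independently, so slicing is already a random split.
train, val, test = data[:140], data[140:170], data[170:]

# Training fits the parameter w; validation chooses the penalty lam.
candidate_lams = [0.0, 0.1, 1.0, 10.0, 100.0]
val_scores = {lam: mse(fit_ridge(train, lam), val) for lam in candidate_lams}
best_lam = min(val_scores, key=val_scores.get)

# The test set is used once, after every choice has been made.
w = fit_ridge(train, best_lam)
test_mse = mse(w, test)
print(f"best lam = {best_lam}, w = {w:.3f}, test MSE = {test_mse:.3f}")
```

Note that `test` appears on exactly one line: nothing computed from it feeds back into `w` or `best_lam`.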

The leak that ruins everything

If you peek at the test set while tuning, its numbers stop being honest. You start fitting the test set indirectly, and the reported score becomes optimistic. Treat the test set like a sealed envelope.
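
A small simulation can make the optimism concrete. The setup below is an assumption chosen for clarity: 200 candidate configurations that are all pure coin flips (true accuracy 50%), scored on a 100-row test set, with a hypothetical helper `measured_accuracy` standing in for "evaluate this model on these rows".

```python
import random

def measured_accuracy(rng, n_rows, true_accuracy=0.5):
    """Accuracy observed on n_rows when each prediction is
    correct with probability true_accuracy."""
    hits = sum(rng.random() < true_accuracy for _ in range(n_rows))
    return hits / n_rows

rng = random.Random(42)
n_candidates, n_test_rows = 200, 100

# Every candidate is a coin flip, so none is truly better than 50%.
test_scores = [measured_accuracy(rng, n_test_rows) for _ in range(n_candidates)]

# "Peeking": choose the candidate with the best test-set score.
best_on_test = max(test_scores)

# Re-measure the chosen candidate on fresh rows that played no part in selection.
fresh_score = measured_accuracy(rng, 10_000)

print(f"score reported after peeking: {best_on_test:.2f}")
print(f"score on fresh data:          {fresh_score:.2f}")
```

The winner looks better than chance only because it was selected for its luck on those particular 100 rows. On fresh rows the same candidate falls back toward 50%.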

Practical sizes

  • A common starting point is roughly seventy percent train, fifteen percent validation, fifteen percent test.
  • With millions of rows, even one percent can be enough for validation and test.
  • Always split before scaling or encoding, and fit scalers and encoders on the training rows only, so statistics from validation and test rows never leak into training.
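
The split-then-scale order can be sketched as follows. This is a minimal example under stated assumptions: rows are independent, so a random shuffle is appropriate; the single-feature dataset is made up; and `split_indices` and `scale` are hypothetical helpers, not part of any library.

```python
import random
import statistics

def split_indices(n_rows, train_pct=70, val_pct=15, seed=0):
    """Shuffle row indices and cut them into train/validation/test lists.
    Whatever is left after train and validation becomes the test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_train = n_rows * train_pct // 100
    n_val = n_rows * val_pct // 100
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

# Hypothetical single-feature dataset, used only for illustration.
rng = random.Random(1)
values = [rng.gauss(50, 10) for _ in range(1000)]

# Step 1: split first.
train_idx, val_idx, test_idx = split_indices(len(values))

# Step 2: compute scaling statistics from the training rows only.
train_values = [values[i] for i in train_idx]
mean = statistics.fmean(train_values)
std = statistics.pstdev(train_values)

def scale(idx):
    """Standardize the selected rows using the training mean and std."""
    return [(values[i] - mean) / std for i in idx]

# Step 3: apply the same training-derived transform to every slice.
train_scaled = scale(train_idx)
val_scaled = scale(val_idx)
test_scaled = scale(test_idx)
```

Computing `mean` and `std` over all 1000 rows would be the leak this bullet warns about: the test rows would then influence the inputs the model is trained on.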

Key idea

Train to fit, validate to tune, and test exactly once to get an honest estimate of how the model will perform on new data.

Check yourself

1. What is the validation set used for?

2. Why touch the test set only once?