Train Validation Test Split Revisited

Why split at all

A model that scores well on the rows it learned from tells you little about how it will generalize, because it may simply have memorized those rows. We split data so we can measure how the model behaves on examples it has never seen.

The three roles

The training set fits the model parameters.
The validation set guides the choices you make yourself (the hyperparameters), such as learning rate, depth, or which features to keep.
The test set is touched once, at the very end, to estimate performance on future data.
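
As a minimal sketch of the three roles, the example below fits a tiny one-dimensional ridge regression. The synthetic data, the helper names fit and mse, and the candidate values for the penalty lam are all assumptions made up for illustration; they stand in for whatever model and hyperparameters you actually use.

```python
import random

def fit(rows, lam):
    """Fit a 1-D ridge regression through the origin; return the slope w."""
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + lam)

def mse(w, rows):
    """Mean squared error of the line y = w * x on the given rows."""
    return sum((w * x - y) ** 2 for x, y in rows) / len(rows)

# Synthetic data (an assumption for illustration): y = 2x + noise.
rng = random.Random(0)
data = []
for _ in range(400):
    x = rng.uniform(-1, 1)
    data.append((x, 2 * x + rng.gauss(0, 0.3)))

# The rows were generated in random order, so slicing gives a 70/15/15 split.
train, val, test = data[:280], data[280:340], data[340:]

# Training set: fits the parameter w for each candidate lam.
# Validation set: chooses which lam to keep.
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
val_error = {lam: mse(fit(train, lam), val) for lam in candidates}
best_lam = min(val_error, key=val_error.get)

# Test set: used once, after every choice has been made.
final_w = fit(train, best_lam)
test_error = mse(final_w, test)
print(f"chosen lam={best_lam}, test MSE={test_error:.3f}")
```

Note that the test rows appear only in the last two lines. Nothing computed from them feeds back into the choice of lam.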

The leak that ruins everything

If you peek at the test set while tuning, its numbers stop being honest. You start fitting the test set indirectly, and the reported score becomes optimistic. Treat the test set like a sealed envelope.
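
A small simulation can make the leak concrete. In the sketch below, the labels are coin flips and every "model" is a random guesser, so no model can truly beat fifty percent. The setup (50 test rows, 200 candidate models, seeded guessers) is an assumption chosen for illustration. Picking the model with the best test score still produces a number that looks good on the test set and does not hold up on fresh data.

```python
import random

def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

rng = random.Random(0)
n_test, n_fresh, n_models = 50, 5000, 200

# Labels are pure noise, so the honest accuracy of any model is about 0.5.
test_labels = [rng.randint(0, 1) for _ in range(n_test)]
fresh_labels = [rng.randint(0, 1) for _ in range(n_fresh)]

# "Tuning on the test set": keep whichever random guesser scores best on it.
best_acc, best_seed = -1.0, None
for seed in range(1, n_models + 1):
    model = random.Random(seed)
    preds = [model.randint(0, 1) for _ in range(n_test)]
    acc = accuracy(preds, test_labels)
    if acc > best_acc:
        best_acc, best_seed = acc, seed

# Re-create the winning guesser and let it predict on rows it was not selected on.
winner = random.Random(best_seed)
_ = [winner.randint(0, 1) for _ in range(n_test)]  # skip the draws used on the test rows
fresh_preds = [winner.randint(0, 1) for _ in range(n_fresh)]
fresh_acc = accuracy(fresh_preds, fresh_labels)

print(f"accuracy on the peeked test set: {best_acc:.2f}")
print(f"accuracy on fresh data:          {fresh_acc:.2f}")
```

The gap between the two printed numbers comes entirely from selection: the test set was used to choose, so it can no longer be used to judge.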

Practical sizes

A common starting point is roughly seventy percent train, fifteen percent validation, fifteen percent test.
With millions of rows, even one percent can be enough for validation and test: one percent of five million rows is still 50,000 examples.
Always split before scaling or encoding, and fit scalers and encoders on the training rows only, so statistics from validation and test rows never leak into training.
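
The sizes and the split-before-scaling rule can be sketched together. The helper three_way_split and the toy values below are hypothetical, written with the standard library only; the point is the order of operations: shuffle and split first, compute the mean and standard deviation from the training rows alone, then apply that same transform to all three sets.

```python
import random
import statistics

def three_way_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of rows and cut it into train, validation, and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test

# Toy one-feature dataset (an assumption for illustration).
values = [float(v) for v in range(1000)]
train, val, test = three_way_split(values)

# Scaling statistics come from the training rows only.
train_mean = statistics.fmean(train)
train_std = statistics.pstdev(train)

def scale(xs):
    """Standardize xs using the statistics learned from the training set."""
    return [(x - train_mean) / train_std for x in xs]

train_scaled = scale(train)
val_scaled = scale(val)
test_scaled = scale(test)

print(len(train), len(val), len(test))
```

Because the validation and test rows are transformed with training statistics, their scaled means will usually be close to zero but not exactly zero. That small mismatch is expected, and it is what the model will face on new data.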

Key idea

Train to fit, validate to tune, and test exactly once to get an honest estimate of how the model will perform on new data.
