← Lessons

quiz vs the machine

Gold1390

Machine Learning

Normalization and Standardization

Putting features on a common, comparable scale.

5 min read · core · beat Gold to climb

Normalization and Standardization

Raw features often live on wildly different scales. Income might range in the thousands while age ranges in the tens. Many algorithms struggle when scales differ, so we rescale features first.

Two common transforms do this:

  • Normalization, often min max scaling, squeezes each feature into a fixed range such as zero to one
  • Standardization shifts each feature to have a mean of zero and a standard deviation of one

Standardization is computed by subtracting the mean and dividing by the standard deviation of the feature. Normalization is computed by subtracting the minimum and dividing by the range.

Why bother? Distance based methods and gradient descent both treat large numbers as more important by accident. A feature with a big raw scale can dominate the loss and warp the optimization, making training slow or unstable. Putting features on a comparable scale lets each contribute fairly and helps the optimizer converge.

One crucial rule: compute the scaling statistics on the training set only, then apply the same transform to validation and test data. Fitting the scaler on all data leaks information and inflates your scores.

Key idea

Rescaling features through normalization or standardization puts them on a common scale; fit the statistics on training data only to avoid leakage.

Check yourself

Answer to earn rating on the learn ladder.

1. What does standardization produce?

2. Where should scaling statistics be computed?

3. Why does rescaling help gradient descent?