← Lessons

quiz vs the machine

Silver1050

Machine Learning

Feature Scaling

Why putting features on a common scale helps many models learn.

3 min read · intro · beat Silver to climb

What it is

Feature scaling transforms numeric inputs so they share a comparable range. Without it, a feature measured in thousands can dominate one measured in fractions purely because of its units.

Two common methods

  • Standardization subtracts the mean and divides by the standard deviation, giving each feature zero mean and unit variance
  • Min max scaling squeezes values into a fixed range such as zero to one

When it matters

Scaling helps algorithms that rely on distances or gradients:

  • Gradient descent converges faster when features are on similar scales
  • K nearest neighbors and k means use distance, which gets distorted by unscaled features
  • Tree based models like decision trees are largely unaffected because they split on thresholds

Avoid leakage

Always fit the scaler on the training data only, then apply those same parameters to the test data. Fitting on everything leaks test information.

Key idea

Scaling features to a common range speeds up gradient based and distance based methods, and you must fit it on training data only.

Check yourself

Answer to earn rating on the learn ladder.

1. Which model type is mostly unaffected by feature scaling?

2. Where should the scaler be fit?