Feature Scaling

What it is

Feature scaling transforms numeric inputs so they share a comparable range. Without it, a feature measured in thousands can dominate one measured in fractions purely because of its units.

Two common methods

Standardization subtracts the mean and divides by the standard deviation, giving each feature zero mean and unit variance
Min max scaling squeezes values into a fixed range such as zero to one

When it matters

Scaling helps algorithms that rely on distances or gradients:

Gradient descent converges faster when features are on similar scales
K nearest neighbors and k means use distance, which gets distorted by unscaled features
Tree based models like decision trees are largely unaffected because they split on thresholds

Avoid leakage

Always fit the scaler on the training data only, then apply those same parameters to the test data. Fitting on everything leaks test information.

Key idea

Scaling features to a common range speeds up gradient based and distance based methods, and you must fit it on training data only.

What it is

Two common methods

When it matters

Avoid leakage

Key idea

Check yourself