What it is
Feature scaling transforms numeric inputs so they share a comparable range. Without it, a feature measured in thousands can dominate one measured in fractions purely because of its units.
Two common methods
- Standardization subtracts the mean and divides by the standard deviation, giving each feature zero mean and unit variance
- Min max scaling squeezes values into a fixed range such as zero to one
When it matters
Scaling helps algorithms that rely on distances or gradients:
- Gradient descent converges faster when features are on similar scales
- K nearest neighbors and k means use distance, which gets distorted by unscaled features
- Tree based models like decision trees are largely unaffected because they split on thresholds
Avoid leakage
Always fit the scaler on the training data only, then apply those same parameters to the test data. Fitting on everything leaks test information.
Key idea
Scaling features to a common range speeds up gradient based and distance based methods, and you must fit it on training data only.