The problem it solves
Ordinary least-squares regression can chase noise and produce huge, unstable weights, especially when there are many features or strongly correlated ones. Regularization adds a penalty on the size of the weights to the training loss, so the model prefers simpler fits.
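In symbols, here is a minimal sketch that assumes squared-error loss over n training pairs (x_i, y_i), a weight vector w, and a penalty function P. The notation is illustrative rather than taken from these notes. The elastic-net mixing weight ρ is an assumed parameterization, and libraries differ in how they scale it. If the model has an intercept, it is usually left out of P.

```latex
\begin{aligned}
\mathcal{L}(w) &= \sum_{i=1}^{n} \bigl(y_i - x_i^\top w\bigr)^2 + \lambda\, P(w), \qquad \lambda \ge 0 \\
P_{\text{ridge}}(w) &= \sum_{j} w_j^2 \\
P_{\text{lasso}}(w) &= \sum_{j} \lvert w_j \rvert \\
P_{\text{enet}}(w) &= \rho \sum_{j} \lvert w_j \rvert + (1 - \rho) \sum_{j} w_j^2, \qquad 0 \le \rho \le 1
\end{aligned}
```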
Two classic penalties
- Ridge adds the sum of squared weights (an L2 penalty). It shrinks all weights smoothly toward zero, though rarely to exactly zero, and stabilizes the estimates when features are collinear.
- Lasso adds the sum of absolute weights (an L1 penalty). It can drive some weights exactly to zero, performing feature selection.
- Elastic net mixes both penalties to get ridge-style shrinkage plus lasso-style selection.
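The sketch below contrasts the penalties on synthetic data. It assumes NumPy and a model with no intercept. `ridge` uses the closed-form solution. `elastic_net` is a plain coordinate-descent solver written for illustration, not a library API. With `rho=1.0` it is the lasso, and with `rho=0.0` it reduces to ridge. The names `ridge`, `soft_threshold`, `elastic_net`, and `rho` are hypothetical. The L1 term enters through soft-thresholding, and that step is what sets small weights to exactly zero.

```python
import numpy as np


def ridge(X, y, lam):
    """Closed-form ridge: minimizes ||y - Xw||^2 + lam * ||w||^2 (no intercept)."""
    n_features = X.shape[1]
    # Solve (X^T X + lam * I) w = X^T y rather than forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def soft_threshold(value, threshold):
    """Move value toward zero by threshold; returns exactly 0 when |value| <= threshold."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def elastic_net(X, y, lam, rho=1.0, n_sweeps=500):
    """Coordinate descent for ||y - Xw||^2 + lam * (rho * ||w||_1 + (1 - rho) * ||w||^2).

    rho=1.0 is the lasso; rho=0.0 is ridge, solved iteratively instead of in closed form.
    """
    n_features = X.shape[1]
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0)  # X_j^T X_j for every column j
    for _ in range(n_sweeps):
        for j in range(n_features):
            # Residual with feature j's current contribution added back in.
            partial_residual = y - X @ w + X[:, j] * w[j]
            corr = X[:, j] @ partial_residual
            # Exact minimizer of the objective over w[j] with the other weights fixed.
            w[j] = soft_threshold(corr, lam * rho / 2) / (col_sq[j] + lam * (1 - rho))
    return w


# Synthetic data: only the first two of five features matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_w + 0.1 * rng.standard_normal(100)

print("ridge:", np.round(ridge(X, y, lam=10.0), 3))
print("lasso:", np.round(elastic_net(X, y, lam=50.0), 3))
```

On this data, the lasso fit should leave the three noise weights at exactly 0. Ridge shrinks every weight toward zero without reaching it. Calling `elastic_net` with `rho=0.0` should match `ridge` to numerical precision, which is a quick sanity check on the solver.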
The strength knob
A coefficient, often called lambda, sets the penalty strength (scikit-learn names this parameter alpha).
- Small lambda behaves like plain regression and may overfit.
- Large lambda shrinks weights hard and may underfit.
- Tune it by cross-validation, usually over a log-spaced grid of values, not by guessing.
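Here is a minimal sketch of tuning lambda by k-fold cross-validation. It assumes NumPy, the ridge penalty, and a model without an intercept. The helper `cv_error` is hypothetical, and so are the choices of 5 folds and a grid of 13 log-spaced values from 0.001 to 1000. None of these come from the notes. The data are synthetic: 60 samples and 30 features, of which only three are informative. With so few samples per feature, a nearly unregularized fit has high variance.

```python
import numpy as np


def ridge(X, y, lam):
    """Closed-form ridge: minimizes ||y - Xw||^2 + lam * ||w||^2 (no intercept)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def cv_error(X, y, lam, k=5):
    """Mean squared validation error of ridge, averaged over k contiguous folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errors = []
    for val_idx in folds:
        # Train on every row outside the current fold, score on the fold.
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False
        w = ridge(X[train_mask], y[train_mask], lam)
        errors.append(np.mean((y[val_idx] - X[val_idx] @ w) ** 2))
    return float(np.mean(errors))


rng = np.random.default_rng(1)
X = rng.standard_normal((60, 30))  # many features relative to samples
true_w = np.zeros(30)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + rng.standard_normal(60)

# Search a log-spaced grid rather than guessing a single value.
lambdas = np.logspace(-3, 3, 13)
scores = [cv_error(X, y, lam) for lam in lambdas]
best_lam = lambdas[int(np.argmin(scores))]
print(f"best lambda: {best_lam:.3g}")
```

Libraries build in the same search, for example scikit-learn's `RidgeCV` and `LassoCV`. The loop above only makes the mechanics visible. Contiguous folds are fine here because the rows are i.i.d. If the rows were ordered by time or by class, the folds would need shuffling or a time-aware split.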
Why it helps
- Reduces variance at the cost of some added bias; with a well-tuned lambda, the variance reduction usually outweighs the bias.
- Produces more stable, generalizable weights.
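- Always standardize features first. The penalty acts on raw weight sizes, so without scaling, how hard a feature is penalized depends on its units rather than its importance.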
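To see why scaling matters, the sketch below fits the same data twice: once with a hypothetical height feature in metres and once in centimetres. It assumes NumPy and a ridge model with no intercept, so the target and features are centered instead. The feature names, units, and `lam=10.0` are made up for illustration. Without scaling, the unit change alters the fitted predictions. After standardizing, both versions give the same fit.

```python
import numpy as np


def ridge(X, y, lam):
    """Closed-form ridge without intercept: minimizes ||y - Xw||^2 + lam * ||w||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def center(X):
    """Subtract each column's mean (no rescaling)."""
    return X - X.mean(axis=0)


def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


rng = np.random.default_rng(2)
n = 200
height_m = rng.normal(1.7, 0.1, size=n)     # hypothetical feature, metres
weight_kg = rng.normal(70.0, 10.0, size=n)  # hypothetical feature, kilograms
y = 5.0 * height_m + 0.05 * weight_kg + rng.normal(0.0, 0.1, size=n)
y = y - y.mean()  # center the target because this ridge has no intercept

X_m = np.column_stack([height_m, weight_kg])
X_cm = np.column_stack([height_m * 100.0, weight_kg])  # same data, height in cm
lam = 10.0

# Centered but unscaled: the penalty on height's weight depends on its units.
pred_raw_m = center(X_m) @ ridge(center(X_m), y, lam)
pred_raw_cm = center(X_cm) @ ridge(center(X_cm), y, lam)

# Standardized: the unit change disappears, so the two fits agree.
pred_std_m = standardize(X_m) @ ridge(standardize(X_m), y, lam)
pred_std_cm = standardize(X_cm) @ ridge(standardize(X_cm), y, lam)

print(np.abs(pred_raw_m - pred_raw_cm).max())  # noticeably greater than 0
print(np.abs(pred_std_m - pred_std_cm).max())  # floating-point noise only
```

In metres, the height column has a small spread, so its weight must be large, and ridge shrinks that weight hard. In centimetres, the same weight becomes small and the penalty barely touches it. In a real pipeline, compute the means and standard deviations on the training split only and reuse them on validation and test data. That way no information leaks from held-out rows.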
Key idea
Regularized regression adds a weight penalty to the loss. Ridge shrinks smoothly, lasso selects features, and the penalty strength trades bias for lower variance.