L2 (Ridge) Regularization
Ridge regression adds a penalty equal to the sum of the squared weights, scaled by lambda. This shrinks all coefficients toward zero smoothly without forcing any to vanish.
The objective
You minimize the training loss plus lambda times the sum of squared weights, where lambda >= 0 sets the penalty strength. Larger lambda means stronger shrinkage and a simpler, lower-variance model; lambda = 0 recovers the unregularized fit.
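As a minimal sketch of this objective, assume a squared-error loss, no intercept term (for example because the data are centered), and NumPy. The function names ridge_objective and ridge_fit and the synthetic data are illustrative choices, not part of any library. For this particular loss the minimizer has a closed form, which the second function uses.

```python
import numpy as np

def ridge_objective(X, y, w, lam):
    """Squared-error loss plus lam times the sum of squared weights."""
    residual = X @ w - y
    return residual @ residual + lam * (w @ w)

def ridge_fit(X, y, lam):
    """Closed-form minimizer of ridge_objective: solve (X^T X + lam I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data, assumed only for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

w_small = ridge_fit(X, y, lam=0.1)    # weak penalty
w_large = ridge_fit(X, y, lam=100.0)  # strong penalty, weights pulled toward zero

print("lambda=0.1  ", w_small)
print("lambda=100.0", w_large)
```

Solving the linear system rather than forming an explicit inverse is the usual numerical choice here; for other losses, such as logistic loss, there is no closed form and an iterative solver is needed.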
How it behaves
- The squared penalty is smooth, so weights shrink gradually rather than snapping to zero.
- Ridge handles correlated features gracefully, spreading weight across them.
- It reduces variance at the cost of a little added bias, often improving generalization.
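The behavior above can be illustrated with a small experiment, sketched here under stated assumptions: two nearly identical synthetic features, a squared-error loss, no intercept, and a hypothetical helper ridge_fit that uses the closed-form solution. With lambda = 0 the fit is free to split the weight between the two features erratically; as lambda grows, the weights should move closer together and their overall size should shrink gradually without reaching exactly zero.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution for squared-error loss, no intercept."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Two highly correlated features (x2 is almost a copy of x1); data are illustrative.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.5, size=n)  # true weights are (1, 1)

# Fit along a grid of penalty strengths, from no penalty to a heavy one.
lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
path = {lam: ridge_fit(X, y, lam) for lam in lambdas}

for lam, w in path.items():
    print(f"lambda={lam:7.1f}  weights={np.round(w, 3)}  norm={np.linalg.norm(w):.3f}")
```

Reading the printed rows top to bottom shows the coefficient path: the norm column never increases, and the two weights stay nonzero at every lambda.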
When to use it
- Choose ridge when you believe most features carry some signal.
- It is a common default regularizer for linear and logistic models, especially when features are multicollinear.
- Standardize features first, since the penalty depends on each feature's scale, and tune lambda by cross-validation.
If you want feature selection instead of smooth shrinkage, prefer lasso or elastic net.
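One way to put the standardize-then-tune advice into practice is sketched below, assuming scikit-learn is available. Note that scikit-learn names the penalty strength alpha rather than lambda. The synthetic data, the grid of candidate values, and the choice of 5 folds are assumptions made only for illustration.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustrative data only).
rng = np.random.default_rng(0)
scales = np.array([1.0, 10.0, 100.0, 0.1, 1.0])
X = rng.normal(size=(200, 5)) * scales
y = X @ np.array([1.0, 0.1, 0.01, 10.0, 1.0]) + rng.normal(size=200)

# Candidate penalty strengths on a log grid; scikit-learn calls lambda "alpha".
alphas = np.logspace(-3, 3, 13)

# Scaling lives inside the pipeline, so it is refit whenever the pipeline is fit.
model = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas, cv=5))
model.fit(X, y)

ridge = model.named_steps["ridgecv"]
best_lambda = ridge.alpha_  # penalty strength chosen by 5-fold cross-validation
coefs = ridge.coef_         # coefficients on the standardized features

print("selected lambda:", best_lambda)
print("coefficients:", np.round(coefs, 3))
```

The reported coefficients apply to the standardized features, so they are comparable across features but are not in the original units.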
Key idea
Ridge adds a squared-weight penalty that smoothly shrinks every coefficient, lowering variance and handling correlated features without setting any weight exactly to zero.