L2 (Ridge) Regularization

Ridge regression adds a penalty equal to the sum of the squared weights, scaled by lambda. This shrinks all coefficients toward zero smoothly without forcing any to vanish.

The objective

You minimize the training loss plus lambda times the sum of squared weights. A larger lambda means stronger shrinkage and a simpler, lower-variance model; setting lambda to zero recovers the unregularized fit.
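
As a minimal sketch of this objective, assume a squared-error loss and no intercept (the text does not fix a particular loss). Under those assumptions the minimizer has a closed form. The names ridge_objective and ridge_fit and the synthetic data are illustrative, not part of any library.

```python
import numpy as np

def ridge_objective(X, y, w, lam):
    """Squared-error loss plus lam times the sum of squared weights."""
    residual = X @ w - y
    return float(residual @ residual + lam * (w @ w))

def ridge_fit(X, y, lam):
    """Closed-form minimizer of ridge_objective (squared-error loss, no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

# The weight norm should fall as lambda grows.
lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [float(np.linalg.norm(ridge_fit(X, y, lam))) for lam in lambdas]
for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:6.1f}  ||w||={norm:.3f}")
```

The printed norms should decrease as lambda increases, while every individual weight stays nonzero.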

How it behaves

The squared penalty is smooth, so weights shrink gradually rather than snapping to zero.
Ridge handles correlated features gracefully, spreading weight across them.
It reduces variance at the cost of a little added bias, often improving generalization.
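
The behavior with correlated features can be sketched on a small example. It assumes a squared-error loss, no intercept, and synthetic data in which the second feature is a near-copy of the first. The helper ridge_fit is a hypothetical name for the closed-form solution, and the lambda = 0 fit is shown only for comparison.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution for squared-error loss (no intercept)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
# Second column is a near-copy of the first, so the two are almost collinear.
X = np.column_stack([x, x + 0.01 * rng.normal(size=200)])
# The target depends on the shared signal with a total weight of 3.
y = 3.0 * x + 0.1 * rng.normal(size=200)

w_unpenalized = ridge_fit(X, y, 0.0)
w_ridge = ridge_fit(X, y, 1.0)
print("lambda=0:", w_unpenalized)
print("lambda=1:", w_ridge)
```

With the penalty switched on, the two weights should come out nearly equal and sum to roughly 3, because splitting the weight evenly between the two near-copies gives the smallest squared norm for the same fit.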

When to use it

Choose ridge when you believe most features carry some signal.
It is a common default regularizer for linear and logistic models, especially when features are multicollinear.
Standardize features first, because the penalty depends on each feature's scale, and tune lambda by cross-validation.
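
The standardize-then-tune recipe can be sketched with a hand-rolled k-fold loop. This is one possible implementation under stated assumptions: squared-error loss, contiguous folds over already-shuffled synthetic data, and a small hypothetical grid of lambda values. The helper names ridge_fit and cv_error are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution for squared-error loss (no intercept)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, n_folds=5):
    """Mean squared validation error of ridge across contiguous folds."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    errors = []
    for val_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[val_idx] = False
        X_tr, y_tr = X[train_mask], y[train_mask]
        # Standardize with training-fold statistics only, to avoid leakage.
        mean, std = X_tr.mean(axis=0), X_tr.std(axis=0)
        y_mean = y_tr.mean()  # centering y stands in for an unpenalized intercept
        w = ridge_fit((X_tr - mean) / std, y_tr - y_mean, lam)
        pred = ((X[val_idx] - mean) / std) @ w + y_mean
        errors.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic features on very different scales, assumed for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])
y = X @ np.array([1.0, 0.1, 0.01, 0.001]) + rng.normal(size=100)

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
scores = {lam: cv_error(X, y, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)
print("best lambda:", best_lam, "cv error:", scores[best_lam])
```

Computing the mean and standard deviation inside each training fold keeps validation data out of the scaling step. In practice a library cross-validation routine would typically replace this loop.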

If you want feature selection instead of smooth shrinkage, prefer lasso or elastic net.

Key idea

Ridge adds a squared weight penalty that smoothly shrinks every coefficient, lowering variance and handling correlated features without setting weights to zero.
