Machine Learning

L2 (Ridge) Regularization

Add a squared penalty that shrinks weights smoothly to reduce variance.

Ridge regression adds to the training loss a penalty equal to the sum of the squared weights, scaled by lambda. This shrinks all coefficients toward zero smoothly without forcing any of them to exactly zero.

The objective

You minimize the loss plus lambda times the sum of squared weights. Setting lambda to zero recovers the unregularized fit; a larger lambda means stronger shrinkage and a simpler, lower-variance model.
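
To make the objective concrete, here is a minimal sketch in plain Python. It assumes a squared-error loss, a single feature, and no intercept; the function names and the four data points are made up for illustration. Under those assumptions the minimizer has the closed form w = sum(x*y) / (sum(x*x) + lambda), so lambda appears directly in the denominator.

```python
def ridge_objective(w, X, y, lam):
    """Sum of squared errors plus lam times the sum of squared weights."""
    residuals = [yi - sum(wj * xj for wj, xj in zip(w, row))
                 for row, yi in zip(X, y)]
    loss = sum(r * r for r in residuals)
    penalty = lam * sum(wj * wj for wj in w)
    return loss + penalty


def ridge_weight_1d(x, y, lam):
    """Closed-form minimizer for one feature and no intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)


# Illustrative data, roughly y = 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
X = [[xi] for xi in x]  # one column, in the row-wise layout ridge_objective expects

for lam in (0.0, 1.0, 10.0, 100.0):
    w = ridge_weight_1d(x, y, lam)
    obj = ridge_objective([w], X, y, lam)
    print(f"lambda={lam:6.1f}  w={w:.4f}  objective={obj:.4f}")
```

Running the loop shows the fitted weight moving toward zero as lambda grows, while never reaching it for any finite lambda.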

How it behaves

  • The squared penalty is smooth, so weights shrink gradually rather than snapping to zero.
  • Ridge stays stable with correlated features: it spreads weight across them instead of assigning large, offsetting coefficients.
  • It reduces variance at the cost of a little added bias, often improving generalization.
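
The behavior with correlated features can be illustrated with an extreme case: two identical columns. The sketch below assumes a squared-error loss and no intercept, and solves the two-feature ridge system (X^T X + lambda I) w = X^T y by Cramer's rule; `ridge_fit_2d` and the data are hypothetical names and values chosen for the example.

```python
def ridge_fit_2d(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y for exactly two features."""
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    c0 = sum(r[0] * yi for r, yi in zip(X, y))
    c1 = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - b * b  # strictly positive whenever lam > 0
    return ((c0 * d - b * c1) / det, (a * c1 - b * c0) / det)


# Illustrative data: the second feature is an exact copy of the first.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
X = [[xi, xi] for xi in x]

for lam in (0.1, 1.0, 10.0, 100.0):
    w0, w1 = ridge_fit_2d(X, y, lam)
    print(f"lambda={lam:6.1f}  w0={w0:.4f}  w1={w1:.4f}  sum={w0 + w1:.4f}")
```

With lambda set to zero the determinant is zero and the system has no unique solution, which is the instability that multicollinearity causes. Any positive lambda makes the system solvable, and the two copies receive equal weights that shrink together as lambda increases.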

When to use it

  • Choose ridge when you believe most features carry some signal.
  • It is a common default regularizer for linear and logistic models facing multicollinearity.
  • Standardize features first, because the penalty depends on each feature's scale, then tune lambda by cross-validation.
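
The standardize-then-tune workflow can be sketched as follows. This is a toy version under stated assumptions: one feature, squared-error loss, interleaved folds, and a hand-picked lambda grid. The helper names, the data, and the grid are all illustrative. The scaling statistics are computed on each training fold only, and the target is centered so the intercept stays unpenalized.

```python
def mean_std(values):
    """Return the mean and population standard deviation of a list."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var ** 0.5


def ridge_weight_1d(z, t, lam):
    """Closed-form ridge weight for one feature and no intercept."""
    return sum(zi * ti for zi, ti in zip(z, t)) / (sum(zi * zi for zi in z) + lam)


def cv_error(x, y, lam, k=4):
    """Mean squared validation error of 1-D ridge over k interleaved folds."""
    n = len(x)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        # Scaling statistics come from the training fold only, to avoid leakage.
        mx, sx = mean_std([x[i] for i in train])
        my = sum(y[i] for i in train) / len(train)
        z = [(x[i] - mx) / sx for i in train]
        t = [y[i] - my for i in train]
        w = ridge_weight_1d(z, t, lam)
        for i in held_out:
            pred = my + w * (x[i] - mx) / sx
            total += (y[i] - pred) ** 2
    return total / n


# Made-up data: roughly y = 3x plus small fixed perturbations.
x = [float(i) for i in range(12)]
noise = [0.4, -0.3, 0.1, 0.5, -0.6, 0.2, -0.1, 0.3, -0.4, 0.6, -0.2, 0.0]
y = [3.0 * xi + ei for xi, ei in zip(x, noise)]

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]
errors = {lam: cv_error(x, y, lam) for lam in lambdas}
best_lam = min(errors, key=errors.get)
for lam in lambdas:
    print(f"lambda={lam:7.2f}  cv_mse={errors[lam]:.4f}")
print("selected lambda:", best_lam)
```

On clean, nearly linear data like this a small lambda will usually win; regularization tends to pay off more when features are many, noisy, or correlated.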

If you want feature selection instead of smooth shrinkage, prefer lasso or elastic net.
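
The contrast with lasso is easiest to see in a simplified setting. The sketch below assumes a single feature scaled so its squared values sum to one, a squared-error loss, and a lasso penalty of lambda times the absolute weight. Under those assumptions ridge divides the unregularized weight by (1 + lambda), while lasso subtracts lambda / 2 from its magnitude and clips at zero. The function names are illustrative.

```python
def ridge_shrink(w_ols, lam):
    """Ridge solution for a unit-norm feature: proportional shrinkage."""
    return w_ols / (1.0 + lam)


def lasso_shrink(w_ols, lam):
    """Lasso solution for a unit-norm feature: soft thresholding at lam / 2."""
    magnitude = max(abs(w_ols) - lam / 2.0, 0.0)
    return magnitude if w_ols >= 0 else -magnitude


lam = 1.0
for w_ols in (2.0, 0.8, 0.3, -0.3):
    print(f"unregularized={w_ols:5.2f}  "
          f"ridge={ridge_shrink(w_ols, lam):6.3f}  "
          f"lasso={lasso_shrink(w_ols, lam):6.3f}")
```

Small unregularized weights are set to exactly zero by the lasso rule but only scaled down by the ridge rule, which is why lasso performs feature selection and ridge does not.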

Key idea

Ridge adds a squared weight penalty that smoothly shrinks every coefficient, lowering variance and handling correlated features without setting weights to zero.

Check yourself

1. How does ridge differ from lasso?

2. What is a strength of ridge with correlated features?