Machine Learning

The Ridge And Lasso Recap

Two penalties that shrink coefficients, one toward small values and one toward exact zero.

Why penalize coefficients

When features are numerous or strongly correlated, least squares can produce large, unstable weights that change sharply with small changes in the data. Regularization adds a penalty on coefficient size to the loss, accepting a little bias in exchange for lower variance.
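
A minimal sketch of that instability, in plain Python with made-up numbers: two nearly identical features, a target built as x1 + x2 plus a small alternating perturbation of 0.1, and no intercept. The helper names, the squared penalty, and the strength lam = 1 are assumptions chosen for illustration.

```python
def solve_2x2(a, b, c, d, r1, r2):
    """Solve [[a, b], [c, d]] @ [w1, w2] = [r1, r2] with Cramer's rule."""
    det = a * d - b * c
    return (d * r1 - b * r2) / det, (a * r2 - c * r1) / det


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def fit_two_features(x1, x2, y, lam=0.0):
    """Minimize ||y - w1*x1 - w2*x2||^2 + lam * (w1^2 + w2^2), no intercept.

    lam = 0 gives plain least squares; lam > 0 adds a squared-size penalty.
    """
    return solve_2x2(
        dot(x1, x1) + lam, dot(x1, x2),
        dot(x1, x2), dot(x2, x2) + lam,
        dot(x1, y), dot(x2, y),
    )


# Toy data (hypothetical): x2 is x1 plus a tiny wiggle, so the two are almost collinear.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.01, 1.99, 3.01, 3.99]
y = [2.11, 3.89, 6.11, 7.89]  # x1 + x2 with +0.1, -0.1, +0.1, -0.1 added

w_ols = fit_two_features(x1, x2, y)             # roughly (-9, 11)
w_pen = fit_two_features(x1, x2, y, lam=1.0)    # roughly (0.98, 0.98)
print("least squares:", w_ols)
print("penalized:    ", w_pen)
```

The unpenalized fit chases the perturbation and lands far from the (1, 1) used to build the data, while the penalized fit stays close to it, at the cost of a slight shrink below 1.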

Ridge versus lasso

  • Ridge adds the sum of squared weights, an L2 penalty. It shrinks all coefficients smoothly toward zero but rarely sets any to exactly zero.
  • Lasso adds the sum of absolute weights, an L1 penalty. Its constraint region is diamond-shaped with corners on the axes, which pushes some coefficients to exactly zero and so performs feature selection.
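
The difference is easiest to see in a special case. Assuming orthonormal features and the objective (1/2)·||y − Xw||² plus the penalty, each coefficient can be updated on its own: ridge with penalty (lam/2)·Σw² divides the least-squares coefficient by 1 + lam, and lasso with penalty lam·Σ|w| soft-thresholds it. The function names and numbers below are illustrative, and the closed forms do not hold for general correlated features.

```python
def ridge_shrink(b_ols, lam):
    """Ridge under orthonormal features: proportional shrinkage, never exactly 0."""
    return b_ols / (1.0 + lam)


def lasso_shrink(b_ols, lam):
    """Lasso under orthonormal features: soft-thresholding."""
    if b_ols > lam:
        return b_ols - lam
    if b_ols < -lam:
        return b_ols + lam
    return 0.0  # anything within [-lam, lam] is set to exactly zero


ols = [3.0, -0.4, 0.1, -2.0]  # hypothetical least-squares coefficients
lam = 0.5

ridge = [ridge_shrink(b, lam) for b in ols]
lasso = [lasso_shrink(b, lam) for b in ols]

print([round(b, 3) for b in ridge])  # [2.0, -0.267, 0.067, -1.333]
print(lasso)                         # [2.5, 0.0, 0.0, -1.5]
```

Ridge keeps every coefficient but makes each one smaller; lasso removes the two small ones entirely and shifts the rest toward zero by lam.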

Choosing between them

  • Use ridge when you believe most features matter a little and want stability.
  • Use lasso when you expect only a few features truly matter and want a sparse model.
  • Elastic net blends both penalties, keeping sparsity while tending to keep or drop correlated features as a group.
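
One common way to write the blend, shown here as a sketch: the symbols are assumptions (n samples, weights w, overall strength λ, mixing weight α), and libraries differ in how they scale each term.

```latex
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
  \;+\; \lambda \Bigl( \alpha \,\lVert w \rVert_1
  \;+\; \frac{1-\alpha}{2}\,\lVert w \rVert_2^2 \Bigr),
  \qquad 0 \le \alpha \le 1
```

Setting α = 1 leaves only the L1 term (lasso), and α = 0 leaves only the L2 term (ridge).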

The penalty strength is a hyperparameter tuned by cross-validation. Always standardize features first: the penalty acts on coefficient size, which depends on each feature's units, so unscaled features are penalized unevenly.
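
A minimal sketch of both steps in plain Python, assuming a single feature, ridge as the model, three interleaved folds, and made-up data. The function names and the candidate list are hypothetical. The feature is standardized with statistics from the training part of each fold only, so the validation points do not influence the scaling.

```python
from statistics import mean, pstdev


def ridge_slope(x, y, lam):
    """One-feature ridge slope without intercept: (x . y) / (x . x + lam)."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)


def cv_mse(x, y, lam, k=3):
    """Mean validation MSE of one-feature ridge over k interleaved folds."""
    fold_errors = []
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        valid = [i for i in range(len(x)) if i % k == fold]
        # Standardize x and center y using the training indices only.
        x_mu = mean([x[i] for i in train])
        x_sd = pstdev([x[i] for i in train])
        y_mu = mean([y[i] for i in train])
        slope = ridge_slope(
            [(x[i] - x_mu) / x_sd for i in train],
            [y[i] - y_mu for i in train],
            lam,
        )
        # Apply the same scaling to the held-out points before predicting.
        preds = [y_mu + slope * (x[i] - x_mu) / x_sd for i in valid]
        fold_errors.append(mean([(p - y[i]) ** 2 for p, i in zip(preds, valid)]))
    return mean(fold_errors)


# Toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [1.2, 1.9, 3.4, 3.8, 5.3, 5.9, 7.1, 8.2, 8.8]

candidates = [0.0, 0.1, 1.0, 10.0]
scores = {lam: cv_mse(x, y, lam) for lam in candidates}
best_lam = min(scores, key=scores.get)
print(scores)
print("chosen penalty strength:", best_lam)
```

On data this close to a straight line, heavy penalties score worse because they shrink a slope that was already reliable; with many noisy or correlated features the balance usually shifts toward a larger penalty.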

Key idea

Ridge shrinks all coefficients smoothly with an L2 penalty for stability, while lasso uses an L1 penalty to drive some coefficients to exactly zero for feature selection. Elastic net combines both.

Check yourself

1. Which regularizer can drive coefficients to exactly zero?

2. When is ridge usually preferred over lasso?