The Ridge And Lasso Recap

Two penalties that shrink coefficients, one toward small values and one toward exact zero.

Why penalize coefficients

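When features are many or correlated, least squares can produce huge, unstable weights that swing wildly with small changes in the data. Regularization adds a penalty on coefficient size to the loss, accepting a small increase in bias in exchange for a reduction in variance that is often much larger.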
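To make the instability concrete, here is a small sketch in Python with NumPy (an assumption, since the recap names no tools). It builds two nearly identical synthetic features, then fits ordinary least squares and ridge through the closed form (X^T X + lam*I) w = X^T y, where lam = 0 reduces to plain least squares. The helper ridge_fit is hypothetical, the data are made up, and the intercept is omitted to keep the example short.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge: solve (X^T X + lam * I) w = X^T y. lam = 0 is plain least squares."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic, illustrative data: x2 is almost an exact copy of x1.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.5, size=n)  # the true signal uses x1 only

w_ols = ridge_fit(X, y, lam=0.0)    # unpenalized: X^T X is nearly singular
w_ridge = ridge_fit(X, y, lam=1.0)  # penalized: the system is well conditioned

print("least squares:", np.round(w_ols, 3), "norm", round(float(np.linalg.norm(w_ols)), 3))
print("ridge:        ", np.round(w_ridge, 3), "norm", round(float(np.linalg.norm(w_ridge)), 3))
```

On data like this, least squares typically splits the signal into two large weights of opposite sign, while ridge spreads it roughly evenly across the two copies. The ridge weight vector always has a smaller norm than the least squares one, because the norm of the ridge solution shrinks as lam grows.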

Ridge versus lasso

Ridge adds the sum of squared weights, an L2 penalty. It shrinks all coefficients smoothly toward zero but essentially never sets any of them exactly to zero.
Lasso adds the sum of absolute weights, an L1 penalty. Its constraint region has corners lying on the coordinate axes (a diamond in two dimensions), so the solution often lands on a corner where some coefficients are exactly zero, which performs feature selection.
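The difference can be seen with a minimal lasso fitted by cyclic coordinate descent, one standard way to handle the L1 penalty, though not necessarily how any particular library does it. Each coordinate update applies soft-thresholding, which returns exactly zero whenever a feature's correlation with the residual falls below the penalty; that is where the exact zeros come from. The synthetic data, the helper names soft_threshold, lasso_cd and ridge_fit, and the penalty values are illustrative assumptions. The two fits scale their penalties differently, so their lam values are not directly comparable.

```python
import numpy as np

def soft_threshold(a, t):
    """Move a toward zero by t; anything inside [-t, t] becomes exactly 0."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize (1/(2n)) * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # (1/n) * ||X_j||^2 for each column j
    residual = y - X @ w
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the residual, ignoring feature j's own contribution.
            rho = X[:, j] @ (residual + X[:, j] * w[j]) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            residual += X[:, j] * (w[j] - new_wj)  # keep residual equal to y - X @ w
            w[j] = new_wj
    return w

def ridge_fit(X, y, lam):
    """Closed-form ridge for comparison: solve (X^T X + lam * I) w = X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Illustrative data: 8 features, only the first two affect y.
rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.normal(size=(n, p))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=n)

w_lasso = lasso_cd(X, y, lam=0.2)
w_ridge = ridge_fit(X, y, lam=10.0)
print("lasso:", np.round(w_lasso, 3))
print("ridge:", np.round(w_ridge, 3))
print("exact zeros (lasso, ridge):", int(np.sum(w_lasso == 0)), int(np.sum(w_ridge == 0)))
```

With only the first two features driving y, the lasso weights for the other six should come out as exact zeros, while every ridge weight is small but nonzero.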

Choosing between them

Use ridge when you believe most features matter a little and want stability.
Use lasso when you expect only a few features truly matter and want a sparse model.
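Elastic net adds both penalties: the L1 part keeps sparsity, while the L2 part handles groups of correlated features, which tend to be kept or dropped together instead of having one picked arbitrarily, as lasso tends to do.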
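One common way to write the combined objective is the parameterization used by glmnet and, with lambda named alpha and alpha named l1_ratio, by scikit-learn's ElasticNet. Here n is the number of samples, lambda the overall penalty strength, and alpha the share given to the L1 term:

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \lambda \left( \alpha\,\lVert \beta \rVert_1
  + \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right),
\qquad 0 \le \alpha \le 1
```

Setting alpha = 1 recovers the lasso. Setting alpha = 0 gives ridge, with the squared-weight penalty scaled by lambda/2. Values in between mix L1 sparsity with the L2 term's tendency to share weight across correlated features.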

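The penalty strength is a hyperparameter tuned by cross-validation. Always standardize features first. The penalty acts on coefficient size, and a coefficient's size depends on its feature's units, so unscaled features would be penalized unequally.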
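A minimal sketch of tuning the penalty with k-fold cross-validation, again in Python with NumPy and the closed-form ridge fit. The detail worth copying is that the mean and standard deviation used for standardization are computed on each training fold only and then applied to the validation fold, so no information from held-out rows leaks into the fit. The helper names, the lambda grid, the fold count, and the synthetic data (one feature deliberately scaled by 1000, another by 0.01) are assumptions for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge: solve (X^T X + lam * I) w = X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_ridge(X, y, lams, k=5, seed=0):
    """Return the lam with the lowest mean validation MSE over k folds, plus all mean MSEs."""
    n = X.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    mean_mse = []
    for lam in lams:
        fold_mse = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            # Standardize with training-fold statistics only, then reuse them on the validation fold.
            mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
            y_mean = y[train].mean()  # center y so the intercept is left unpenalized
            w = ridge_fit((X[train] - mu) / sd, y[train] - y_mean, lam)
            pred = ((X[val] - mu) / sd) @ w + y_mean
            fold_mse.append(np.mean((y[val] - pred) ** 2))
        mean_mse.append(float(np.mean(fold_mse)))
    best = lams[int(np.argmin(mean_mse))]
    return best, mean_mse

# Illustrative data whose features live on very different scales.
rng = np.random.default_rng(1)
n = 120
X = rng.normal(size=(n, 4)) * np.array([1.0, 1000.0, 0.01, 1.0])
y = 2.0 * X[:, 0] + 0.001 * X[:, 1] + rng.normal(scale=1.0, size=n)

lams = np.logspace(-3, 3, 13)
best_lam, scores = cv_ridge(X, y, lams)
print("chosen lambda:", best_lam)
```

In scikit-learn, wrapping a Pipeline of StandardScaler and Ridge (or Lasso, or ElasticNet) in GridSearchCV achieves the same per-fold bookkeeping, since the whole pipeline, scaler included, is refit on each training fold. The hand-rolled version above just makes those steps visible.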

Key idea

Ridge shrinks all coefficients smoothly with an L2 penalty for stability, while lasso uses an L1 penalty to drive some coefficients to exactly zero for feature selection. Elastic net combines both.
