L1 (Lasso) Regularization
Lasso regression adds a penalty equal to the sum of the absolute values of the weights, scaled by a strength lambda. This discourages large coefficients and, importantly, can set some coefficients exactly to zero.
The objective
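You minimize the usual loss (squared error, in the case of linear regression) plus lambda times the sum of the absolute values of the weights. As lambda grows, the penalty pushes weights toward zero more aggressively; at lambda = 0 the penalty vanishes and you recover the unpenalized fit.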
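To make the objective concrete, here is a minimal sketch in plain Python. It assumes the "usual loss" is squared error with a 1/(2n) scaling, which is one common convention rather than the only one. The function name lasso_objective and the toy data are hypothetical, chosen only for illustration.

```python
def lasso_objective(X, y, w, lam):
    """Squared-error loss plus lam times the sum of absolute weights.

    X is a list of feature rows, y a list of targets, w a list of weights.
    The 1/(2n) scaling of the loss is an assumed convention.
    """
    n = len(y)
    # Residual for each sample: target minus the linear prediction.
    residuals = [
        yi - sum(wj * xij for wj, xij in zip(w, row))
        for row, yi in zip(X, y)
    ]
    loss = sum(r * r for r in residuals) / (2 * n)
    penalty = lam * sum(abs(wj) for wj in w)
    return loss + penalty


# Toy data (made up): the target equals the first feature exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 0.0, 1.0]

exact_fit = lasso_objective(X, y, [1.0, 0.0], lam=0.5)    # zero loss, penalty only
larger_fit = lasso_objective(X, y, [2.0, -1.0], lam=0.5)  # nonzero loss, larger penalty
```

With these weights the first call pays only the penalty, while the second pays both a nonzero loss and a larger penalty, so it scores worse on both terms.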
Why zeros appear
- The absolute value penalty has a sharp corner at zero.
- For a weakly predictive feature, the reduction in loss from a nonzero weight is smaller than the penalty it incurs, so the optimum sits exactly on that corner.
- The result is automatic feature selection, producing a sparse model.
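The corner effect can be seen in the simplest possible case: a single weight w with the one-dimensional objective 0.5 * (w - z)^2 + lam * |w|, where z is the value the weight would take without a penalty. The minimizer of that objective is the soft-thresholding rule sketched below. This one-dimensional setup and the sample numbers are simplifying assumptions for illustration, not a full Lasso solver.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (w - z)**2 + lam * abs(w) over w."""
    if z > lam:
        return z - lam   # strong positive signal: shrink toward zero by lam
    if z < -lam:
        return z + lam   # strong negative signal: shrink toward zero by lam
    return 0.0           # weak signal: the optimum sits exactly at the corner


# Hypothetical unpenalized weights: two strong, two weak.
unpenalized = [3.0, -0.4, 0.1, -2.0]
shrunk = [soft_threshold(z, 0.5) for z in unpenalized]
print(shrunk)  # [2.5, 0.0, 0.0, -1.5]
```

Any value of z within lam of zero maps to exactly 0.0, not merely a small number, which is where the sparsity comes from.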
Practical notes
- Lasso is useful when you suspect many features are irrelevant.
- It can struggle when features are highly correlated, picking one and zeroing the rest somewhat arbitrarily.
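- Always standardize features first. The penalty acts on coefficient size, so a feature measured in large units needs only a small coefficient and would otherwise be penalized less than an equally useful feature in small units.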
- Tune lambda with cross-validation rather than guessing.
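The last two notes can be combined in a short sketch using scikit-learn, assuming that library is available. Note that scikit-learn calls the penalty strength alpha rather than lambda. The synthetic data, the seed, and the choice of 5 folds are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (made up): 10 features, only the first two affect the target.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
X[:, 0] *= 100.0  # put one feature on a much larger scale than the rest
y = 0.03 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)

# Standardize, then let LassoCV pick the penalty strength by 5-fold cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)

lasso = model.named_steps["lassocv"]
print("chosen alpha:", lasso.alpha_)
print("coefficients:", lasso.coef_)
```

The coefficients printed here are on the standardized scale. Keeping the scaler and the model in one pipeline means any later call to model.predict applies the same scaling that was learned during fitting.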
Key idea
Lasso adds an absolute value penalty whose sharp corner at zero drives weak coefficients to exactly zero, giving sparse models with feature selection built in.