Machine Learning

L1 versus L2 Regularization Effects

Two penalties that pull weights toward zero in different ways.

Regularization adds a penalty on weight size to the loss, which discourages overly complex models. Both L1 and L2 penalties shrink weights, but they produce very different solutions.

The two penalties

  • L2, also called weight decay, adds the sum of squared weights to the loss, scaled by a penalty strength.
  • L1 adds the sum of absolute weights, scaled the same way.
  • Both trade a little training fit for better generalization.
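
As a minimal sketch of the two penalties, assuming the weights are a plain list of numbers: the function names and the `strength` parameter below are illustrative, not taken from any library.

```python
def l1_penalty(weights, strength):
    """Penalty strength times the sum of absolute weights."""
    return strength * sum(abs(w) for w in weights)


def l2_penalty(weights, strength):
    """Penalty strength times the sum of squared weights."""
    return strength * sum(w * w for w in weights)


def penalized_loss(data_loss, weights, strength, kind="l2"):
    """Training loss plus the chosen penalty (hypothetical helper)."""
    penalty = l1_penalty if kind == "l1" else l2_penalty
    return data_loss + penalty(weights, strength)


weights = [0.5, -2.0, 0.0, 1.5]
print(penalized_loss(1.0, weights, 0.1, kind="l1"))
print(penalized_loss(1.0, weights, 0.1, kind="l2"))
```

Note how the large weight (-2.0) dominates the L2 penalty because it is squared, while under L1 every unit of weight costs the same.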

Different geometry

L2's pull on a weight is proportional to the weight itself, so it fades as the weight nears zero. Weights shrink smoothly but rarely reach exactly zero, and importance ends up spread across many small weights. L1's pull is constant regardless of weight size, so it keeps pushing until many weights are exactly zero. That makes L1 a feature selector that produces sparse models you can inspect.
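
The difference shows up on a single weight with the data loss ignored. This sketch makes two assumptions: the L2 update is an ordinary gradient step, and the L1 update is a soft-threshold step, one common way to handle the kink at zero (a raw subgradient step tends to hop back and forth across zero instead of landing on it). The helper names are hypothetical.

```python
def l2_step(w, lr, strength):
    # Gradient of strength * w**2 is 2 * strength * w,
    # so the step is proportional to the weight itself.
    return w - lr * 2.0 * strength * w


def l1_step(w, lr, strength):
    # Soft-threshold: move toward zero by a fixed amount,
    # and stop at exactly zero instead of overshooting.
    shift = lr * strength
    if w > shift:
        return w - shift
    if w < -shift:
        return w + shift
    return 0.0


w_l2 = w_l1 = 1.0
for _ in range(20):
    w_l2 = l2_step(w_l2, lr=0.1, strength=1.0)
    w_l1 = l1_step(w_l1, lr=0.1, strength=1.0)

print("L2 weight after 20 steps:", w_l2)  # small but still positive
print("L1 weight after 20 steps:", w_l1)  # exactly 0.0
```

The L2 weight is multiplied by the same factor each step, so it decays geometrically and never hits zero. The L1 weight loses a fixed amount each step and is clamped once it gets within one step of zero.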

When to use which

Reach for L2 when you want smooth, stable shrinkage and believe most features matter a little. Reach for L1 when you suspect many features are useless and want the model to ignore them outright. The elastic net blends both to get sparsity with stability.
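
To see how the blend behaves, here is a rough sketch of one elastic net shrinkage step applied to a weight vector. It assumes a proximal-style update (soft-threshold for the L1 part, then a proportional rescale for the L2 part) and leaves out the data loss entirely. `elastic_net_step` and its parameters are made up for illustration.

```python
def elastic_net_step(weights, l1_strength, l2_strength, lr=1.0):
    """Apply one penalty-only shrinkage step to each weight."""
    shift = lr * l1_strength
    scale = 1.0 + 2.0 * lr * l2_strength
    updated = []
    for w in weights:
        # L1 part: fixed-size move toward zero, clamped at zero.
        if w > shift:
            w = w - shift
        elif w < -shift:
            w = w + shift
        else:
            w = 0.0
        # L2 part: proportional shrinkage of whatever survives.
        updated.append(w / scale)
    return updated


weights = [3.0, 0.05, -0.02, -1.0]
print(elastic_net_step(weights, l1_strength=0.1, l2_strength=0.5))
```

The two tiny weights are zeroed by the L1 part, while the two large ones survive and are shrunk smoothly by the L2 part. A real training loop would alternate this with a gradient step on the data loss.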

Key idea

L2 shrinks weights smoothly toward zero while L1 drives many to exactly zero, giving sparse, selectable models.

Check yourself

1. Which penalty tends to drive weights to exactly zero?

2. What is L2 regularization also known as?

3. Why might you prefer L1 for a high-dimensional problem?