The Elastic Net
The elastic net combines the L1 penalty of lasso with the L2 penalty of ridge. It keeps lasso-style feature selection while adding ridge-style stability across correlated features.
The objective
You minimize the loss plus a weighted mix of two penalties:
- An L1 term (the sum of absolute weights) that encourages sparsity by driving weak weights exactly to zero.
- An L2 term (the sum of squared weights) that shrinks weights smoothly and stabilizes the solution.
A mixing parameter controls how much weight each penalty gets.
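Written out, one common parameterization of this objective for squared-error loss looks like the sketch below. The symbols are assumptions introduced here for illustration: w for the weights, n for the number of samples, lambda >= 0 for the overall penalty strength, and alpha in [0, 1] for the mix. Libraries differ in their scaling constants, so treat this as one convention rather than the definition.

```latex
\[
\min_{w} \; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2
\;+\; \lambda \left( \alpha \,\lVert w \rVert_1
\;+\; \frac{1-\alpha}{2}\,\lVert w \rVert_2^2 \right)
\]
```

In this form, alpha = 1 leaves only the L1 term (lasso) and alpha = 0 leaves only the L2 term (ridge).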
Why blend them
- Pure lasso struggles with groups of correlated features: it tends to keep one member of the group, chosen more or less arbitrarily, and zero out the rest.
- Elastic net tends to keep or drop correlated features together, which is often more sensible.
- It retains automatic feature selection while reducing the instability of lasso.
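The contrast above can be made concrete with a small from-scratch sketch. It assumes squared-error loss, the objective (1/2n)||y - Xw||^2 + lam * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||^2), and a plain coordinate-descent solver. The function names soft_threshold and elastic_net_cd, the toy data, and the values lam = 0.1 and alpha = 0.5 are illustrative choices, not taken from any library.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_cd(X, y, lam, alpha, sweeps=500):
    """Cyclic coordinate descent for the elastic net objective (a sketch).

    lam is the overall penalty strength; alpha is the L1 share of the mix
    (alpha=1.0 gives lasso, alpha=0.0 gives ridge).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0  # average correlation of feature j with the partial residual
            z = 0.0    # average squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            # The L1 part thresholds, the L2 part inflates the denominator.
            w[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
    return w


# Toy data: columns 0 and 1 are exact duplicates (a perfectly correlated
# group); column 2 is unrelated to the target.
x = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
unrelated = [1.0, -1.0, 0.5, -1.0, 0.5, -1.0, 1.0]
X = [[xi, xi, ui] for xi, ui in zip(x, unrelated)]
y = [2.0 * xi for xi in x]

lasso_w = elastic_net_cd(X, y, lam=0.1, alpha=1.0)  # pure L1
enet_w = elastic_net_cd(X, y, lam=0.1, alpha=0.5)   # even L1/L2 mix

print("lasso      :", [round(v, 3) for v in lasso_w])
print("elastic net:", [round(v, 3) for v in enet_w])
```

On this toy data the lasso fit puts essentially all of the weight on the first copy of the duplicated feature (which copy wins depends only on the update order), while the elastic net fit splits the weight evenly between the two copies. Both leave the unrelated feature at zero.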
Tuning
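- Tune both the overall penalty strength (lambda) and the L1-to-L2 mix with cross-validation.
- Standardize features first. Both penalties act on the size of each weight, which depends on the scale of its feature, so unscaled features are penalized unevenly.
- When the mix favors L1, the model behaves like lasso; when it favors L2, it behaves like ridge.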
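The tuning steps above might look like the following sketch, assuming scikit-learn and NumPy are available. Note the naming: scikit-learn calls the overall strength alpha (lambda in these notes) and the L1 share of the mix l1_ratio. The synthetic data and the grid values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 200
signal = rng.normal(size=n_samples)

# Three noisy copies of one signal (a correlated group) plus five noise columns,
# deliberately put on different scales.
group = signal[:, None] + 0.1 * rng.normal(size=(n_samples, 3))
noise = rng.normal(size=(n_samples, 5))
scales = np.array([1.0, 10.0, 100.0, 1.0, 1.0, 1.0, 1.0, 1.0])
X = np.hstack([group, noise]) * scales
y = 3.0 * signal + 0.5 * rng.normal(size=n_samples)

# Scaling lives inside the pipeline so it is re-fit on each training fold.
pipeline = make_pipeline(StandardScaler(), ElasticNet(max_iter=10_000))

# Search over both the overall strength and the L1/L2 mix.
param_grid = {
    "elasticnet__alpha": [0.01, 0.1, 1.0],
    "elasticnet__l1_ratio": [0.2, 0.5, 0.8],
}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

best_model = search.best_estimator_.named_steps["elasticnet"]
print("best parameters:", search.best_params_)
print("coefficients   :", np.round(best_model.coef_, 3))
```

Keeping the scaler inside the pipeline is a design choice: it prevents statistics from the validation fold from leaking into the scaling step during cross-validation.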
Key idea
Elastic net mixes L1 and L2 penalties so you get lasso-style feature selection plus ridge-style stability: it handles correlated feature groups more gracefully than lasso, and unlike ridge it can still drop features entirely.