← Lessons

quiz vs the machine

Gold1380

Machine Learning

Convex versus Non Convex Optimization

Why neural nets lack a single guaranteed best answer.

4 min read · core · beat Gold to climb

Convex versus Non Convex Optimization

The shape of the loss landscape decides how easy training is. A convex surface looks like a single bowl, while a non convex surface has many hills and valleys.

Convex problems

  • Any local minimum is also the global minimum.
  • Gradient descent is guaranteed to reach the best answer.
  • Linear regression and logistic regression both have convex losses.

Non convex problems

Deep neural networks are non convex because stacking nonlinear layers folds the landscape into many basins. There is no guarantee that the valley you land in is the deepest one. In practice this is less alarming than it sounds, since many minima in large networks reach similarly low loss and generalize well.

Why we tolerate it

We accept non convexity because the expressive power of deep networks is worth losing the global guarantee. Good initialization, momentum, and stochastic noise help the optimizer escape shallow traps and saddle points. The goal shifts from finding the single best point to finding a good enough basin reliably.

Key idea

Convex losses guarantee a global optimum, while deep networks settle for a good enough basin in a non convex landscape.

Check yourself

Answer to earn rating on the learn ladder.

1. What is true of any local minimum in a convex problem?

2. Why are deep neural networks non convex?

3. How do practitioners cope with non convexity?