Convex versus Non Convex Optimization
The shape of the loss landscape decides how easy training is. A convex surface looks like a single bowl, while a non convex surface has many hills and valleys.
Convex problems
- Any local minimum is also the global minimum.
- Gradient descent is guaranteed to reach the best answer.
- Linear regression and logistic regression both have convex losses.
Non convex problems
Deep neural networks are non convex because stacking nonlinear layers folds the landscape into many basins. There is no guarantee that the valley you land in is the deepest one. In practice this is less alarming than it sounds, since many minima in large networks reach similarly low loss and generalize well.
Why we tolerate it
We accept non convexity because the expressive power of deep networks is worth losing the global guarantee. Good initialization, momentum, and stochastic noise help the optimizer escape shallow traps and saddle points. The goal shifts from finding the single best point to finding a good enough basin reliably.
Key idea
Convex losses guarantee a global optimum, while deep networks settle for a good enough basin in a non convex landscape.