Convex versus nonconvex
A function is convex if the line segment between any two points on its graph lies on or above the graph. In symbols, f(t x + (1 - t) y) <= t f(x) + (1 - t) f(y) for every t in [0, 1]. A convex loss looks like a single bowl.
- Convex: every local minimum is a global minimum.
- Gradient descent on a smooth convex loss, with a suitably small step size, approaches a global minimum whenever one exists.
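The two points above can be checked numerically. The sketch below is only an illustration: the quadratic `f`, the helper names, the step size, and the starting points are all assumptions chosen for the example, not anything prescribed by these notes. It tests the chord condition on a few sample points and runs plain gradient descent from several starts.

```python
# Illustrative convex loss (an assumed example): a bowl with its minimum at x = 3.
def f(x):
    return (x - 3.0) ** 2 + 1.0

def grad_f(x):
    return 2.0 * (x - 3.0)

def chord_above_graph(func, a, b, samples=11, tol=1e-9):
    """Check the convexity inequality on evenly spaced points of the segment [a, b]."""
    for i in range(samples):
        t = i / (samples - 1)
        x = t * a + (1 - t) * b
        if func(x) > t * func(a) + (1 - t) * func(b) + tol:
            return False
    return True

def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Plain gradient descent with a fixed step size (lr is an assumed value)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

starts = [-10.0, 0.0, 8.0]
finals = [gradient_descent(grad_f, x0) for x0 in starts]
print(chord_above_graph(f, -10.0, 8.0))  # chord check for one pair of endpoints
print(finals)                            # every start should end near x = 3
```

A sampled chord check like this can only suggest convexity, not prove it, since it inspects finitely many points on one segment.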
Local minima
A local minimum is a point lower than its neighbors but not necessarily the lowest overall. Nonconvex losses, like those of deep networks, can have many.
- Gradient descent may settle into a local minimum that is not the global one.
- Different starting points can reach different minima.
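A small one-dimensional sketch of this behavior follows. The quartic `f`, the step size, and the two starting points are assumptions picked for illustration; the function has one deeper minimum on the left and one shallower minimum on the right.

```python
# Illustrative nonconvex loss (an assumed example) with two minima.
def f(x):
    return x ** 4 - 3.0 * x ** 2 + x

def grad_f(x):
    return 4.0 * x ** 3 - 6.0 * x + 1.0

def gradient_descent(grad, x0, lr=0.01, steps=2000):
    """Plain gradient descent with a fixed step size (lr is an assumed value)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

x_left = gradient_descent(grad_f, -2.0)   # starts in the left basin
x_right = gradient_descent(grad_f, 2.0)   # starts in the right basin

print(x_left, f(x_left))     # the deeper (global) minimum
print(x_right, f(x_right))   # a shallower local minimum
```

Both runs stop where the gradient is essentially zero, but only the run started on the left ends at the lower loss, so the outcome depends on initialization.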
Practical view
In deep learning the loss surface is highly nonconvex, yet training often works. Many local minima reach similarly low loss, and the bigger obstacles are often saddle points rather than bad minima.
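To make the saddle point idea concrete, here is a toy two-dimensional sketch. The function `f(x, y) = x^2 + y^4/4 - y^2/2`, the step size, and the starting points are assumptions chosen for illustration, and this tiny surface is not a model of a real network's loss. It has a saddle at (0, 0), where the gradient is zero but the loss still decreases along y, and minima at (0, 1) and (0, -1).

```python
# Toy surface (an assumed example): saddle at (0, 0), minima at (0, 1) and (0, -1).
def f(x, y):
    return x ** 2 + y ** 4 / 4.0 - y ** 2 / 2.0

def grad_f(x, y):
    return 2.0 * x, y ** 3 - y

def gradient_descent(x, y, lr=0.1, steps=500):
    """Plain gradient descent with a fixed step size (lr is an assumed value)."""
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

# Starting exactly on the line y = 0, descent slides into the saddle and stays there.
stuck = gradient_descent(1.0, 0.0)

# A tiny nudge in y is enough to escape toward the minimum at (0, 1).
escaped = gradient_descent(1.0, 1e-6)

print(stuck, f(*stuck))
print(escaped, f(*escaped))
```

The first run halts at a point with zero gradient that is not a minimum, while the slightly perturbed run leaves it slowly at first and then reaches a lower loss. Noise in stochastic training can play a similar role to the nudge.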
Knowing whether a loss is convex tells you whether descent is guaranteed to reach the best answer or may only reach a good one.
Key idea
Convex losses have no local minima other than global ones, so descent with a suitable step size reaches the best achievable loss, while nonconvex losses can hide many local minima, though in deep nets many of them are good enough.