The Loss Landscape
It helps to imagine training as moving across a loss landscape. Each setting of the model's parameters is a point on a surface, and the height at that point is the loss. Training tries to walk downhill toward lower error.
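To make "height is loss" concrete, here is a minimal Python sketch. It assumes a hypothetical one-feature linear model y = w*x + b scored with mean squared error on a tiny made-up dataset; neither the model nor the data comes from the text. Each (w, b) pair is one point on the surface, and `loss(w, b)` returns the height there.

```python
# Toy data generated from y = 2x + 1 (hypothetical, for illustration only).
XS = [0.0, 1.0, 2.0, 3.0]
YS = [1.0, 3.0, 5.0, 7.0]


def loss(w: float, b: float) -> float:
    """Mean squared error of the line y = w*x + b on the toy data.

    Each (w, b) pair is a point in parameter space; the returned
    value is the 'height' of the loss landscape at that point.
    """
    errors = [(w * x + b - y) ** 2 for x, y in zip(XS, YS)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Evaluate the landscape at a few parameter settings.
    for w, b in [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]:
        print(f"w={w}, b={b}: loss={loss(w, b)}")
```

Running it reports a height of 21.0 at (0, 0), 3.5 at (1, 1), and 0.0 at (2, 1). This particular landscape is a single convex bowl, so it has no local minima; the landscapes of deep models are far more irregular.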
In this picture:
- A valley is a region of low loss where the model performs well
- A global minimum is the lowest point on the whole surface
- A local minimum is a valley that is low nearby but not the lowest overall
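A one-dimensional toy makes the local-versus-global distinction concrete. The sketch below assumes a hypothetical loss f(x) = x^4 - 3x^2 + x (chosen for illustration, not taken from the text), samples it on an evenly spaced grid, and reports every grid point that is lower than both of its neighbours.

```python
def f(x: float) -> float:
    """Hypothetical 1-D 'loss' with one local and one global minimum."""
    return x**4 - 3 * x**2 + x


def grid_minima(func, lo: float = -2.0, hi: float = 2.0, n: int = 401) -> list:
    """Return grid points whose value is lower than both neighbours."""
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    ys = [func(x) for x in xs]
    return [
        xs[i]
        for i in range(1, n - 1)
        if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]
    ]


if __name__ == "__main__":
    minima = grid_minima(f)
    global_min = min(minima, key=f)  # the lowest valley found on the grid
    for x in minima:
        kind = "global" if x == global_min else "local"
        print(f"x ≈ {x:.2f}, f(x) ≈ {f(x):.3f} ({kind} minimum)")
```

It finds two valleys: one near x ≈ -1.30 with f ≈ -3.51 (the global minimum) and one near x ≈ 1.13 with f ≈ -1.07 (a local minimum). Exhaustive grid search only works because this space has a single dimension.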
Real landscapes for deep models are vast and high-dimensional, far beyond what we can draw. They contain ridges, plateaus (broad, nearly flat regions where the gradient is close to zero), and saddle points, where the surface curves up in some directions and down in others.
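A saddle point can be recognised numerically. At a critical point (where the gradient is zero), the signs of the Hessian's eigenvalues show whether the surface curves up in every direction (a minimum), down in every direction (a maximum), or both (a saddle). This sketch assumes a two-parameter toy surface f(x, y) = x^2 - y^2 and estimates its 2×2 Hessian with central finite differences; the function names, step size `h`, and tolerance `tol` are illustrative choices, not from the text.

```python
import math


def saddle(x: float, y: float) -> float:
    """Hypothetical surface: curves up along x, down along y."""
    return x**2 - y**2


def hessian_2d(func, x: float, y: float, h: float = 1e-4):
    """Central finite-difference estimate of (f_xx, f_xy, f_yy)."""
    fxx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h**2
    fyy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h**2
    fxy = (
        func(x + h, y + h) - func(x + h, y - h)
        - func(x - h, y + h) + func(x - h, y - h)
    ) / (4 * h**2)
    return fxx, fxy, fyy


def eigenvalues_sym_2x2(a: float, b: float, d: float):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b**2)
    return mean - radius, mean + radius


def classify(func, x: float, y: float, tol: float = 1e-6) -> str:
    """Classify a point ASSUMED to be critical (zero gradient) by curvature."""
    lo, hi = eigenvalues_sym_2x2(*hessian_2d(func, x, y))
    if lo > tol:
        return "local minimum"  # curves up in every direction
    if hi < -tol:
        return "local maximum"  # curves down in every direction
    if lo < -tol and hi > tol:
        return "saddle point"  # up in some directions, down in others
    return "inconclusive"  # a (near-)flat direction, as on a plateau


if __name__ == "__main__":
    print(classify(saddle, 0.0, 0.0))
```

At the origin the estimated eigenvalues are about -2 and 2, so the point is reported as a saddle. A real network's Hessian has one row and column per parameter, so this direct check is only practical on toy problems.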
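The optimizer cannot see the whole map. It only senses the slope at its current location, the gradient, and steps downhill, opposite to the gradient. This is why the starting point and the step size influence where training ends up: a different start can lead into a different valley, and a step that is too large can overshoot a valley entirely.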
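Plain gradient descent on a one-dimensional toy loss shows both effects. This sketch assumes the hypothetical f(x) = x^4 - 3x^2 + x, whose gradient is 4x^3 - 6x + 1; the learning rates, step count, and divergence threshold `blowup` are arbitrary illustrative values, not taken from the text.

```python
from typing import Optional


def grad(x: float) -> float:
    """Gradient of the hypothetical loss f(x) = x^4 - 3x^2 + x."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(
    x0: float, lr: float, steps: int = 1000, blowup: float = 1e6
) -> Optional[float]:
    """Repeatedly step against the local slope; return None if it diverges."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)  # the optimizer only sees the slope at x
        if abs(x) > blowup:
            return None  # the step size was too large and iterates exploded
    return x


if __name__ == "__main__":
    print(gradient_descent(-2.0, lr=0.01))  # lands in the global minimum
    print(gradient_descent(2.0, lr=0.01))   # lands in the local minimum
    print(gradient_descent(2.0, lr=0.5))    # overshoots and diverges
```

With `lr=0.01`, starting at -2.0 ends near the global minimum at x ≈ -1.30, while starting at 2.0 ends in the local minimum near x ≈ 1.13: the optimizer settles in whichever valley it starts above. With `lr=0.5` from 2.0, the first step overshoots to -8.5, the iterates then blow up, and the function returns `None`.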
For large neural networks, a reassuring empirical observation is that most local minima found in practice have similar, low loss values. Getting stuck in a poor valley is therefore less common than early intuition suggests; plateaus and saddle points slow progress more often than bad minima do.
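Saddle points slow progress because the gradient near them is tiny, so each step moves very little. The sketch below assumes a hypothetical surface f(x, y) = x^2 + y^4/4 - y^2/2, which has a saddle at (0, 0) and two minima of height -0.25 at (0, ±1). It counts how many gradient-descent steps it takes to get within `tol` of that minimum value; the learning rate, tolerance, and `max_steps` cap are illustrative choices.

```python
def loss(x: float, y: float) -> float:
    """Hypothetical surface: saddle at (0, 0), minima at (0, +1) and (0, -1)."""
    return x**2 + y**4 / 4 - y**2 / 2


def grad(x: float, y: float):
    """Partial derivatives (df/dx, df/dy) of the surface above."""
    return 2 * x, y**3 - y


def steps_to_converge(
    x: float, y: float, lr: float = 0.1, tol: float = 1e-6, max_steps: int = 10_000
) -> int:
    """Count gradient-descent steps until the loss is within tol of -0.25."""
    target = -0.25  # the minimum value of this surface
    for step in range(max_steps):
        if loss(x, y) - target < tol:
            return step
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return max_steps  # never reached a minimum within the step budget


if __name__ == "__main__":
    print("far from saddle:", steps_to_converge(1.0, 0.5))
    print("near saddle:    ", steps_to_converge(1.0, 1e-6))
    print("on saddle line: ", steps_to_converge(1.0, 0.0))
```

Starting at (1.0, 0.5) converges in a few dozen steps, while starting at (1.0, 1e-6), almost on the saddle, takes several times as many because y grows by only about 10% per step while it is near zero. Starting exactly at y = 0 never escapes: the gradient in y is zero there, so the iterates slide into the saddle and the loop hits `max_steps`.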
Key idea
The loss landscape is a surface where height is error; training descends the slope toward low-loss valleys it cannot fully see.