Machine Learning

Saddle Points

With a zero gradient but a surface that curves up in some directions and down in others, saddles stall naive descent.

What a saddle is

A saddle point has a zero gradient yet is not a minimum. The surface curves up in some directions and down in others, like a horse saddle.

  • The gradient is zero at the point and small nearby, so gradient steps shrink and naive descent slows.
  • It is neither a peak nor a valley.
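
A minimal sketch makes both bullet points concrete. It assumes the textbook surface f(x, y) = x² − y², chosen for illustration rather than taken from any real loss. The gradient is zero at the origin, yet a small step along x raises the value while a small step along y lowers it, so the origin is neither a peak nor a valley.

```python
def f(x: float, y: float) -> float:
    """Toy saddle surface (assumed for illustration): curves up along x, down along y."""
    return x * x - y * y


def grad(x: float, y: float) -> tuple[float, float]:
    """Analytic gradient of f as (df/dx, df/dy)."""
    return (2 * x, -2 * y)


print("gradient at the origin:", grad(0.0, 0.0))  # both components are zero

eps = 0.1
print("f(eps, 0) =", f(eps, 0.0))  # positive: the surface rises along x
print("f(0, eps) =", f(0.0, eps))  # negative: the surface falls along y
```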

Why they matter

In high-dimensional loss surfaces, saddle points are far more common than bad local minima. With many directions, it is likely that at least one of them curves down at a given critical point, so being stuck is rarely permanent if there is enough signal to move along it.

  • Plateaus around saddles slow training.
  • Pure gradient descent can crawl for a long time near them.
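
To see the crawl, here is a sketch of plain gradient descent on the same toy surface f(x, y) = x² − y², started at (1, y0). The function name `steps_to_escape`, the learning rate of 0.1, and the escape threshold |y| > 1 are illustrative choices, not values from the lesson. With this learning rate each step scales y by 1.2, so the number of steps needed grows with log(1/y0), and a start with y0 exactly 0 never leaves because the y-gradient there is exactly zero.

```python
from typing import Optional


def steps_to_escape(y0: float, lr: float = 0.1, max_steps: int = 10_000) -> Optional[int]:
    """Plain gradient descent on f(x, y) = x**2 - y**2 from (1, y0).

    Returns the step at which |y| first exceeds 1 (the iterate has left
    the saddle region), or None if that never happens within max_steps.
    """
    x, y = 1.0, y0
    for step in range(1, max_steps + 1):
        gx, gy = 2 * x, -2 * y           # gradient of f
        x, y = x - lr * gx, y - lr * gy  # with lr=0.1: x is scaled by 0.8, y by 1.2
        if abs(y) > 1.0:
            return step
    return None


for y0 in (1e-2, 1e-8, 0.0):
    print(f"y0={y0:g}: escaped after {steps_to_escape(y0)} steps")
```

Shrinking the starting offset from 1e-2 to 1e-8 roughly quadruples the escape time, which is the "crawl" the bullet describes.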

Escaping them

At a saddle, the Hessian (the matrix of second derivatives) has both positive and negative eigenvalues; each negative eigenvalue marks a direction along which the loss decreases. Noise from SGD and momentum help push parameters off the flat region and into such a descending direction.
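
Near a zero-gradient point the loss changes by roughly ½ δᵀHδ for a small step δ, so the signs of the Hessian's eigenvalues say which directions go up and which go down. The sketch below applies that second-derivative test to a two-parameter case, where the eigenvalues of a symmetric 2×2 matrix have a closed form. The helper names are hypothetical, and real networks have far more parameters, so this only illustrates the idea.

```python
import math


def hessian_eigenvalues_2x2(a: float, b: float, c: float) -> tuple[float, float]:
    """Eigenvalues of the symmetric Hessian [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return (mean - radius, mean + radius)


def classify_critical_point(a: float, b: float, c: float) -> str:
    """Classify a zero-gradient point from the signs of its Hessian eigenvalues."""
    lo, hi = hessian_eigenvalues_2x2(a, b, c)
    if lo > 0:
        return "minimum"       # curves up in every direction
    if hi < 0:
        return "maximum"       # curves down in every direction
    if lo < 0 < hi:
        return "saddle"        # mixed signs: up in some directions, down in others
    return "inconclusive"      # a zero eigenvalue: the second-order test says nothing


# Hessians at the origin of three textbook surfaces.
print(classify_critical_point(2, 0, 2))    # x**2 + y**2 -> minimum
print(classify_critical_point(2, 0, -2))   # x**2 - y**2 -> saddle
print(classify_critical_point(0, 1, 0))    # x * y       -> saddle
```

The x·y case shows that a saddle's up and down directions need not line up with the coordinate axes; they follow the Hessian's eigenvectors.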

Recognizing saddles explains why training can stall on a plateau yet later resume progress.
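
The stall-then-escape behaviour can be reproduced on the toy surface f(x, y) = x² − y². The sketch below adds two optional ingredients to plain gradient descent: Gaussian noise on the gradient, a crude stand-in for SGD's minibatch noise, and heavy-ball momentum. The function `descend`, its parameters, and the noise level are all assumptions made for this illustration, not a description of any particular optimizer library.

```python
import random
from typing import Optional


def descend(y0: float, lr: float = 0.1, momentum: float = 0.0,
            noise_std: float = 0.0, seed: int = 0,
            max_steps: int = 2_000) -> Optional[int]:
    """Gradient descent on f(x, y) = x**2 - y**2 from (1, y0), with optional
    heavy-ball momentum and Gaussian gradient noise.

    Returns the step at which |y| first exceeds 1, or None if the iterate
    is still near the saddle after max_steps.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    x, y = 1.0, y0
    vx, vy = 0.0, 0.0
    for step in range(1, max_steps + 1):
        # Exact gradient plus optional noise (noise_std=0 gives the exact gradient).
        gx = 2 * x + rng.gauss(0.0, noise_std)
        gy = -2 * y + rng.gauss(0.0, noise_std)
        # Heavy-ball update; momentum=0 reduces to plain gradient descent.
        vx = momentum * vx - lr * gx
        vy = momentum * vy - lr * gy
        x, y = x + vx, y + vy
        if abs(y) > 1.0:
            return step
    return None


print("plain, start exactly on y = 0:", descend(0.0))
print("noisy, start exactly on y = 0:", descend(0.0, noise_std=1e-3))
print("plain, tiny offset:           ", descend(1e-8))
print("momentum 0.9, tiny offset:    ", descend(1e-8, momentum=0.9))
```

Plain descent started exactly on y = 0 slides into the saddle and stays there, while the same start with a little noise picks up a nonzero y that then grows until it escapes. From a tiny offset, momentum accumulates velocity along the descending y direction and leaves in fewer steps than plain descent.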

Key idea

A saddle point has a zero gradient but curves up and down in different directions, stalling naive descent until noise or momentum pushes parameters into a descending direction.

Check yourself

1. What is the gradient at a saddle point?

2. What distinguishes a saddle from a minimum?

3. What helps escape a saddle point in practice?