Machine Learning

Gradient Descent Intuition

Following the slope downhill to lower loss.

5 min read · advanced

Gradient descent is the workhorse optimization method behind most machine learning. Its idea is wonderfully simple: to minimize a loss, repeatedly step in the direction that decreases it fastest.

The gradient is the vector of partial derivatives of the loss with respect to each parameter. It points in the direction of steepest increase of the loss. To go down, we step in the opposite direction, with the step scaled by the learning rate. Repeat this and the loss generally falls toward a local minimum.
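The update described above can be sketched in a few lines of Python. This is a minimal illustration, not a production optimizer: the one-dimensional loss, the starting point, and the names `loss`, `grad`, and `gradient_descent` are all assumptions chosen for the example.

```python
def loss(w):
    # Hypothetical one-dimensional loss with its minimum at w = 3.
    return (w - 3.0) ** 2


def grad(w):
    # Derivative of the loss above with respect to w.
    return 2.0 * (w - 3.0)


def gradient_descent(w0, learning_rate=0.1, steps=100):
    w = w0
    for _ in range(steps):
        # Step opposite the gradient, scaled by the learning rate.
        w = w - learning_rate * grad(w)
    return w


w_final = gradient_descent(w0=0.0)
print(w_final, loss(w_final))
```

Each step shrinks the distance to the minimum by a constant factor here, which is a property of this simple quadratic and not of losses in general.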

Three variants trade gradient accuracy for speed:

  • Batch gradient descent uses the whole dataset for each step, giving a precise but expensive gradient.
  • Stochastic gradient descent uses one example at a time, which is noisy but fast.
  • Mini-batch gradient descent uses a small group of examples, the common compromise.
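The three variants differ only in how many examples feed each gradient estimate, so a single training loop with a `batch_size` parameter can sketch all of them. The dataset, the one-parameter model y = w * x, and the function names below are hypothetical and chosen for illustration. The data is noise-free, so all three runs should settle near the same value of w.

```python
import random


def gradient(w, batch):
    # Gradient of mean squared error for the model y = w * x over the batch.
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train(data, batch_size, learning_rate=0.05, epochs=50, seed=0):
    rng = random.Random(seed)
    data = list(data)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a new order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            w -= learning_rate * gradient(w, batch)
    return w


# Hypothetical noise-free dataset generated by y = 2x, with x from 0.1 to 2.0.
data = [(k / 10.0, 2.0 * k / 10.0) for k in range(1, 21)]

w_batch = train(data, batch_size=len(data))  # batch: whole dataset per step
w_sgd = train(data, batch_size=1)            # stochastic: one example per step
w_mini = train(data, batch_size=4)           # mini-batch: small group per step
print(w_batch, w_sgd, w_mini)
```

With the same number of epochs, the batch run takes one step per pass over the data while the stochastic run takes twenty, which is where the speed difference comes from.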

The noise in stochastic and mini-batch methods is not just a flaw. It can help the optimizer escape shallow local minima and saddle points that a perfectly smooth descent might linger in. This is part of why mini-batch training is so effective for large neural networks.
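A toy experiment can make the saddle point claim concrete. The sketch below rests on two loud assumptions: the two-parameter surface is invented for the example, and mini-batch noise is modeled as zero-mean Gaussian noise added to the exact gradient, which is a simplification of what real mini-batches produce. The start point sits exactly where the gradient in y is zero, so noise-free descent has no signal to leave it.

```python
import random


def loss(x, y):
    # Hypothetical surface: saddle point at (0, 0), minima at (0, 1) and (0, -1).
    return x ** 2 + y ** 4 / 4.0 - y ** 2 / 2.0


def grad(x, y):
    # Partial derivatives of the loss with respect to x and y.
    return 2.0 * x, y ** 3 - y


def descend(noise_scale, learning_rate=0.1, steps=200, seed=0):
    rng = random.Random(seed)
    x, y = 1.0, 0.0  # on the line y = 0, where the y-gradient vanishes
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Assumption: mini-batch noise modeled as zero-mean Gaussian noise.
        gx += noise_scale * rng.gauss(0.0, 1.0)
        gy += noise_scale * rng.gauss(0.0, 1.0)
        x -= learning_rate * gx
        y -= learning_rate * gy
    return x, y


smooth = descend(noise_scale=0.0)
noisy = descend(noise_scale=0.1)
print("smooth:", smooth, loss(*smooth))
print("noisy: ", noisy, loss(*noisy))
```

The noise-free run slides into the saddle at (0, 0) and stays there, while the noisy run gets nudged off the line y = 0 and rolls down to one of the two minima, ending with a lower loss.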

Convergence depends on the learning rate and the shape of the landscape. A rate that is too large overshoots and can diverge, while one that is too small makes progress very slow. With a sensible rate, gradient descent usually finds low-loss regions even in spaces with millions of dimensions, which is remarkable given how little it computes at each step.
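The effect of the learning rate is easy to see on a simple quadratic. The sketch below assumes a hypothetical loss w**2 and three hand-picked rates; the specific values are illustrative, and what counts as a sensible rate depends on the problem.

```python
def run(learning_rate, steps=50, w0=1.0):
    # Gradient descent on the hypothetical loss w**2, whose gradient is 2 * w.
    w = w0
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
    return w


w_slow = run(0.001)     # too small: barely moves from the start
w_good = run(0.1)       # sensible: ends close to the minimum at 0
w_diverged = run(1.1)   # too large: every step overshoots and grows
print(w_slow, w_good, w_diverged)
```

On this loss each step multiplies w by (1 - 2 * learning_rate), so any rate above 1.0 makes the iterates grow in magnitude instead of shrinking.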

Key idea

Gradient descent minimizes loss by stepping opposite the gradient, and mini-batch noise helps it escape shallow local minima and saddle points on the way down.

Check yourself

1. Which direction does the gradient point?

2. What does mini-batch noise help with?

3. How much data does batch gradient descent use per step?