Machine Learning

Gradient Descent

How models learn by stepping downhill on the loss surface.

4 min read · intro

What it is

Gradient descent is the optimization algorithm that powers most machine learning training. The goal is to find model parameters that minimize a loss function, a number that measures how wrong the predictions are.
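To make "a number that measures how wrong the predictions are" concrete, here is a minimal sketch of one common loss, mean squared error. The lesson does not name a specific loss, so this choice, the function name mean_squared_error, and the sample numbers are all illustrative assumptions.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Made-up data: the closer the predictions are to the targets, the lower the loss.
targets = [1.0, 2.0, 3.0]
good_loss = mean_squared_error([1.0, 2.0, 3.5], targets)
bad_loss = mean_squared_error([3.0, 0.0, 6.0], targets)

print(good_loss < bad_loss)  # the better predictions score the lower loss
```

Training amounts to searching for the parameters that make this number as small as possible.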

The intuition

Imagine standing on a foggy hillside trying to reach the valley. You cannot see far, but you can feel which way is downhill. You take a small step in that direction, then repeat. The gradient is the mathematical version of "which way is downhill" for the loss.
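"Feeling which way is downhill" can be imitated numerically. The sketch below assumes a toy one-parameter loss, (w - 3)^2, and estimates its slope by probing a tiny distance to either side. The function names and the probe size eps are hypothetical choices for illustration; real training code usually computes gradients analytically or with automatic differentiation.

```python
def loss(w):
    # Toy loss (an assumption for this example): lowest point is at w = 3.
    return (w - 3.0) ** 2


def estimate_slope(f, w, eps=1e-5):
    """Central-difference estimate of the slope of f at w."""
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)


slope_at_zero = estimate_slope(loss, 0.0)
slope_at_five = estimate_slope(loss, 5.0)

# A negative slope means the loss falls as w increases, so downhill is to the right.
# A positive slope means downhill is to the left.
print(slope_at_zero < 0, slope_at_five > 0)
```

In both cases, stepping against the sign of the slope moves w toward the valley at w = 3.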

The update rule

At each step the algorithm:

  • Computes the gradient of the loss with respect to each parameter
  • Scales that gradient by the learning rate, which sets the step size
  • Moves each parameter by the scaled amount in the direction opposite the gradient
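The steps above can be sketched in a few lines. This example assumes a toy one-parameter loss, (w - 3)^2, whose gradient is 2(w - 3). The function names, starting point, learning rate, and step count are illustrative assumptions, not values from the lesson.

```python
def loss(w):
    # Toy loss (an assumption for this example): minimized at w = 3.
    return (w - 3.0) ** 2


def gradient(w):
    # Derivative of the toy loss with respect to w.
    return 2.0 * (w - 3.0)


def gradient_descent(w_start, learning_rate, num_steps):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    w = w_start
    for _ in range(num_steps):
        w = w - learning_rate * gradient(w)
    return w


w_final = gradient_descent(w_start=0.0, learning_rate=0.1, num_steps=100)
print(w_final, loss(w_final))  # w_final ends up very close to 3, where the loss is near 0
```

With many parameters the same line applies to each one, using that parameter's own component of the gradient.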

A learning rate that is too large overshoots the valley and may diverge; one that is too small makes training crawl. Variants like stochastic gradient descent estimate the gradient from small random batches of data, so each step is cheap and fast, though noisier.
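The effect of the learning rate is easy to see on a toy problem. The sketch below assumes the loss w^2 (gradient 2w, minimum at w = 0) and runs the same 20 steps with three learning rates. The specific rates are illustrative assumptions chosen to show each regime on this particular loss; they are not general recommendations.

```python
def gradient(w):
    # Gradient of the toy loss w**2 (an assumption for this example).
    return 2.0 * w


def run(learning_rate, num_steps=20, w_start=1.0):
    """Return w after num_steps gradient descent updates."""
    w = w_start
    for _ in range(num_steps):
        w -= learning_rate * gradient(w)
    return w


too_small = run(0.001)   # barely moves away from the starting point
reasonable = run(0.1)    # gets close to the minimum at 0
too_large = run(1.1)     # each step overshoots further, so |w| grows

print(abs(too_small), abs(reasonable), abs(too_large))
```

On this loss each update multiplies w by (1 - 2 × learning rate), so any rate above 1.0 makes that factor larger than 1 in magnitude and the iterates diverge.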

Key idea

Gradient descent repeatedly nudges parameters in the direction that most reduces loss, with step size set by the learning rate.

Check yourself

1. What does the gradient tell you?

2. What happens if the learning rate is far too large?