Machine Learning

Gradient Descent for Regression

Walk downhill on the error surface when a closed form is too costly.

Why iterate

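The closed form for linear regression (the normal equations) builds and inverts a matrix whose size grows with the number of features. That is slow or infeasible with millions of features, and even building the matrix is costly with millions of examples. Gradient descent instead nudges the weights step by step toward lower error.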
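
For comparison, here is a minimal sketch of the closed form, assuming NumPy. The function name and the toy data are made up for illustration. It solves the normal equations with a linear solver rather than an explicit inverse, which is the usual numerical practice, but the cost still grows roughly with the cube of the number of features.

```python
import numpy as np

def closed_form_weights(X, y):
    """Solve the normal equations (X^T X) w = X^T y for the weights w."""
    gram = X.T @ X          # features-by-features matrix
    return np.linalg.solve(gram, X.T @ y)

# Hypothetical toy data: y = 1 + 2*x, with a column of ones for the intercept.
x = np.linspace(-1.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

w = closed_form_weights(X, y)  # weights for [intercept, slope]
```

With two features this is instant. The trouble starts when the features-by-features matrix becomes too large to build or solve.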

The update rule

  • Compute the gradient, the direction of steepest increase of the loss.
  • Move the weights a small step in the opposite direction.
  • Repeat until the loss stops dropping.

The step size is the learning rate. Too large and the updates overshoot and diverge; too small and training crawls.
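
The three steps above can be sketched as a short loop. This is a minimal illustration, assuming NumPy and a mean squared error loss; the function name, default values, and toy data are assumptions, not part of the lesson.

```python
import numpy as np

def gradient_descent(X, y, learning_rate=0.1, n_steps=500):
    """Fit linear regression weights by gradient descent on mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_steps):
        residual = X @ w - y                          # prediction error per example
        grad = (2.0 / n_samples) * (X.T @ residual)   # gradient of the MSE with respect to w
        w = w - learning_rate * grad                  # small step opposite the gradient
    return w

# Hypothetical toy data: y = 1 + 2*x, with a column of ones for the intercept.
x = np.linspace(-1.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

w = gradient_descent(X, y)                                          # settles near [1, 2]
w_diverged = gradient_descent(X, y, learning_rate=1.1, n_steps=50)  # step too large for this data
```

On this data a learning rate of 0.1 settles near the true weights, while 1.1 makes each step overshoot further than the last, so the intercept ends up far from 1. This version runs a fixed number of steps for simplicity; stopping when the loss stops dropping would need a check on the loss inside the loop.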

Batch and stochastic flavors

  • Batch gradient descent uses all the data per step: smooth but expensive.
  • Stochastic gradient descent uses one example per step: noisy but fast.
  • Mini-batch gradient descent uses a small group of examples per step: the common compromise.
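
The three flavors differ only in how many examples feed each gradient, so one loop can cover all of them. The sketch below assumes NumPy and a mean squared error loss; the function name, defaults, and toy data are illustrative assumptions.

```python
import numpy as np

def minibatch_gradient_descent(X, y, batch_size=8, learning_rate=0.1,
                               n_epochs=500, seed=0):
    """Gradient descent on MSE where each step uses `batch_size` examples."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(n_epochs):
        order = rng.permutation(n_samples)            # reshuffle every epoch
        for start in range(0, n_samples, batch_size):
            idx = order[start:start + batch_size]     # indices of this batch
            residual = X[idx] @ w - y[idx]
            grad = (2.0 / len(idx)) * (X[idx].T @ residual)
            w -= learning_rate * grad
    return w

# Hypothetical toy data: y = 1 + 2*x, with a column of ones for the intercept.
x = np.linspace(-1.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

w_batch = minibatch_gradient_descent(X, y, batch_size=len(y))  # batch: all data per step
w_stochastic = minibatch_gradient_descent(X, y, batch_size=1)  # stochastic: one example per step
w_mini = minibatch_gradient_descent(X, y, batch_size=8)        # mini-batch: small group per step
```

This toy data has no noise, so all three settings end up at the same weights. With noisy data the small-batch runs would jitter around the minimum instead of settling exactly, which is the noise the list above refers to.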

Convergence tips

  • Scale features so the error surface is round, not a stretched valley.
  • Decay the learning rate over time to settle near the minimum.
  • Least squares is convex, so with a small enough learning rate gradient descent converges to the global optimum.
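
The first two tips can be sketched as small helpers. This assumes NumPy; standardization and inverse-time decay are one common choice each, and the helper names, the decay constant, and the two made-up features (floor area and room count) are assumptions for illustration.

```python
import numpy as np

def standardize(X):
    """Rescale each feature column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std

def decayed_learning_rate(initial_rate, step, decay=0.01):
    """Inverse-time decay: the rate shrinks as the step count grows."""
    return initial_rate / (1.0 + decay * step)

# Hypothetical features on very different scales: floor area and room count.
rng = np.random.default_rng(0)
X_raw = np.column_stack([rng.uniform(500, 3000, 200), rng.uniform(1, 5, 200)])
X_scaled, mean, std = standardize(X_raw)

# The condition number measures how stretched the error surface is (1 is perfectly round).
cond_raw = np.linalg.cond(X_raw.T @ X_raw)
cond_scaled = np.linalg.cond(X_scaled.T @ X_scaled)

rate_start = decayed_learning_rate(0.1, step=0)    # the initial rate
rate_later = decayed_learning_rate(0.1, step=100)  # half the initial rate with decay=0.01
```

Scaling brings the condition number down sharply on this data, which is what lets a single learning rate work for every weight. Keep `mean` and `std` from the training data and reuse them on new inputs, and add the intercept column of ones after scaling rather than before.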

Key idea

Gradient descent fits regression by repeatedly stepping downhill on the loss. The learning rate and feature scaling decide whether it converges smoothly or diverges.

Check yourself

1. What controls how far each gradient descent step moves?

2. Why is gradient descent sometimes preferable to the closed form?

3. What does mini-batch gradient descent trade off?