← Lessons

quiz vs the machine

Platinum1800

Machine Learning

The Conjugate Gradient

Pick non interfering directions to solve big quadratic problems efficiently.

6 min read · advanced · beat Platinum to climb

The motivation

Conjugate gradient solves large quadratic minimization without forming or inverting a matrix. It improves on plain gradient descent by choosing smarter search directions.

  • Gradient descent can zigzag in narrow valleys.
  • Conjugate gradient avoids undoing previous progress.

Conjugate directions

Two directions are conjugate with respect to a matrix if stepping along one does not spoil the minimization already done along the other. The method walks a sequence of such directions.

  • Each new direction blends the current gradient with the last direction.
  • For an n dimensional quadratic it reaches the minimum in at most n steps in exact arithmetic.

Why it scales

It only needs matrix vector products, never the full matrix inverse. That makes it ideal for very large but structured systems, and it underpins some Hessian free training methods.

Conjugate gradient gives much of Newton's efficiency on quadratics while only using gradient sized operations.

Key idea

Conjugate gradient chooses conjugate search directions so steps never undo each other, solving large quadratic problems with only matrix vector products instead of a full inverse.

Check yourself

Answer to earn rating on the learn ladder.

1. What does it mean for two directions to be conjugate?

2. What expensive operation does conjugate gradient avoid?