The motivation
Conjugate gradient solves large quadratic minimization without forming or inverting a matrix. It improves on plain gradient descent by choosing smarter search directions.
- Gradient descent can zigzag in narrow valleys.
- Conjugate gradient avoids undoing previous progress.
Conjugate directions
Two directions are conjugate with respect to a matrix if stepping along one does not spoil the minimization already done along the other. The method walks a sequence of such directions.
- Each new direction blends the current gradient with the last direction.
- For an n dimensional quadratic it reaches the minimum in at most n steps in exact arithmetic.
Why it scales
It only needs matrix vector products, never the full matrix inverse. That makes it ideal for very large but structured systems, and it underpins some Hessian free training methods.
Conjugate gradient gives much of Newton's efficiency on quadratics while only using gradient sized operations.
Key idea
Conjugate gradient chooses conjugate search directions so steps never undo each other, solving large quadratic problems with only matrix vector products instead of a full inverse.