Mixed Precision Training

Use lower precision math for speed while guarding numerical stability.

Half the bits, more speed

Mixed precision runs most operations in a lower precision format, such as 16-bit floating point, while keeping a few sensitive parts in a higher precision format, typically 32-bit. Compared with 32-bit values, 16-bit values halve memory traffic, and modern accelerators execute 16-bit math faster.

  • Forward and backward math runs in low precision.
  • A master copy of the weights is kept in higher precision.
  • Some reductions, such as long sums, accumulate in high precision for accuracy.
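
Why reductions stay in high precision can be seen in a small sketch. It simulates 16-bit floats with the standard library's struct module, which supports the IEEE 754 half-precision format; the helper names to_fp16, sum_low and sum_high are made up for this illustration, and the high precision accumulator is a 64-bit Python float standing in for the 32-bit one an accelerator would typically use.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest float16 value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def sum_low(values):
    """Accumulate in simulated float16: the running total is rounded every step."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total

def sum_high(values):
    """Same float16 inputs, but the accumulator keeps higher precision."""
    total = 0.0
    for v in values:
        total += to_fp16(v)
    return total

values = [1.0] * 4096
low = sum_low(values)
high = sum_high(values)
print(low, high)
```

The low precision total stops growing once it reaches 2048, because neighbouring float16 values are 2 apart from there on and adding 1 rounds back down. The high precision accumulator reaches the full 4096.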

The stability tricks

A 16-bit format such as FP16 has a narrow range, so small gradients can underflow to zero. Loss scaling multiplies the loss by a large factor before the backward pass, lifting tiny gradients into the representable range. The gradients are then divided by the same factor before the weight update, so the update itself is unchanged.

  • Loss scaling prevents gradient underflow.
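  • High precision master weights keep small updates from being rounded away.
  • Dynamic scaling adjusts the factor automatically, lowering it when gradients overflow and raising it after a run of stable steps.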
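
The tricks above can be sketched in plain Python. This is a minimal illustration, not any framework's implementation: to_fp16 and DynamicLossScaler are hypothetical names, float16 is simulated with the struct module, and the starting scale, the halving and doubling factors, and growth_interval are assumed values chosen for the example.

```python
import math
import struct

def to_fp16(x):
    """Round to the nearest float16 value; overflow becomes infinity (hypothetical helper)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return math.copysign(math.inf, x)

# A gradient too small for float16 is lost unless it is scaled first.
true_grad = 1e-8
loss_scale = 2.0 ** 16
unscaled = to_fp16(true_grad)               # underflows to zero
scaled = to_fp16(true_grad * loss_scale)    # survives in float16
recovered = scaled / loss_scale             # unscaled in high precision

class DynamicLossScaler:
    """Halve the scale on overflow, double it after a run of clean steps."""

    def __init__(self, init_scale=2.0 ** 16, growth_interval=2000):
        self.scale = init_scale
        self.growth_interval = growth_interval  # assumed value, tune per model
        self.good_steps = 0

    def update(self, grads):
        """Return True if the step is safe to apply, adjusting the scale."""
        if any(math.isinf(g) or math.isnan(g) for g in grads):
            self.scale /= 2.0       # overflow: back off and skip this step
            self.good_steps = 0
            return False
        self.good_steps += 1
        if self.good_steps == self.growth_interval:
            self.scale *= 2.0       # stable for a while: try a larger scale
            self.good_steps = 0
        return True

print(unscaled, recovered)
```

Here the unscaled gradient rounds to zero, while the scaled one comes back close to its true value after division. A training loop would call update on each step's scaled gradients and skip the weight update whenever it returns False.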

The precision path

The combination gives most of the speed of low precision with the stability of high precision.
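
As a rough end-to-end sketch of this path, the toy loop below fits a single weight so that w * x equals y. Everything in it is an assumption made for illustration: the helper to_fp16, the function train, the learning rate, the fixed loss scale of 1024, and the use of a 64-bit Python float as the high precision copy. Float16 is simulated with the struct module.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest float16 value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def train(steps, lr, use_master_copy, loss_scale=1024.0):
    """Fit w so that w * x == y for one example; returns the final weight."""
    x, y = 1.0, 2.0
    weight = 1.0
    for _ in range(steps):
        w16 = to_fp16(weight)                    # low precision working copy
        # Forward and backward in simulated float16 on the scaled loss
        # loss_scale * 0.5 * (w * x - y) ** 2.
        err = to_fp16(w16 * x - y)
        scaled_grad = to_fp16(to_fp16(err * loss_scale) * x)
        grad = scaled_grad / loss_scale          # unscale in high precision
        updated = weight - lr * grad
        # With a master copy the small update is kept; without one the
        # stored weight is rounded back to float16 after every step.
        weight = updated if use_master_copy else to_fp16(updated)
    return weight

with_master = train(1000, 1e-4, use_master_copy=True)
without_master = train(1000, 1e-4, use_master_copy=False)
print(with_master, without_master)
```

Without the master copy the weight never leaves 1.0, because each update of about 0.0001 is smaller than half the gap between float16 values near 1. With the master copy the same updates accumulate and the weight moves toward the target.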

Key idea

Mixed precision runs math in low precision for speed while keeping high precision master weights and loss scaling to preserve numerical stability.

Check yourself

1. What does loss scaling prevent?

2. Why keep a high precision master copy of the weights?