Machine Learning

Synchronous SGD

Make every worker step in lockstep for clean, reproducible updates.

Stepping together

Synchronous SGD keeps all workers in lockstep. Each worker computes a gradient on its own data shard, then all workers wait at a barrier and combine their gradients before any of them updates the weights.

  • All workers use the same weight version each step.
  • A collective operation such as all-reduce averages the gradients.
  • Every worker applies the identical update.
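The three steps above can be sketched in a few lines of plain Python. This is a single-process simulation, not a distributed implementation: the scalar model y = w·x, the toy data, the learning rate, and the helper names (shard_gradient, all_reduce_mean, sync_sgd_step) are all assumptions made for illustration, and all_reduce_mean only stands in for a real all-reduce collective.

```python
def shard_gradient(w, shard):
    """Gradient of 0.5 * mean((w*x - y)^2) over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker gets back the same average."""
    avg = sum(values) / len(values)
    return [avg] * len(values)


def sync_sgd_step(replicas, shards, lr=0.1):
    """One synchronous step; replicas[i] is worker i's copy of the weight."""
    # 1. Each worker computes a gradient on its shard, from the same weight version.
    local = [shard_gradient(w, s) for w, s in zip(replicas, shards)]
    # 2. Barrier: nothing proceeds until every gradient is in, then average them.
    reduced = all_reduce_mean(local)
    # 3. Every worker applies the identical update.
    return [w - lr * g for w, g in zip(replicas, reduced)]


# Hypothetical toy data: 16 examples of y = 3x, split evenly across 4 workers.
data = [(k / 4, 3.0 * k / 4) for k in range(1, 17)]
shards = [data[i::4] for i in range(4)]

replicas = [0.0] * 4  # all workers start from the same weight
for _ in range(100):
    replicas = sync_sgd_step(replicas, shards)
```

Because every replica starts from the same value and receives the same averaged gradient, the replicas stay identical after every step, which is the consistency property the barrier buys.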

Clean but barrier bound

Because every worker computes its gradient from the same weights, a synchronous step behaves like a single step on one large batch, which makes training easier to reason about and reproduce. The price is the straggler problem: the slowest worker sets the pace for the whole step.
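The "one large batch" claim can be written out explicitly. The notation here is assumed for illustration and does not come from the lesson: N workers, learning rate \eta, worker i holding a shard S_i of B examples, and per-example loss \ell. With equal shard sizes, averaging the N shard gradients gives exactly the gradient over all NB examples.

```latex
w_{t+1}
  = w_t - \eta \cdot \frac{1}{N} \sum_{i=1}^{N} g_i(w_t),
\qquad
g_i(w_t) = \frac{1}{B} \sum_{x \in S_i} \nabla \ell(w_t; x)

\frac{1}{N} \sum_{i=1}^{N} g_i(w_t)
  = \frac{1}{NB} \sum_{x \in S_1 \cup \dots \cup S_N} \nabla \ell(w_t; x)
```

If shards differ in size, a plain average of the shard gradients no longer matches the large-batch gradient unless each one is weighted by its shard size.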

  • It avoids the staleness of asynchronous training.
  • Performance is limited by the slowest worker.
  • Backup workers can mitigate stragglers: run a few extra workers and proceed once enough gradients have arrived, dropping the slowest.
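A small sketch makes the straggler cost concrete. The per-worker times below are made-up numbers, and the function names are hypothetical. The model is deliberately simple: a step takes as long as its slowest worker, and with backup workers the step finishes as soon as the required number of gradients has arrived.

```python
def sync_step_time(worker_times):
    """With a full barrier, the step takes as long as the slowest worker."""
    return max(worker_times)


def step_time_with_backups(worker_times, needed):
    """Proceed once the `needed` fastest gradients are in; ignore the rest."""
    return sorted(worker_times)[needed - 1]


# Hypothetical compute times in seconds; the fourth worker is a straggler.
times = [1.0, 1.1, 0.9, 4.0]
barrier_time = sync_step_time(times)  # 4.0

# Add one backup worker and wait for only the 4 fastest gradients.
times_with_backup = times + [1.0]
backup_time = step_time_with_backups(times_with_backup, needed=4)  # 1.1
```

The trade-off is that the dropped worker's computation is wasted, so backup workers spend extra hardware to shorten the wait.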

A shared barrier

The barrier guarantees consistency at the cost of waiting for the slowest participant.

Key idea

Synchronous SGD averages gradients at a barrier so that all workers apply the same update. This gives clean, reproducible training but exposes the straggler problem.

Check yourself

1. What does the barrier in synchronous SGD guarantee?

2. What is the main cost of synchronous SGD?