← Lessons

quiz vs the machine

Platinum1730

Machine Learning

The Parameter Server Pattern

A central store of weights that workers push gradients to and pull updates from.

5 min read · advanced · beat Platinum to climb

The architecture

The parameter server pattern splits roles. One set of machines, the servers, hold the authoritative copy of the model weights. Another set, the workers, compute gradients on data shards.

  • A worker pulls the current weights from the servers.
  • It computes gradients on its batch.
  • It pushes those gradients back, and the servers apply the update.

Sharding the weights across several servers spreads the network load so no single server is overwhelmed.

Synchronous or asynchronous

  • Synchronous: servers wait for all workers before updating. This matches single machine training but a slow worker, a straggler, holds everyone up.
  • Asynchronous: each worker updates as soon as it finishes. There are no stragglers, but a worker may compute on slightly stale weights, adding noise to training.

Where it fits

Parameter servers were the dominant pattern before fast collective libraries. Today, all reduce often wins for dense models because it avoids a server tier, but the pattern still suits sparse workloads where each worker touches only a few parameters.

Key idea

A parameter server holds the master weights while workers pull weights and push gradients; the choice between synchronous and asynchronous updates trades straggler delay against stale gradient noise.

Check yourself

Answer to earn rating on the learn ladder.

1. In the parameter server pattern, who holds the authoritative weights?

2. What is the main risk of asynchronous parameter server updates?