← Lessons

quiz vs the machine

Gold1400

Machine Learning

The Parameter Server Architecture

Centralize weights on servers while workers push gradients and pull updates.

5 min read · core · beat Gold to climb

Servers and workers

The parameter server pattern separates two roles. One or more servers hold the master copy of the model parameters. Many workers compute gradients on data and exchange them with the servers.

  • Workers pull the latest weights before each step.
  • Workers push their computed gradients back.
  • Servers apply updates to the master parameters.

Strengths and weaknesses

This design scales to many workers and tolerates them joining or leaving, which suits heterogeneous clusters. But the servers can become a bottleneck, since every worker talks to them, and a single server is a point of failure unless parameters are sharded.

  • Sharding parameters across servers spreads the load.
  • It naturally supports asynchronous updates.
  • Network to a central point can saturate at scale.

Pull and push

The central servers make coordination simple but place all traffic through a hub.

Key idea

The parameter server centralizes weights on servers that workers pull from and push gradients to, scaling flexibly but risking a central communication bottleneck.

Check yourself

Answer to earn rating on the learn ladder.

1. What do workers do in a parameter server setup?

2. What is a key weakness of the parameter server design?