The Parameter Server Architecture

Servers and workers

The parameter server pattern separates two roles. One or more servers hold the master copy of the model parameters. Many workers compute gradients on their share of the data, send those gradients to the servers, and fetch updated weights in return.

Workers pull the latest weights before each step.
Workers push their computed gradients back.
Servers apply updates to the master parameters.
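
The three steps above can be sketched in a few lines of Python. This is a minimal single-process illustration, not a distributed implementation: the class names ParameterServer and Worker, the learning rate, and the toy linear model y = w*x + b are all assumptions made for the example, and the workers take turns rather than running in parallel.

```python
class ParameterServer:
    """Holds the master copy of the parameters (hypothetical minimal class)."""

    def __init__(self, weights, lr=0.1):
        self.weights = list(weights)
        self.lr = lr

    def pull(self):
        # Hand out a copy so workers never mutate the master directly.
        return list(self.weights)

    def push(self, grads):
        # Apply a plain SGD update to the master parameters.
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grads)]


class Worker:
    """Computes gradients on its own shard of the data."""

    def __init__(self, shard):
        self.shard = shard

    def compute_gradient(self, weights):
        # Mean gradient of 0.5 * (w*x + b - y)^2 over this worker's shard.
        w, b = weights
        grad_w = grad_b = 0.0
        for x, y in self.shard:
            err = w * x + b - y
            grad_w += err * x
            grad_b += err
        n = len(self.shard)
        return [grad_w / n, grad_b / n]


def train(server, workers, steps):
    for _ in range(steps):
        for worker in workers:
            weights = server.pull()                   # 1. pull latest weights
            grads = worker.compute_gradient(weights)  # 2. compute gradients
            server.push(grads)                        # 3. push; server applies
    return server.pull()


# Toy data drawn from y = 2x + 1, split across two workers.
server = ParameterServer([0.0, 0.0], lr=0.1)
workers = [
    Worker([(0.0, 1.0), (1.0, 3.0)]),
    Worker([(2.0, 5.0), (3.0, 7.0)]),
]
w, b = train(server, workers, steps=1000)
```

Because each worker pulls immediately before computing, every gradient here is based on fresh weights. In a real deployment, workers run concurrently, so a gradient may be computed against weights that another worker has already changed.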

Strengths and weaknesses

This design scales to many workers and tolerates them joining or leaving, which suits heterogeneous clusters. But the servers can become a bottleneck, since every worker talks to them, and a lone server is a single point of failure, a risk that sharding parameters across several servers reduces.

Sharding parameters across servers spreads the load.
The design naturally supports asynchronous updates.
Network links into a central point can saturate at scale.
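
One way to picture sharding is a key-to-server mapping, sketched below. The names shard_for and ShardedParameterStore are hypothetical, and hashing the parameter name with CRC32 is just one simple placement rule assumed for illustration. Real systems may place parameters by range, by size, or with consistent hashing.

```python
import zlib


def shard_for(key, num_servers):
    """Map a parameter name to a server index with a stable hash."""
    # zlib.crc32 is deterministic across runs, unlike Python's built-in
    # hash() on strings, which is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_servers


class ShardedParameterStore:
    """Simulates several servers, each holding a slice of the parameters."""

    def __init__(self, params, num_servers):
        self.num_servers = num_servers
        self.servers = [dict() for _ in range(num_servers)]
        for key, value in params.items():
            self.servers[shard_for(key, num_servers)][key] = value

    def pull(self, keys):
        # Each key is fetched only from the server that owns it.
        return {k: self.servers[shard_for(k, self.num_servers)][k] for k in keys}

    def push(self, grads, lr=0.1):
        # Each gradient is routed to the owning server, which applies it.
        for key, grad in grads.items():
            owner = self.servers[shard_for(key, self.num_servers)]
            owner[key] -= lr * grad


store = ShardedParameterStore({"w1": 1.0, "w2": 2.0, "b": 0.5}, num_servers=2)
store.push({"w1": 1.0}, lr=0.5)
```

Each server in this sketch sees only the pulls and pushes for the keys it owns, which is how sharding spreads the load.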

Pull and push

Central servers keep coordination simple, but every pull and push is routed through a hub.
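
A rough estimate shows how quickly hub traffic grows. The function below is a back-of-the-envelope sketch under stated assumptions: every worker pulls the full model and pushes a full dense gradient once per step, parameters are 4 bytes each, and sharding divides the load evenly. The function name and the example numbers are illustrative only.

```python
def hub_traffic_bytes(num_workers, num_params, bytes_per_param=4, num_servers=1):
    """Bytes crossing each server's network link per training step.

    Assumes one full pull plus one full push per worker per step,
    with parameters spread evenly across the servers.
    """
    per_worker = 2 * num_params * bytes_per_param  # pull + push
    return num_workers * per_worker / num_servers


# 64 workers, 100 million parameters, a single server.
single = hub_traffic_bytes(64, 100_000_000)

# Same job with the parameters sharded across 8 servers.
sharded = hub_traffic_bytes(64, 100_000_000, num_servers=8)
```

Under these assumptions the single server handles 51.2 GB per step, while each of the 8 shards handles 6.4 GB. Total traffic is unchanged, but no single link carries all of it.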

Key idea

The parameter server centralizes weights on servers that workers pull from and push gradients to, scaling flexibly but risking a central communication bottleneck.
