Servers and workers
The parameter server pattern separates two roles. One or more servers hold the master copy of the model parameters. Many workers compute gradients on their share of the data, push those gradients to the servers, and pull updated parameters back.
- Workers pull the latest weights before each step.
- Workers push their computed gradients back.
- Servers apply updates to the master parameters.
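The three steps above can be sketched in a single process. This is a minimal illustration, not a distributed implementation: the names ParameterServer and worker_step, the plain SGD update, the learning rate, and the toy data (points on y = 2x + 1) are all assumptions made for the example.

```python
# Minimal in-process sketch of the pull / push / apply cycle.
# ParameterServer and worker_step are hypothetical names, and plain SGD
# is an assumed update rule; a real system would send these over a network.

class ParameterServer:
    def __init__(self, weights, lr=0.05):
        self.weights = list(weights)  # master copy of the parameters
        self.lr = lr

    def pull(self):
        # Hand the worker a copy so it cannot mutate the master state.
        return list(self.weights)

    def push(self, grads):
        # Apply the worker's gradients to the master parameters.
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grads)]


def worker_step(server, x, y):
    w = server.pull()               # 1. pull the latest weights
    err = (w[0] * x + w[1]) - y     # linear model: pred = w0 * x + w1
    grads = [2 * err * x, 2 * err]  # gradient of the squared error
    server.push(grads)              # 2. push gradients; 3. server applies them
    return err * err


server = ParameterServer([0.0, 0.0])
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # toy data on y = 2x + 1
losses = []
for _ in range(500):
    for x, y in data:
        losses.append(worker_step(server, x, y))
```

Here one loop stands in for many workers taking turns. In a real deployment each call to pull and push would be a network request, and several workers would issue them concurrently.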
Strengths and weaknesses
This design scales to many workers and tolerates them joining or leaving, which suits heterogeneous clusters. But the servers can become a bottleneck, since every worker talks to them, and a lone server is a single point of failure. Sharding the parameters across several servers limits how much any one server has to carry.
- Sharding parameters across servers spreads the load.
- The design naturally supports asynchronous updates, since workers need not wait for one another.
- Network links into the central servers can saturate at scale.
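One way to picture sharding is to split the parameters into blocks and assign each block to a server. The sketch below assumes integer block ids and a simple modulo placement. Shard, ShardedServer, and the request counter are hypothetical names used only to show how the load spreads.

```python
# Hypothetical sketch: parameter blocks spread across several server shards.
# Placement by block_id % num_shards is an assumed policy, chosen for clarity.

class Shard:
    def __init__(self):
        self.params = {}   # block_id -> value held by this shard
        self.requests = 0  # counts pull and push requests, to show load

    def pull(self, block_id):
        self.requests += 1
        return self.params[block_id]

    def push(self, block_id, grad, lr):
        self.requests += 1
        self.params[block_id] -= lr * grad


class ShardedServer:
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def _shard_for(self, block_id):
        return self.shards[block_id % len(self.shards)]

    def init(self, block_id, value):
        self._shard_for(block_id).params[block_id] = value

    def pull(self, block_id):
        return self._shard_for(block_id).pull(block_id)

    def push(self, block_id, grad, lr=0.1):
        self._shard_for(block_id).push(block_id, grad, lr)


ps = ShardedServer(num_shards=4)
for block_id in range(100):
    ps.init(block_id, 1.0)

# One worker step: pull every block, then push a gradient of 1.0 for each.
for block_id in range(100):
    ps.pull(block_id)
    ps.push(block_id, grad=1.0)

load = [shard.requests for shard in ps.shards]
```

With this placement each of the four shards handles a quarter of the 200 requests, rather than one server handling all of them.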
Pull and push
The central servers make coordination simple but route all traffic through a hub.
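A rough estimate shows how quickly the hub fills up. The function below is a back-of-the-envelope sketch that assumes every worker pulls the full set of weights and pushes a dense gradient of the same size on each step, and that parameters are split evenly across servers. The name server_traffic_bytes and the example figures are illustrative, not measurements.

```python
# Back-of-the-envelope estimate of per-step traffic through the servers.
# Assumes dense pulls and pushes of every parameter and even sharding;
# the function name and the numbers below are illustrative only.

def server_traffic_bytes(num_workers, num_params, bytes_per_value=4, num_servers=1):
    per_worker = 2 * num_params * bytes_per_value  # one pull plus one push
    total = num_workers * per_worker               # all workers, one step
    per_server = total / num_servers               # share carried by each server
    return total, per_server


# 64 workers, 100 million float32 parameters
total_one, per_server_one = server_traffic_bytes(64, 100_000_000)
total_eight, per_server_eight = server_traffic_bytes(64, 100_000_000, num_servers=8)
```

Under these assumptions a single server moves 51.2 GB per step, while eight evenly loaded servers move 6.4 GB each. The total volume is unchanged; sharding only divides it.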
Key idea
The parameter server pattern centralizes weights on servers that workers pull from and push gradients to. It scales flexibly but risks a central communication bottleneck.