Filling the bubble
Plain model parallelism leaves devices idle while they wait their turn: with the layers split across devices, only one device works on a given batch at any moment. Pipeline parallelism fixes this by splitting the batch into smaller micro batches and streaming them through the stages like an assembly line.
- The model is cut into sequential stages, one per device.
- Micro batches enter one after another so stages overlap work.
- While stage two processes micro batch one, stage one starts micro batch two.
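The overlap described above can be made concrete with a small simulation. This is a minimal sketch, not a real training schedule: it covers the forward pass only, assumes every stage takes exactly one time step per micro batch, and uses hypothetical helper names (`forward_schedule`, `render`). Stages and micro batches are numbered from zero here, so "stage two, micro batch one" in the text is stage 1, micro batch 0 in the code.

```python
from typing import List, Optional


def forward_schedule(num_stages: int, num_micro_batches: int) -> List[List[Optional[int]]]:
    """Return grid[t][s]: the micro batch that stage s processes at time step t.

    None marks an idle stage. Assumes one time step per stage per micro batch.
    """
    total_steps = num_stages + num_micro_batches - 1
    grid = []
    for t in range(total_steps):
        row = []
        for s in range(num_stages):
            mb = t - s  # micro batch mb reaches stage s at step mb + s
            row.append(mb if 0 <= mb < num_micro_batches else None)
        grid.append(row)
    return grid


def render(grid: List[List[Optional[int]]]) -> str:
    """Format the grid as one text line per time step, one column per stage."""
    lines = []
    for t, row in enumerate(grid):
        cells = ["--" if mb is None else f"m{mb}" for mb in row]
        lines.append(f"t={t}: " + " ".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Three stages, four micro batches; "--" cells are idle stages.
    print(render(forward_schedule(num_stages=3, num_micro_batches=4)))
```

Each printed row is one time step and each column one stage. The idle cells cluster at the first and last rows, while the middle rows have every stage busy.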
The pipeline bubble
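At the start and end of each step some stages have nothing to do: later stages wait for the first micro batch to arrive, and earlier stages finish before the last one drains. This idle region is the bubble. More micro batches per step shrink the bubble's share of the step time, pushing utilization toward 100%.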
- Bubble overhead falls as micro batch count rises.
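- Too many micro batches make each one small, which can hurt batch statistics (for example, batch normalization when its statistics are computed per micro batch).
- Schedules like one-forward-one-backward (1F1B) reduce peak memory by running each micro batch's backward pass as early as possible, so fewer activations are held at once.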
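The trade-offs in this list can be put into numbers with a rough model. The sketch below assumes a simple fill-and-drain schedule in which every stage takes the same time per micro batch and communication is free. Under those assumptions, a step with p stages and m micro batches spans m + p - 1 slots per stage, of which p - 1 are idle. The function names and the `schedule` labels are hypothetical, and the in-flight counts are the commonly cited first-stage bounds rather than measurements.

```python
def bubble_fraction(num_stages: int, num_micro_batches: int) -> float:
    """Idle share of one step: (p - 1) / (m + p - 1).

    Assumes equal-cost stages, no communication cost, fill-and-drain schedule.
    """
    if num_stages < 1 or num_micro_batches < 1:
        raise ValueError("num_stages and num_micro_batches must be >= 1")
    idle_slots = num_stages - 1
    total_slots = num_micro_batches + num_stages - 1
    return idle_slots / total_slots


def peak_in_flight(num_stages: int, num_micro_batches: int, schedule: str) -> int:
    """Upper bound on micro batches whose activations the first stage holds at once.

    "fill_drain": all forwards run before any backward, so every micro batch is held.
    "1f1b": backwards start as soon as possible, capping the count at the stage count.
    """
    if schedule == "fill_drain":
        return num_micro_batches
    if schedule == "1f1b":
        return min(num_stages, num_micro_batches)
    raise ValueError(f"unknown schedule: {schedule!r}")


if __name__ == "__main__":
    stages = 4
    for m in (1, 4, 16, 64):
        print(
            f"m={m:>2}  bubble={bubble_fraction(stages, m):.3f}  "
            f"fill_drain={peak_in_flight(stages, m, 'fill_drain')}  "
            f"1f1b={peak_in_flight(stages, m, '1f1b')}"
        )
```

In this model the bubble fraction keeps falling as m grows, while the 1F1B in-flight count stops growing once m reaches the number of stages. That is why the memory benefit matters most at high micro batch counts.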
Stage flow
By keeping every stage fed with micro batches, the pipeline approaches the throughput it would have if no stage were ever idle.
Key idea
Pipeline parallelism splits the model into stages across devices and streams micro batches through them, shrinking the idle bubble so model-parallel hardware stays busy.