Machine Learning

Pipeline Parallelism

Keep model-parallel devices busy by streaming micro batches through pipeline stages.

5 min read · core

Filling the bubble

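In plain model parallelism, the model's layers are split across devices and a batch passes through them one device at a time, so every device except the active one sits idle, waiting for activations on the forward pass or gradients on the backward pass. Pipeline parallelism fixes this by splitting the batch into smaller micro batches and streaming them through the stages like an assembly line.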

  • The model is cut into sequential stages, one per device.
  • Micro batches enter one after another so stages overlap work.
  • While stage two processes micro batch one, stage one starts micro batch two.
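A minimal sketch of the forward sweep of this assembly line, assuming every stage takes exactly one time unit per micro batch and that sending activations to the next stage is free. The function name `forward_schedule` is hypothetical; it returns a grid where row `t` lists which micro batch each stage runs at time `t` (stages and micro batches are 0-indexed, `None` means idle).

```python
from __future__ import annotations


def forward_schedule(num_stages: int, num_micro: int) -> list[list[int | None]]:
    """Return grid[t][s]: the micro batch stage s runs at time t, or None if idle.

    Assumes one time unit per stage per micro batch and zero communication cost.
    """
    total_steps = num_micro + num_stages - 1
    grid: list[list[int | None]] = [[None] * num_stages for _ in range(total_steps)]
    for k in range(num_micro):
        for s in range(num_stages):
            # Micro batch k enters stage 0 at time k and moves one stage per step.
            grid[k + s][s] = k
    return grid


if __name__ == "__main__":
    for t, row in enumerate(forward_schedule(num_stages=3, num_micro=4)):
        cells = ["--" if mb is None else f"m{mb}" for mb in row]
        print(f"t={t}: " + " ".join(cells))
```

For 3 stages and 4 micro batches, row `t=1` is the overlap from the last bullet: stage 0 starts `m1` while stage 1 processes `m0`. The `--` cells near the start and end of the grid are idle slots.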

The pipeline bubble

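At the start of each step, later stages wait for the first micro batch to reach them (the pipeline fills); at the end, earlier stages have finished and wait while the last micro batches drain through. This idle region is the bubble. Adding more micro batches per step shrinks the bubble's share of the step, pushing utilization toward 100%.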
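Under simplifying assumptions (p stages of equal cost, m micro batches, a GPipe-style schedule that runs all forwards before all backwards, no communication cost), each stage is busy for m of the m + p - 1 time slots in each sweep, so the bubble fraction is (p - 1) / (m + p - 1). The sketch below computes this in closed form and cross-checks it by counting idle slots; both function names are hypothetical.

```python
from fractions import Fraction


def bubble_fraction(num_stages: int, num_micro: int) -> Fraction:
    """Closed-form idle fraction of a step: (p - 1) / (m + p - 1).

    Assumes equal per-stage time and a GPipe-style schedule; the backward
    sweep mirrors the forward sweep, so the fraction is the same for both.
    """
    return Fraction(num_stages - 1, num_micro + num_stages - 1)


def bubble_fraction_by_counting(num_stages: int, num_micro: int) -> Fraction:
    """Same quantity, counted slot by slot over the forward sweep."""
    total_steps = num_micro + num_stages - 1
    busy = 0
    for t in range(total_steps):
        for s in range(num_stages):
            k = t - s  # micro batch that would be at stage s at time t
            if 0 <= k < num_micro:
                busy += 1
    total_slots = num_stages * total_steps
    return Fraction(total_slots - busy, total_slots)


if __name__ == "__main__":
    for m in (1, 4, 16, 64):
        f = bubble_fraction(num_stages=4, num_micro=m)
        assert f == bubble_fraction_by_counting(4, m)
        print(f"p=4, m={m:>2}: bubble = {float(f):.3f}")
```

With p = 4, the bubble drops from 0.750 at m = 1 to 0.429, 0.158, and 0.045 at m = 4, 16, and 64.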

  • Bubble overhead falls as micro batch count rises.
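  • Too many micro batches make each one small, which can hurt batch statistics (for example, BatchNorm computed per micro batch) and leave each device underutilized.
  • Schedules like one forward, one backward (1F1B) reduce peak activation memory by starting backward passes early.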
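The memory saving of 1F1B comes from ordering, not from a smaller bubble; in its non-interleaved form the bubble is the same size as in GPipe. After a few warm-up forwards, each stage alternates one forward with one backward, so a 0-indexed stage s never stashes activations for more than min(p - s, m) micro batches at once, while a GPipe-style stage stashes all m. The sketch below generates each stage's op order in the style of the non-interleaved 1F1B schedule (as used by PipeDream-Flush and Megatron-LM) and counts peak stashed activations. It models only the per-stage ordering, not cross-stage timing, and all names are hypothetical.

```python
from __future__ import annotations


def one_f_one_b_order(stage: int, num_stages: int, num_micro: int) -> list[str]:
    """Op order for one stage under a non-interleaved 1F1B schedule.

    "F3" is the forward of micro batch 3, "B3" its backward. Stages are 0-indexed.
    """
    # Warm-up: run forwards until the first backward can arrive from later stages.
    warmup = min(num_stages - stage - 1, num_micro)
    ops = [f"F{k}" for k in range(warmup)]
    next_fwd, next_bwd = warmup, 0
    # Steady state: one forward, then one backward.
    for _ in range(num_micro - warmup):
        ops.append(f"F{next_fwd}")
        next_fwd += 1
        ops.append(f"B{next_bwd}")
        next_bwd += 1
    # Cool-down: drain the remaining backwards.
    ops.extend(f"B{k}" for k in range(next_bwd, num_micro))
    return ops


def gpipe_order(num_micro: int) -> list[str]:
    """GPipe-style order: all forwards, then all backwards."""
    return [f"F{k}" for k in range(num_micro)] + [f"B{k}" for k in range(num_micro)]


def peak_in_flight(ops: list[str]) -> int:
    """Max number of micro batches whose activations are stashed at once."""
    live = peak = 0
    for op in ops:
        # A forward stashes activations; the matching backward frees them.
        live += 1 if op.startswith("F") else -1
        peak = max(peak, live)
    return peak


if __name__ == "__main__":
    p, m = 4, 8
    for s in range(p):
        print(f"stage {s}: 1F1B peak = {peak_in_flight(one_f_one_b_order(s, p, m))}")
    print(f"GPipe-style peak = {peak_in_flight(gpipe_order(m))}")
```

For p = 4 and m = 8, the 1F1B peaks are 4, 3, 2, and 1 for stages 0 through 3, versus 8 for the GPipe-style order.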

Stage flow

By keeping every stage fed with micro batches, the pipeline approaches the throughput of an ideal pipeline with no idle stages.

Key idea

Pipeline parallelism stages the model across devices and streams micro batches through them, shrinking the idle bubble so model-parallel hardware stays busy.

Check yourself


1. What is the pipeline bubble?

2. How do you reduce the bubble fraction?

3. What can too many micro batches harm?