← Lessons

quiz vs the machine

Gold1420

Concurrency

The Parallel Scan and Blelloch Algorithm

Computing all prefix sums in parallel with a two pass up sweep and down sweep.

6 min read · core · beat Gold to climb

What a scan computes

A scan, or prefix sum, produces an array where each output is the combination of all inputs up to that position. An exclusive scan excludes the current element. Scans appear everywhere: stream compaction, sorting, and resource allocation.

The Blelloch two pass trick

A naive parallel scan does redundant work. The Blelloch algorithm is work efficient, doing only linear total work across two passes over a balanced tree.

  • Up sweep: combine pairs going up the tree, storing partial sums at internal nodes, exactly like a reduce.
  • Set the root to the identity value.
  • Down sweep: walk back down. At each node, the left child receives the parent value, and the right child receives the parent combined with the old left value.

After the down sweep, every leaf holds the exclusive prefix combination of all earlier elements.

Work versus depth

The up sweep and down sweep each touch every node once, so total work is linear. The tree has logarithmic height, so the depth is logarithmic. This separation of work and depth is the heart of analyzing parallel algorithms.

Key idea

The Blelloch scan computes all prefix sums with linear work and logarithmic depth by sweeping partial sums up a tree, then redistributing them down with a clever swap of parent and left child values.

Check yourself

Answer to earn rating on the learn ladder.

1. What does the up sweep phase of a Blelloch scan compute?

2. What makes the Blelloch scan work efficient?

3. What is set at the root before the down sweep begins?