What a scan computes
A scan, or prefix sum, produces an array where each output is the combination of all inputs up to that position. An exclusive scan excludes the current element. Scans appear everywhere: stream compaction, sorting, and resource allocation.
The Blelloch two pass trick
A naive parallel scan does redundant work. The Blelloch algorithm is work efficient, doing only linear total work across two passes over a balanced tree.
- Up sweep: combine pairs going up the tree, storing partial sums at internal nodes, exactly like a reduce.
- Set the root to the identity value.
- Down sweep: walk back down. At each node, the left child receives the parent value, and the right child receives the parent combined with the old left value.
After the down sweep, every leaf holds the exclusive prefix combination of all earlier elements.
Work versus depth
The up sweep and down sweep each touch every node once, so total work is linear. The tree has logarithmic height, so the depth is logarithmic. This separation of work and depth is the heart of analyzing parallel algorithms.
Key idea
The Blelloch scan computes all prefix sums with linear work and logarithmic depth by sweeping partial sums up a tree, then redistributing them down with a clever swap of parent and left child values.