Parallel Prefix Sum
A prefix sum, also called a scan, replaces each element with the sum of all elements up to and including it. Serially this is one pass, but each output depends on the one before it, which looks stubbornly sequential.
Breaking the dependency
The classic work efficient scan uses two passes over a balanced tree:
- The up sweep builds partial sums upward, just like a reduction, leaving subtree totals at internal nodes.
- The down sweep pushes a running offset back down, combining it with the stored partials to produce each prefix.
This performs work proportional to the array length while reaching a depth of only log of the length, so it parallelizes well.
Inclusive versus exclusive
An inclusive scan includes the current element in its output, while an exclusive scan excludes it and starts from an identity value. Many algorithms such as stream compaction and radix sort use exclusive scans to compute output positions.
Key idea
A two phase up sweep and down sweep turns the serial looking scan into a logarithmic depth parallel algorithm with linear work.