Parallel Sum Reduction
A reduction folds an array of values into a single result using an associative operator such as addition. Done serially it adds one element at a time and takes time proportional to the array length. In parallel we can do far better.
The tree of adds
Because addition is associative, we can group the additions in any order. Pair up neighbors and add each pair at the same time:
- The first round halves the count of live values.
- Each later round halves what remains again.
- After about log of the length rounds, a single total survives.
With enough processors this turns a long serial chain into a shallow tree of depth log of the length.
Why associativity matters
The operator must be associative so that regrouping does not change the answer. Floating point addition is only approximately associative, so a parallel sum may differ slightly from a serial one in the last digits.
Key idea
An associative operator lets a reduction become a balanced tree of depth log of the length instead of a long serial chain.