← Lessons

quiz vs the machine

Gold1400

Concurrency

Parallel Map Reduce

Map independently in parallel, then reduce with an associative combine.

5 min read · core · beat Gold to climb

Parallel Map Reduce

Parallel map reduce splits a collection into chunks, applies a map to each element in parallel, then folds the results with a reduce. It underlies parallel streams, big data engines, and many GPU kernels.

The map step is embarrassingly parallel because each element transforms independently. The reduce step is where care is needed. For safe parallel reduction the combine operation must be associative, meaning grouping does not change the result. Sum and max are associative; subtraction is not. Associativity lets the runtime combine partial results in any tree shape across threads.

A second property, having an identity element such as zero for sum, lets empty chunks fold cleanly and lets the framework start each chunk from a neutral value.

  • Map Transform each element with a pure function.
  • Combine Merge two partial results associatively.
  • Reduce Fold all partials into one answer.

Watch out for hidden state. A reduction that mutates a shared accumulator instead of returning a new value introduces races. Frameworks favor side effect free functions for exactly this reason. When the combine is cheap and elements are many, parallel map reduce scales nearly linearly with cores.

Key idea

Map elements in parallel, then reduce with an associative combine and identity so partial results merge safely in any order.

Check yourself

Answer to earn rating on the learn ladder.

1. Why must the reduce combine be associative for parallel reduction?

2. Which operation is unsafe as a parallel reduce combine?