Parallel Collections

Data parallelism from operations

A parallel collection runs bulk operations like map, filter, and reduce across many cores while presenting the same interface as its sequential cousin. Switching from sequential to parallel can be a one-line change, provided the functions passed to those operations are pure, that is, free of side effects.
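
As a rough illustration of the one-line switch, the sketch below uses Python's standard concurrent.futures module as a stand-in for a parallel collection library. The function square and the sample data are made up for the example. In standard CPython builds, threads show the shape of the change but will not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x: int) -> int:
    # Pure: the result depends only on the argument, with no side effects.
    return x * x


data = list(range(10))

# Sequential version.
sequential = list(map(square, data))

# Parallel version: the only change is which `map` is called.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, data))

# Executor.map yields results in input order, so both lists match.
print(sequential == parallel)
```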

Split, apply, combine

The engine follows a divide-and-conquer shape:

Split the collection into chunks.
Apply the operation to each chunk on a worker, often via a work-stealing pool.
Combine partial results into the final answer.
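
The three steps above can be sketched as a small reduction helper. This is a minimal illustration, not how any particular library implements it: split and parallel_reduce are hypothetical names, the chunk size is arbitrary, and ThreadPoolExecutor is a plain thread pool standing in for a work-stealing one.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def split(items, chunk_size):
    """Split: cut the collection into consecutive chunks."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def parallel_reduce(items, op, identity, chunk_size=4, workers=4):
    """Reduce `items` with `op` by folding chunks on a pool of workers."""
    chunks = split(items, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Apply: each worker folds one chunk, starting from the identity.
        partials = list(pool.map(lambda chunk: reduce(op, chunk, identity), chunks))
    # Combine: fold the partial results, in chunk order, into the final answer.
    return reduce(op, partials, identity)


total = parallel_reduce(list(range(1, 101)), operator.add, 0)
print(total)  # 5050
```

Combining in chunk order rather than completion order keeps the result deterministic even though workers may finish at different times.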

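For this to be correct, the combine step must be associative, so that regrouping the elements into chunks does not change the result. For reductions, the runtime may also need an identity element so that empty chunks contribute nothing to the answer.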
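
A small experiment makes the associativity requirement concrete. The helper chunked_fold below is a hypothetical, single-threaded stand-in that only varies how elements are grouped. Addition gives the same answer for every chunking, while subtraction does not.

```python
from functools import reduce
import operator


def chunked_fold(items, op, chunk_size):
    """Fold each chunk on its own, then fold the partial results."""
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    partials = [reduce(op, chunk) for chunk in chunks]
    return reduce(op, partials)


data = [1, 2, 3, 4, 5, 6]

# Addition is associative: every chunking agrees on 21.
sums = {chunked_fold(data, operator.add, size) for size in (1, 2, 3, 6)}

# Subtraction is not associative: the answer depends on where chunks fall.
diffs = {chunked_fold(data, operator.sub, size) for size in (1, 2, 3, 6)}

# An identity element gives an empty chunk a well-defined partial result.
empty_sum = reduce(operator.add, [], 0)

# Without an identity, folding an empty chunk has no answer at all.
no_identity_failed = False
try:
    reduce(operator.add, [])
except TypeError:
    no_identity_failed = True

print(sums, diffs, empty_sum, no_identity_failed)
```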

Pitfalls that break parallelism

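Side-effecting operations like appending to a shared list create races; use a reduction instead.
Order-dependent folds, such as a left fold with a non-associative function, can give wrong answers because the work is regrouped into chunks that may finish in any order.
Tiny collections lose to scheduling overhead, so implementations typically apply a size threshold below which they fall back to sequential execution.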
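
One way to sidestep these pitfalls is sketched below, under stated assumptions: parallel_evens and evens_in_chunk are hypothetical names, and SEQUENTIAL_THRESHOLD is an arbitrary cutoff that a real system would tune by measurement. Each worker builds a private list instead of appending to a shared one, and the partial lists are combined by concatenation, which is associative with the empty list as its identity.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator

SEQUENTIAL_THRESHOLD = 1_000  # hypothetical cutoff, chosen only for illustration


def evens_in_chunk(chunk):
    # Each worker builds its own local list, so nothing shared is mutated.
    return [x for x in chunk if x % 2 == 0]


def parallel_evens(items, chunk_size=250, workers=4):
    if len(items) < SEQUENTIAL_THRESHOLD:
        # Small input: scheduling would likely cost more than it saves.
        return evens_in_chunk(items)
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(evens_in_chunk, chunks))
    # Partials are combined in chunk order, so the result does not depend
    # on which worker happens to finish first.
    return reduce(operator.add, partials, [])


print(parallel_evens(list(range(10))))  # [0, 2, 4, 6, 8]
```

Repeated list concatenation is quadratic in the worst case; it is used here only because it keeps the reduction easy to read.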

Key idea

Parallel collections split work into chunks that are processed on many cores, and they require pure, associative operations so that the partial results combine correctly.
