Data parallelism from operations
A parallel collection runs bulk operations such as map, filter, and reduce across many cores while presenting the same interface as its sequential counterpart. Switching from sequential to parallel can be a one-line change, provided the functions passed to those operations are pure transformations with no side effects.
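As one concrete illustration, Java's streams offer this kind of switch: the same pipeline can start from `stream()` or `parallelStream()`. The sketch below assumes Java 9 or later; the class and method names are hypothetical, chosen only for the example.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class SequentialVsParallel {
    // Same pipeline either way; only the source call differs.
    static List<Integer> evenSquares(List<Integer> xs, boolean parallel) {
        Stream<Integer> source = parallel ? xs.parallelStream() : xs.stream();
        return source
                .filter(x -> x % 2 == 0)        // pure predicate
                .map(x -> x * x)                // pure transformation
                .collect(Collectors.toList());  // keeps encounter order even in parallel
    }

    public static void main(String[] args) {
        List<Integer> xs = IntStream.rangeClosed(1, 10).boxed().collect(Collectors.toList());
        System.out.println(evenSquares(xs, false)); // [4, 16, 36, 64, 100]
        System.out.println(evenSquares(xs, true));  // [4, 16, 36, 64, 100]
    }
}
```

Because `filter` and `map` receive pure functions here, both calls produce the same list; the parallel version simply runs on the common fork/join pool.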
Split, apply, combine
The engine follows a divide-and-conquer shape:
- Split the collection into chunks.
- Apply the operation to each chunk on a worker, often via a work-stealing pool.
- Combine partial results into the final answer.
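The three steps can be sketched by hand with Java's fork/join framework, whose `ForkJoinPool` schedules tasks by work stealing. This is a minimal sketch, not how any particular parallel-collection library is implemented; the class name `ChunkedSum` and the cutoff of 1,000 elements are illustrative assumptions.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ChunkedSum extends RecursiveTask<Long> {
    // Hypothetical cutoff: ranges at or below this size are not split further.
    static final int THRESHOLD = 1_000;

    private final long[] data;
    private final int from, to; // half-open range [from, to)

    ChunkedSum(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Apply: process a small chunk sequentially on this worker.
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        // Split: divide the range into two halves.
        int mid = (from + to) >>> 1;
        ChunkedSum left = new ChunkedSum(data, from, mid);
        ChunkedSum right = new ChunkedSum(data, mid, to);
        left.fork();                      // left half may be stolen by an idle worker
        long rightSum = right.compute();  // this thread works on the right half
        long leftSum = left.join();       // wait for (or run) the left half
        // Combine: addition is associative, so the grouping of halves does not matter.
        return leftSum + rightSum;
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new ChunkedSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(parallelSum(data)); // 5000050000
    }
}
```

Forking one half and computing the other on the current thread keeps that thread busy instead of leaving it blocked while both halves wait in the queue.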
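For this to be correct, the combine step must be associative, because the runtime may group partial results in any way. For reductions, the runtime may also need an identity element, the value every chunk starts from, so that an empty chunk contributes nothing to the result.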
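In Java, the three-argument `Stream.reduce` makes both requirements explicit: an identity, an accumulator that folds elements within a chunk, and a combiner that merges chunks. The sketch below assumes Java 9 or later; `IdentityAndCombine` and its methods are hypothetical names.

```java
import java.util.List;

class IdentityAndCombine {
    // Total character count across all words.
    static int totalLength(List<String> words) {
        return words.parallelStream().reduce(
                0,                            // identity: starting value for every chunk
                (acc, w) -> acc + w.length(), // apply: fold one element into a partial result
                Integer::sum);                // combine: merge partial results; associative
    }

    // For max, the identity is the smallest int, which never beats a real element.
    static int max(List<Integer> xs) {
        return xs.parallelStream().reduce(Integer.MIN_VALUE, Integer::max);
    }

    public static void main(String[] args) {
        System.out.println(totalLength(List.of("split", "apply", "combine"))); // 17
        System.out.println(totalLength(List.of()));                            // 0
        System.out.println(max(List.of(3, -7, 12, 5)));                        // 12
    }
}
```

Passing 1 instead of 0 as the identity of `totalLength` would add 1 once per chunk, so the parallel result would depend on how the runtime happened to split the input.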
Pitfalls that break parallelism
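- Side-effecting operations, such as appending to a shared mutable list, create data races; use a reduction instead.
- Folds that depend on strict left-to-right evaluation, such as subtraction, give wrong answers, because each chunk is folded separately and the partial results are regrouped when combined.
- For small collections, the cost of splitting and scheduling outweighs the speedup, so implementations typically fall back to sequential processing below a size threshold.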
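The first two pitfalls can be illustrated in Java, assuming a parallel `IntStream` for the race-free alternative and a hand-simulated two-chunk split for the order-dependent fold. The `Pitfalls` class and its methods are hypothetical, and the racy version appears only as a comment.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class Pitfalls {
    // Racy version (do not use): many workers append to one ArrayList, which is
    // not thread-safe, so elements can be lost or an exception thrown.
    //   List<Integer> out = new ArrayList<>();
    //   IntStream.range(0, n).parallel().forEach(out::add);

    // Race-free: the runtime builds per-chunk lists and merges them.
    static List<Integer> squares(int n) {
        return IntStream.range(0, n).parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    // Strict left-to-right fold with subtraction: ((((0 - x0) - x1) - x2) ...).
    static int sequentialFold(int[] xs) {
        int acc = 0;
        for (int x : xs) acc -= x;
        return acc;
    }

    // Simplified model of a parallel runtime: fold each half from 0, then
    // combine the two partial results with the same (non-associative) operator.
    static int twoChunkFold(int[] xs) {
        int mid = xs.length / 2;
        int left = 0, right = 0;
        for (int i = 0; i < mid; i++) left -= xs[i];
        for (int i = mid; i < xs.length; i++) right -= xs[i];
        return left - right;
    }

    public static void main(String[] args) {
        System.out.println(squares(5));         // [0, 1, 4, 9, 16]
        int[] xs = {1, 2, 3, 4};
        System.out.println(sequentialFold(xs)); // -10
        System.out.println(twoChunkFold(xs));   // 4
    }
}
```

With `{1, 2, 3, 4}`, the sequential fold yields -10 while the two-chunk version yields 4, the kind of split-dependent answer the second bullet warns about.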
Key idea
Parallel collections split work into chunks processed on many cores, requiring pure associative operations to combine results correctly.