Parallel Quicksort
Quicksort picks a pivot, partitions elements into those below and above it, then recursively sorts each side. The two recursive calls are independent and can run as parallel tasks.
Where the parallelism lives
The recursion gives easy task parallelism, but the partition step is serial in the simple version and scans every element. At the top level that serial partition touches the whole array, so it caps how fast the sort can go.
A parallel partition
A scalable quicksort also parallelizes partitioning, often with a prefix sum:
- Each chunk counts how many of its elements fall below the pivot.
- A scan turns those counts into output offsets.
- Threads scatter elements to their final partition slots in parallel.
Pivot quality still matters. A bad pivot makes one side huge and the recursion deep, hurting both serial and parallel performance, so sampling several candidates helps balance the split.
Key idea
Parallel quicksort runs both recursive sorts as tasks, and a scan based partition removes the serial partition bottleneck while good pivots keep the split balanced.