Flat Combining

Letting one thread batch and apply everyone else's operations to cut synchronization cost.

Synchronization is the cost

For many data structures the expensive part is not the work itself but the synchronization around it. If a hundred threads each fight for the same lock or cache line, the contention dwarfs the actual updates. Flat combining attacks that cost directly: instead of every thread synchronizing on the structure, one thread at a time applies the operations on behalf of all the others, so the work is serialized cheaply.

Publish, then let a combiner act

Each thread owns a slot in a publication list where it posts its requested operation:

A thread publishes its request in its slot and then tries to acquire the combiner role
Whichever thread wins the combiner role scans the publication list and applies every posted request to the structure on the posters' behalf
The other threads spin on their own slot until the combiner marks their request done, trying again for the combiner role if it becomes free first
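
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `Slot`, `FlatCombiningCounter`, `add` and `_combine` are hypothetical, the shared structure is assumed to be a plain counter, and the combiner role is assumed to be a single lock taken without blocking. It also assumes that plain attribute writes become visible to other threads in program order, which holds in CPython; a language with a weaker memory model would need atomics or fences on the slot fields.

```python
import threading
import time


class Slot:
    """One thread's entry in the publication list (hypothetical layout)."""

    def __init__(self):
        self.request = None  # published operation, or None when nothing is pending
        self.result = None   # filled in by the combiner
        self.done = True     # set to True by the combiner once the request is applied


class FlatCombiningCounter:
    """A shared counter whose updates are applied by whichever thread is combiner."""

    def __init__(self, num_threads):
        self.value = 0  # the structure itself; only the combiner touches it
        self.slots = [Slot() for _ in range(num_threads)]
        self.combiner_lock = threading.Lock()

    def add(self, thread_id, amount):
        slot = self.slots[thread_id]
        # Step 1: publish the request in this thread's own slot.
        slot.done = False
        slot.request = amount
        # Steps 2 and 3: try to become combiner, otherwise spin on the slot.
        while not slot.done:
            if self.combiner_lock.acquire(blocking=False):
                try:
                    self._combine()
                finally:
                    self.combiner_lock.release()
            else:
                time.sleep(0)  # yield so the current combiner can make progress
        return slot.result

    def _combine(self):
        # Runs only while holding combiner_lock, so the counter is updated
        # by one thread at a time with no further locking.
        for slot in self.slots:
            amount = slot.request
            if amount is not None:
                slot.request = None
                self.value += amount
                slot.result = self.value
                slot.done = True  # releases the thread spinning on this slot
```

A caller would create one `FlatCombiningCounter(num_threads)` and have thread `i` call `add(i, amount)`. The retry inside the loop matters: a thread that publishes just after the combiner has passed its slot would otherwise wait for a combiner that may never come.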

Why batching helps

The combiner works on the structure single-threaded, so there is no internal locking and no cache-line ping-pong during the batch. It also gets a chance to optimize across requests, for instance by cancelling a push against a pop so that neither touches the structure, or by applying many updates while the data is hot in its cache. The cost of acquiring the combiner role is paid once and amortized over the whole batch.
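
As an illustration of cross-request optimization, the sketch below shows how a combiner for a stack might cancel pushes against pops within one batch. The function name `combine_stack_batch` and the request encoding (`("push", value)` or `("pop", None)`) are assumptions made for this example. Because all requests in a batch are concurrent, the combiner is free to order them so that each paired push is immediately followed by its pop.

```python
def combine_stack_batch(stack, requests):
    """Apply one batch of requests to `stack` (a Python list used as a stack).

    Returns a list aligned with `requests`: None for a push, the popped
    value for a pop (None if the stack was empty).
    """
    results = [None] * len(requests)
    push_ids = [i for i, (op, _) in enumerate(requests) if op == "push"]
    pop_ids = [i for i, (op, _) in enumerate(requests) if op == "pop"]

    # Pair pushes with pops: the pop receives the pushed value directly
    # and neither operation touches the stack.
    paired = min(len(push_ids), len(pop_ids))
    for push_i, pop_i in zip(push_ids[:paired], pop_ids[:paired]):
        results[pop_i] = requests[push_i][1]

    # Whatever is left over is applied to the stack as usual.
    for push_i in push_ids[paired:]:
        stack.append(requests[push_i][1])
    for pop_i in pop_ids[paired:]:
        results[pop_i] = stack.pop() if stack else None
    return results
```

In a batch with as many pushes as pops, the stack is not modified at all, which is the kind of saving that per-operation locking cannot offer.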

Key idea

Flat combining has one thread batch everyone's published operations and apply them single-threaded, amortizing synchronization cost and enabling cross-request optimization.
