The Work Stealing Runtime Deep

Keeping every core busy

A multi threaded async runtime runs one worker per core, each with its own local queue of tasks. The challenge is imbalance: one worker can pile up tasks while another sits idle. Work stealing fixes this.

How stealing works

Each worker pushes and pops tasks from its own local queue, usually from one end, which keeps the hot path lock free.
When a worker empties its local queue, it becomes a thief.
The thief picks another worker at random and steals tasks from the opposite end of that worker's queue.

Stealing from the far end reduces contention, because the owner and the thief touch different ends of the same queue.

Why it scales

Most of the time workers run their own tasks with no coordination, so there is no global lock on the common path. Stealing only kicks in when a core would otherwise be idle. This gives good load balancing with low overhead, which is why it underpins many modern runtimes and parallel task libraries.

Key idea

Work stealing gives each core a local queue for a lock free common path, and lets idle workers steal from the far end of a busy worker's queue to balance load with minimal contention.

The Work Stealing Runtime Deep

Keeping every core busy

How stealing works

Why it scales

Key idea

Check yourself