Choosing where a thread runs
Thread affinity, also called pinning, tells the scheduler to run a given thread only on a specific core or set of cores. By default the OS is free to move a thread between cores to balance load; pinning overrides that and confines the thread to the cores you chose.
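As a rough illustration, here is a minimal sketch of pinning from Python. It assumes a Linux system, where `os.sched_getaffinity` and `os.sched_setaffinity` are available and a pid of 0 refers to the caller; on other platforms the helper simply returns None. The function name `pin_to_one_cpu` is made up for this example.

```python
import os

def pin_to_one_cpu():
    """Pin the caller to one CPU from its currently allowed set.

    Returns the chosen CPU number, or None where the affinity API
    is not available (for example on macOS or Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)   # CPUs the caller may run on right now
    target = min(allowed)               # arbitrary choice: lowest-numbered CPU
    os.sched_setaffinity(0, {target})   # restrict the caller to that one CPU
    return target

if hasattr(os, "sched_getaffinity"):
    original = os.sched_getaffinity(0)
    cpu = pin_to_one_cpu()
    print("pinned to CPU", cpu, "mask is now", os.sched_getaffinity(0))
    os.sched_setaffinity(0, original)   # undo the pin so later code is unaffected
```

Picking the CPU from the currently allowed set, rather than hard-coding a number, avoids asking for a core the process is not permitted to use (for instance inside a container with a restricted CPU set).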
Why pin a thread
- Warm caches: the thread keeps reusing the same core's caches instead of arriving cold on a new core.
- Steady latency: migration stalls and cache reloads are avoided, which tightens tail latency.
- NUMA locality: the thread stays near the memory attached to its core's NUMA node, avoiding slower remote access.
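For the NUMA point, one way to find which cores belong to a node is sketched below. It assumes a Linux system that exposes NUMA topology under `/sys/devices/system/node/`; the helper names `parse_cpulist` and `node_cpus` are hypothetical, and `node_cpus` returns None when that file is absent.

```python
from pathlib import Path

def parse_cpulist(text):
    """Expand a kernel-style CPU list such as '0-3,8-11' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue                      # tolerate an empty list
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def node_cpus(node):
    """Return the CPUs of one NUMA node, or None if sysfs does not expose it."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    try:
        return parse_cpulist(path.read_text())
    except OSError:
        return None

print(sorted(parse_cpulist("0-3,8-11")))  # [0, 1, 2, 3, 8, 9, 10, 11]
print(node_cpus(0))                       # depends on the machine
```

A thread that should stay close to memory on node 0 could then be restricted to the intersection of `node_cpus(0)` and its currently allowed CPUs.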
The tradeoffs
Pinning carries risk. A pinned thread cannot move to an idle core when its own core is busy, so a poor pinning plan can leave some cores overloaded while others sit idle. It also fights the OS load balancer, which assumes it can spread work freely. Affinity shines in latency-sensitive systems, such as trading or packet processing, where predictable cache behavior matters more than flexible balancing.
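The imbalance risk can be seen in a toy model. This sketch is not a scheduler: it only adds up per-core work for a fixed pinning plan and compares it with a simple greedy placement standing in for an unpinned balancer. The task costs, the plan, and the function names are all invented for illustration.

```python
def core_loads(work, plan, n_cores):
    """Total work per core when task i is pinned to core plan[i]."""
    loads = [0] * n_cores
    for task, cost in enumerate(work):
        loads[plan[task]] += cost
    return loads

def greedy_balance(work, n_cores):
    """Stand-in for free balancing: largest task first, onto the least-loaded core."""
    loads = [0] * n_cores
    for cost in sorted(work, reverse=True):
        loads[loads.index(min(loads))] += cost
    return loads

work = [4, 4, 4, 4]          # four equal tasks (arbitrary units)
poor_plan = [0, 0, 0, 1]     # three tasks pinned to core 0, one to core 1

pinned = core_loads(work, poor_plan, n_cores=4)
free = greedy_balance(work, n_cores=4)

print(pinned, max(pinned))   # [12, 4, 0, 0] 12
print(free, max(free))       # [4, 4, 4, 4] 4
```

Under the poor plan, core 0 carries three times the work of a balanced layout while cores 2 and 3 do nothing, which is the overloaded-and-idle pattern described above.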
Key idea
Thread affinity pins a thread to chosen cores to keep caches warm and latency steady, at the cost of flexibility when a pinned core is busy while others are idle.