Concurrency

NUMA Architecture Effects

Why memory access cost depends on which socket owns the data.

Memory is not uniform

On large multi-socket servers, memory is split among the sockets. In a non-uniform memory access (NUMA) design, each socket has its own local memory, and reaching another socket's memory means crossing an inter-socket interconnect.
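To see how a particular machine is split, the layout can be read from the operating system. The following is a minimal sketch, assuming a Linux host that exposes its topology under `/sys/devices/system/node`; the helper names `parse_cpulist` and `numa_nodes` are made up for this illustration. Linux reports NUMA "nodes", which usually, but not always, correspond one-to-one to sockets.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list means a node with memory but no CPUs
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu, ...]} from Linux sysfs, or {} if unavailable."""
    nodes = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        node_id = int(re.sub(r"\D", "", os.path.basename(path)))
        try:
            with open(os.path.join(path, "cpulist")) as f:
                nodes[node_id] = parse_cpulist(f.read())
        except OSError:
            continue  # skip entries we cannot read
    return nodes


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: CPUs {cpus}")
```

On a machine without this sysfs directory the function simply returns an empty dictionary, so the sketch degrades to printing nothing rather than failing.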

The latency gap

  • Accessing local memory is fast.
  • Accessing remote memory on another socket is slower, because every request has to cross the interconnect.
  • Bandwidth to remote memory is also limited by the interconnect.

So the same instruction can be cheap or expensive depending on where the data physically lives.
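A rough way to reason about this is a weighted average of the two latencies. The sketch below is only a back-of-the-envelope model: the function name `average_latency_ns` is hypothetical, and the 80 ns and 140 ns figures are placeholder values chosen for illustration, not measurements of any real machine.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Expected latency when `local_fraction` of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    LOCAL_NS = 80.0    # placeholder, not a measurement
    REMOTE_NS = 140.0  # placeholder, not a measurement
    for fraction in (1.0, 0.5, 0.0):
        avg = average_latency_ns(fraction, LOCAL_NS, REMOTE_NS)
        print(f"{fraction:.0%} local -> {avg:.0f} ns on average")
```

The model ignores caches and bandwidth limits, so it only captures the direction of the effect: the more accesses stay local, the closer the average gets to the local latency.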

Designing for NUMA

Performance hinges on placement. The common default policy is first touch: a page is physically allocated in the memory of the socket whose thread first writes to it. To exploit this, pin threads to cores and have each thread initialize the data it will later use, so that its accesses stay local.
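The pin-then-initialize pattern can be sketched as follows. This is an illustration under several assumptions: a Linux host (where `os.sched_setaffinity` with pid 0 applies to the calling thread), the default first-touch policy, and an anonymous `mmap` buffer whose pages have not been written yet. The helper name `first_touch_init` is hypothetical.

```python
import mmap
import os
import threading

PAGE = mmap.PAGESIZE


def first_touch_init(buf, cpus, fill=1):
    """Give each CPU in `cpus` a page-aligned slice of `buf` and let a thread
    pinned to that CPU perform the first write. Returns [(cpu, start, end)]."""
    total_pages = -(-len(buf) // PAGE)         # ceiling division
    pages_each = -(-total_pages // len(cpus))  # pages per worker thread
    layout = []

    def worker(cpu, start, end):
        try:
            # Best-effort pinning; pid 0 means the calling thread on Linux.
            if hasattr(os, "sched_setaffinity"):
                os.sched_setaffinity(0, {cpu})
        except OSError:
            pass  # not allowed to pin here; still initialize the slice
        # The first write to these pages happens on this (pinned) thread.
        buf[start:end] = bytes([fill]) * (end - start)

    threads = []
    for i, cpu in enumerate(cpus):
        start = min(i * pages_each * PAGE, len(buf))
        end = min(start + pages_each * PAGE, len(buf))
        layout.append((cpu, start, end))
        t = threading.Thread(target=worker, args=(cpu, start, end))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return layout


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else [0]
    buf = mmap.mmap(-1, 64 * PAGE)  # anonymous mapping, untouched so far
    for cpu, start, end in first_touch_init(buf, allowed[:4]):
        print(f"CPU {cpu} first-touched bytes {start}..{end}")
```

Python threads do not run this work in parallel, so the sketch shows only the ordering that matters (pin first, then write) rather than a performance gain. For the placement to pay off, the thread that later processes a slice should be pinned to the same CPU, or at least the same socket, as the thread that initialized it.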

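Ignoring NUMA can leave all memory on one socket while the threads are spread across the others, for example when a single thread initializes every buffer. That socket's memory controller and interconnect links then saturate while the memory bandwidth of the other sockets sits idle. NUMA awareness turns this hidden penalty into predictable local access.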
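One way to check whether a process has fallen into this trap is to count its pages per node. The sketch below assumes a Linux kernel that provides `/proc/self/numa_maps`, where each mapping line carries fields such as `N0=512` (pages on node 0). The helper names are hypothetical, and the sample text is an invented example of the format, not output captured from a real system.

```python
import re
from collections import Counter


def pages_per_node(numa_maps_text):
    """Sum the N<node>=<pages> fields of numa_maps text into {node: pages}."""
    counts = Counter()
    for node, pages in re.findall(r"\bN(\d+)=(\d+)", numa_maps_text):
        counts[int(node)] += int(pages)
    return dict(counts)


def read_own_numa_maps(path="/proc/self/numa_maps"):
    """Return the numa_maps text for this process, or '' if unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


# Invented sample in the numa_maps format, for illustration only.
SAMPLE = (
    "7f2c00000000 default anon=512 dirty=512 N0=512 kernelpagesize_kB=4\n"
    "7f2c10000000 default anon=256 dirty=256 N0=200 N1=56 kernelpagesize_kB=4\n"
)

if __name__ == "__main__":
    print("sample:", pages_per_node(SAMPLE))
    print("this process:", pages_per_node(read_own_numa_maps()))
```

A heavily skewed count, with nearly every page on one node while threads run on several sockets, is the imbalance described above.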

Key idea

In a NUMA system, memory latency depends on which socket owns the data, so placing data near the thread that uses it, through first touch and pinning, is key to performance.

Check yourself

1. What does NUMA mean for memory access?

2. What is the first-touch policy?