Data Placement and Locality

Where data lives matters

Data placement decides which node, rack, or region holds each piece of data and its replicas. Good placement cuts latency, balances load, and ensures replicas survive correlated failures.

Locality reduces latency

Locality means keeping data close to the work that uses it. Placing a user's data in the region they connect from avoids cross-continent round trips. Co-locating related data, such as all of a user's records, lets one node answer a query without fanning out.
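One way to picture co-location is to route every record by its owner's key rather than by its own ID. The sketch below is illustrative only: the node names and the functions `node_for_user` and `node_for_record` are hypothetical, and simple modulo hashing is assumed for brevity (real systems often prefer schemes that move less data when the node list changes).

```python
import hashlib

# Hypothetical node names, for illustration only.
NODES = ["node-a", "node-b", "node-c"]

def node_for_user(user_id: str, nodes=NODES) -> str:
    """Map a user to one node with a stable hash of the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

def node_for_record(user_id: str, record_id: str, nodes=NODES) -> str:
    """Place a record on its owner's node.

    record_id is deliberately ignored: routing by the owner alone keeps
    all of a user's records together on the same node.
    """
    return node_for_user(user_id, nodes)
```

Because placement depends only on the user ID, a query for one user's records can be answered by a single node.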

Spreading across failure domains

Replicas should land in different failure domains, meaning separate racks, power zones, or regions. If all replicas share a rack, one rack outage loses them all. Placement policy therefore pushes replicas apart even as it keeps the primary copy near its users.
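A rack-aware policy can be sketched as picking at most one node per rack. This is a minimal sketch under stated assumptions: the `(node, rack)` tuple layout and the function name `pick_replicas` are hypothetical, and the choice within a rack is simply alphabetical to keep the result deterministic.

```python
from collections import defaultdict

def pick_replicas(nodes, replica_count):
    """Choose replica_count nodes, no two sharing a rack.

    nodes: list of (node_name, rack_name) tuples (an assumed layout).
    """
    by_rack = defaultdict(list)
    for name, rack in nodes:
        by_rack[rack].append(name)

    if replica_count > len(by_rack):
        raise ValueError("not enough distinct racks for the requested replicas")

    # Take one node from each of the first replica_count racks.
    return [sorted(by_rack[rack])[0] for rack in sorted(by_rack)[:replica_count]]
```

With this rule, losing any single rack removes at most one replica. The same idea applies to power zones or regions by swapping the rack label for another failure-domain label.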

Balancing the tension

Closeness lowers latency but can concentrate copies.
Spread improves durability but can add distance.

A placement engine weighs these, often keeping one replica local for fast reads while spreading others wide for safety, and rebalances when nodes join, leave, or grow hot.
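The "one local, the rest spread" approach might look like the following sketch. The node dictionaries, the `load` field, and the function `place` are all assumptions made for illustration. It shows a single placement decision and leaves rebalancing out.

```python
def place(nodes, user_region, replica_count):
    """Pick one replica in the user's region, then spread the rest.

    nodes: list of dicts with "name", "region", and "load" keys
    (a hypothetical layout; load is a number where lower is better).
    """
    local = [n for n in nodes if n["region"] == user_region]
    if not local:
        raise ValueError("no node available in the user's region")

    # Local replica: the least-loaded node near the user, for fast reads.
    chosen = [min(local, key=lambda n: n["load"])]
    used_regions = {user_region}

    # Remaining replicas: least-loaded nodes, each in a region not yet used.
    remote = sorted(
        (n for n in nodes if n["region"] != user_region),
        key=lambda n: n["load"],
    )
    for node in remote:
        if len(chosen) == replica_count:
            break
        if node["region"] not in used_regions:
            chosen.append(node)
            used_regions.add(node["region"])

    if len(chosen) < replica_count:
        raise ValueError("not enough distinct regions for the requested replicas")
    return [n["name"] for n in chosen]
```

Preferring the least-loaded node at each step is one simple way to steer new data away from hot nodes. A fuller engine would also move existing replicas when nodes join or leave.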

Key idea

Data placement balances locality, which keeps data near its users for low latency, against spreading replicas across failure domains for durability, rebalancing as the cluster and load change.
