Where data lives matters
Data placement decides which node, rack, or region holds each piece of data and its replicas. Good placement cuts latency, balances load, and helps replicas survive correlated failures.
Locality reduces latency
Locality means keeping data close to the work that uses it. Placing a user's data in the region they connect from avoids cross-continent round trips. Co-locating related data, such as all of one user's records, lets one node answer a query without fanning out to others.
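Both forms of locality can be sketched in a few lines of Python. This is a minimal illustration, not a production scheme: the node names, the latency figures, and the helpers pick_home_region, node_for_user, and node_for_record are all hypothetical. It picks the lowest-latency region as a user's home and routes every record by the user ID alone, so one user's records share a node.

```python
import hashlib

# Hypothetical node names, used only for illustration.
NODES = ["node-a", "node-b", "node-c"]


def pick_home_region(latency_ms):
    """Return the region with the lowest measured latency to the user."""
    return min(latency_ms, key=latency_ms.get)


def node_for_user(user_id, nodes=NODES):
    """Map a user ID to one node with a stable hash."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def node_for_record(user_id, record_id, nodes=NODES):
    """Place a record by its owner only, so a user's records stay together.

    record_id is deliberately ignored when choosing the node.
    """
    return node_for_user(user_id, nodes)


# Assumed latency measurements for one user, in milliseconds.
home = pick_home_region({"eu": 20, "us": 110, "ap": 240})

# Two records of the same user resolve to the same node.
same_node = node_for_record("alice", "r1") == node_for_record("alice", "r2")
```

Simple modulo hashing reshuffles most keys when the node list changes. Consistent hashing is a common alternative when nodes join and leave often.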
Spreading across failure domains
Replicas should land in different failure domains, meaning separate racks, power zones, or regions. If all replicas share a rack, one rack outage loses them all. Placement policy therefore pushes replicas apart even as it keeps the primary copy near its users.
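The spreading rule can be sketched as a greedy pick that never reuses a rack. This is an illustrative sketch under assumptions: the RackNode type and the spread_replicas function are hypothetical names, and a rack stands in for any failure domain.

```python
from typing import NamedTuple


class RackNode(NamedTuple):
    name: str
    rack: str  # failure domain; could equally be a power zone or region


def spread_replicas(nodes, replica_count):
    """Pick replica_count nodes, each in a different rack.

    Walks the candidates in order and skips any node whose rack
    already holds a replica.
    """
    chosen = []
    used_racks = set()
    for node in nodes:
        if len(chosen) == replica_count:
            break
        if node.rack in used_racks:
            continue
        chosen.append(node)
        used_racks.add(node.rack)
    if len(chosen) < replica_count:
        raise ValueError("not enough distinct racks for the requested replicas")
    return chosen


candidates = [
    RackNode("n1", "rack-1"),
    RackNode("n2", "rack-1"),
    RackNode("n3", "rack-2"),
    RackNode("n4", "rack-3"),
]
replicas = spread_replicas(candidates, 3)
print([n.name for n in replicas])  # prints ['n1', 'n3', 'n4']
```

Raising an error when there are too few racks is one possible choice. A real policy might instead accept a weaker spread and report it.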
Balancing the tension
- Closeness lowers latency but can concentrate copies.
- Spread improves durability but can add distance.
A placement engine weighs these, often keeping one replica local for fast reads while spreading others wide for safety, and rebalances when nodes join, leave, or grow hot.
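One way to picture that trade-off is the sketch below. It is a simplified assumption, not how any particular system works: the Candidate type, its load field, and the place function are hypothetical. It keeps the first replica in the user's region on the least-loaded node there, then puts each remaining replica in a different region, preferring lightly loaded nodes.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    region: str
    load: float  # assumed scale: 0.0 is idle, 1.0 is saturated


def place(nodes, user_region, replica_count):
    """Return one local replica followed by replicas in other regions."""
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    local = [n for n in nodes if n.region == user_region]
    if not local:
        raise ValueError("no node available in the user's region")

    # Local replica: least-loaded node near the user, for fast reads.
    placement = [min(local, key=lambda n: n.load)]
    used_regions = {user_region}

    # Remaining replicas: one per other region, least-loaded first.
    for node in sorted(nodes, key=lambda n: n.load):
        if len(placement) == replica_count:
            break
        if node.region in used_regions:
            continue
        placement.append(node)
        used_regions.add(node.region)

    if len(placement) < replica_count:
        raise ValueError("not enough distinct regions for the requested replicas")
    return placement


cluster = [
    Candidate("eu-1", "eu", 0.9),
    Candidate("eu-2", "eu", 0.2),
    Candidate("us-1", "us", 0.5),
    Candidate("us-2", "us", 0.1),
    Candidate("ap-1", "ap", 0.4),
]
result = place(cluster, "eu", 3)
print([n.name for n in result])  # prints ['eu-2', 'us-2', 'ap-1']
```

Rebalancing in this sketch amounts to calling place again with fresh load figures or an updated node list, then moving data wherever the result differs from the current layout.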
Key idea
Data placement balances locality, which keeps data near its users for low latency, against spreading replicas across failure domains for durability, rebalancing as the cluster and load change.