System Design

Data Placement and Locality

Putting data near where it is used and across the right failure domains.

Where data lives matters

Data placement decides which node, rack, or region holds each piece of data and its replicas. Good placement cuts latency, balances load, and keeps replicas from being lost together in a correlated failure.

Locality reduces latency

Locality means keeping data close to the work that uses it. Placing a user's data in the region they connect from avoids cross-continent round trips. Co-locating related data, such as keeping all of a user's records on one node, lets that node answer a query without fanning out to others.
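As a rough illustration of both ideas, the sketch below picks a home region from measured round-trip times and routes records by user id alone, so one user's records share a shard. The function names, the region labels, and the choice of SHA-256 as the hash are assumptions made for this example, not part of any particular system.

```python
import hashlib


def home_region(rtt_ms: dict[str, float]) -> str:
    """Return the region with the lowest measured round-trip time (hypothetical helper)."""
    return min(rtt_ms, key=rtt_ms.get)


def shard_for(user_id: str, num_shards: int) -> int:
    """Map a user to a shard by hashing only the user id.

    Because the record id is not part of the hash input, every record
    belonging to the same user lands on the same shard.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Example: a user measured closest to "us-east" gets that as the home region.
region = home_region({"us-east": 12.0, "eu-west": 95.0, "ap-south": 180.0})
shard = shard_for("user-42", 16)
```

Hashing on the user id trades even spread of individual records for single-shard queries per user; a very active user can then make one shard hot.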

Spreading across failure domains

Replicas should land in different failure domains, meaning separate racks, power zones, or regions. If all replicas share a rack, one rack outage loses them all. Placement policy therefore pushes replicas apart even as it keeps the primary copy near its users.
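One simple way to express such a policy is a greedy pass that prefers nodes in the user's region but never reuses a rack. This is a minimal sketch under assumptions: the Node fields, the place_replicas name, and treating a (region, rack) pair as the failure domain are all illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    region: str
    rack: str


def place_replicas(nodes: list[Node], user_region: str, count: int) -> list[Node]:
    """Pick `count` nodes in distinct failure domains, local region first."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # False sorts before True, so nodes in the user's region come first.
    # sorted() is stable, so input order is kept within each group.
    ordered = sorted(nodes, key=lambda n: n.region != user_region)
    chosen: list[Node] = []
    used_domains: set[tuple[str, str]] = set()
    for node in ordered:
        domain = (node.region, node.rack)
        if domain in used_domains:
            continue  # another replica already lives in this rack
        chosen.append(node)
        used_domains.add(domain)
        if len(chosen) == count:
            return chosen
    raise ValueError("not enough distinct failure domains")


cluster = [
    Node("a", "us", "r1"),
    Node("b", "us", "r1"),
    Node("c", "us", "r2"),
    Node("d", "eu", "r1"),
]
replicas = place_replicas(cluster, user_region="us", count=3)
```

Here the first chosen node ("a") is in the user's region and can serve as the primary, node "b" is skipped because it shares a rack with "a", and the remaining replicas land in a second rack and a second region.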

Balancing the tension

  • Closeness lowers latency but can concentrate copies.
  • Spread improves durability but can add distance.

A placement engine weighs these, often keeping one replica local for fast reads while spreading others wide for safety, and rebalances when nodes join, leave, or grow hot.
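The weighing can be pictured as a cost function over candidate replica sets. The sketch below is a toy model, not how any specific engine works: the score and best_placement names, the spread_weight value, and the exhaustive search over combinations are assumptions chosen to keep the example small.

```python
from itertools import combinations


def score(placement, rtt_ms, rack_of, spread_weight=50.0):
    """Lower is better: nearest-replica latency plus a penalty per shared rack."""
    read_latency = min(rtt_ms[n] for n in placement)
    racks = [rack_of[n] for n in placement]
    shared = len(racks) - len(set(racks))  # replicas that duplicate a rack
    return read_latency + spread_weight * shared


def best_placement(nodes, rtt_ms, rack_of, replicas=3):
    """Try every replica set and keep the cheapest (fine only for tiny clusters)."""
    return min(
        combinations(nodes, replicas),
        key=lambda p: score(p, rtt_ms, rack_of),
    )


nodes = ["a", "b", "c", "d"]
rtt_ms = {"a": 5.0, "b": 6.0, "c": 40.0, "d": 90.0}
rack_of = {"a": "r1", "b": "r1", "c": "r2", "d": "r3"}
choice = best_placement(nodes, rtt_ms, rack_of)
```

With these numbers the cheapest set keeps the nearest node for fast reads and puts the other two replicas in separate racks, even though a closer node was available in the first rack. Rerunning the search after a node joins, leaves, or has its latency or load change is a crude stand-in for rebalancing.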

Key idea

Data placement balances locality, which keeps data near its users for low latency, against spreading replicas across failure domains for durability, rebalancing as the cluster and load change.

Check yourself

1. Why place a replica in a different rack or region from the primary?

2. What does locality primarily improve?