Embedding versus Referencing in Documents

You can nest related data inside a document or point to it elsewhere, and the choice shapes read and write cost.

Two Ways to Model Relationships

When data relates to other data, a document store offers two strategies. You can embed the related data inside the parent document, or you can reference it by storing an identifier that points to a separate document.

Embedding

  • Related data lives inside the parent, so one read returns everything.
  • Best when the child data is owned by the parent and read together, like order line items.
  • The cost is that large or unbounded arrays make documents grow, and every change to a child is a write to the parent document, which costs more as the document gets bigger.
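
To make the embedding case concrete, here is a minimal sketch that models documents as plain Python dictionaries and the collection as an in-memory dict keyed by id. The names `orders`, `line_items`, and `order_total` are hypothetical and not tied to any particular database.

```python
# Hypothetical in-memory "collection": order id -> order document.
orders = {
    "order-1": {
        "_id": "order-1",
        "customer_name": "Ada",
        # Line items are embedded: they are owned by the order and read with it.
        "line_items": [
            {"sku": "pen", "qty": 2, "unit_price_cents": 150},
            {"sku": "notebook", "qty": 1, "unit_price_cents": 500},
        ],
    }
}


def order_total(order_id: str) -> int:
    """Compute the order total in cents from a single document read."""
    order = orders[order_id]  # one lookup returns the parent and all its children
    return sum(item["qty"] * item["unit_price_cents"] for item in order["line_items"])


total_cents = order_total("order-1")
```

The total comes from one lookup because the line items travel with the order. The flip side is that adding a line item changes the order document itself.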

Referencing

  • The parent stores an id and the child lives in its own document.
  • Best when data is shared, large, or changes independently, like a user referenced by many posts.
  • The cost is extra lookups: reading the parent and then the child takes more round trips or an application-side join.
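
The referencing case can be sketched the same way, again assuming plain dictionaries stand in for collections. The helper `fetch`, the `stats` counter, and the field name `author_id` are illustrative assumptions. The counter exists only to make the extra lookup visible.

```python
# Hypothetical in-memory collections keyed by id.
users = {"user-1": {"_id": "user-1", "name": "Ada"}}
posts = {
    # Each post stores only the author's id, not a copy of the user.
    "post-1": {"_id": "post-1", "title": "Hello", "author_id": "user-1"},
    "post-2": {"_id": "post-2", "title": "Again", "author_id": "user-1"},
}

stats = {"reads": 0}  # counts simulated reads


def fetch(collection: dict, doc_id: str) -> dict:
    """Simulate one read against a collection."""
    stats["reads"] += 1
    return collection[doc_id]


def post_with_author(post_id: str) -> dict:
    """Application-side join: read the post, then follow author_id to the user."""
    post = fetch(posts, post_id)
    author = fetch(users, post["author_id"])
    return {"title": post["title"], "author_name": author["name"]}


view = post_with_author("post-1")
```

Showing one post costs two reads here instead of one. In exchange, renaming the user means changing a single user document, and every post that references it picks up the new name on the next read.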

Choosing Between Them

  • If you almost always read the data together and it is bounded, embed.
  • If the data is shared, grows without limit, or updates on its own, reference.
  • Many real models mix both, embedding small stable data and referencing large shared data.
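
A mixed model might look like the sketch below. It assumes a blog post that embeds its tags (small, bounded, read with the post), references its author (shared by many posts), and keeps comments in their own collection (they can grow without limit). All collection and field names are hypothetical.

```python
# Hypothetical in-memory collections; all names are illustrative.
users = {"user-1": {"_id": "user-1", "name": "Ada"}}
comments = [
    {"_id": "c-1", "post_id": "post-1", "text": "Nice"},
    {"_id": "c-2", "post_id": "post-1", "text": "Thanks"},
    {"_id": "c-3", "post_id": "post-2", "text": "Other"},
]
posts = {
    "post-1": {
        "_id": "post-1",
        "title": "Hello",
        "tags": ["intro", "databases"],  # embedded: small, bounded, read with the post
        "author_id": "user-1",           # referenced: the user is shared by many posts
    }
}


def load_post_page(post_id: str) -> dict:
    """Assemble a page view from one embedded field and two referenced lookups."""
    post = posts[post_id]
    author = users[post["author_id"]]
    # Comments live outside the post and point back to it through post_id.
    post_comments = [c["text"] for c in comments if c["post_id"] == post_id]
    return {
        "title": post["title"],
        "tags": post["tags"],
        "author_name": author["name"],
        "comments": post_comments,
    }


page = load_post_page("post-1")
```

The tags arrive with the post for free, while the author and the comments each need their own lookup.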

Key idea

Embedding keeps related data together for fast single reads, while referencing keeps shared or growing data separate at the cost of extra lookups.

Check yourself

1. When is embedding usually the better choice?

2. What is a downside of referencing instead of embedding?