Embedding versus Referencing in Documents

You can nest related data inside a document or point to it elsewhere, and the choice shapes read and write cost.

Two Ways to Model Relationships

When data relates to other data, a document store offers two strategies. You can embed the related data inside the parent document, or you can reference it by storing an identifier that points to a separate document.

Embedding

Related data lives inside the parent, so one read returns everything.
Best when the child data is owned by the parent and read together, like order line items.
The cost is that large or unbounded arrays make documents grow without limit, and in many stores updating one nested item means rewriting the whole document.
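
As a minimal sketch of the embedded shape, the example below models an order with its line items nested inside it. It uses a plain Python dict as a stand-in for a document collection; the names `orders`, `line_items`, and `order_total` are illustrative assumptions, not the API of any particular store.

```python
# Hypothetical in-memory "collection": maps a document id to a document.
orders = {
    "order-1": {
        "_id": "order-1",
        "customer": "Ada",
        # Child data is embedded: the line items live inside the order.
        "line_items": [
            {"sku": "A100", "qty": 2, "unit_price": 5.0},
            {"sku": "B200", "qty": 1, "unit_price": 12.5},
        ],
    }
}


def order_total(order_id):
    """Compute an order total from a single document lookup."""
    order = orders[order_id]  # one read returns the parent and its children
    return sum(item["qty"] * item["unit_price"] for item in order["line_items"])


print(order_total("order-1"))
```

Everything needed for the total arrives with the one lookup. The trade-off is visible too: every new line item makes this single document larger.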

Referencing

The parent stores an id and the child lives in its own document.
Best when data is shared, large, or changes independently, like a user referenced by many posts.
The cost is extra lookups: reading the parent and then the child takes additional round trips or an application-side join.
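
The referenced shape can be sketched the same way. Here posts store only an `author_id`, and the application resolves it with a second lookup. The dicts stand in for two collections, and `find_one`, `post_with_author`, and the `stats` counter are hypothetical helpers added only to make the extra lookup visible.

```python
# Hypothetical in-memory "collections", keyed by document id.
users = {"u1": {"_id": "u1", "name": "Ada"}}
posts = {
    "p1": {"_id": "p1", "title": "Hello", "author_id": "u1"},
    "p2": {"_id": "p2", "title": "Again", "author_id": "u1"},
}

# Counts lookups so the cost of referencing can be observed.
stats = {"lookups": 0}


def find_one(collection, doc_id):
    """Fetch one document by id, counting the lookup."""
    stats["lookups"] += 1
    return collection[doc_id]


def post_with_author(post_id):
    """Application-side join: read the post, then follow its reference."""
    post = find_one(posts, post_id)
    author = find_one(users, post["author_id"])
    return {**post, "author": author}


result = post_with_author("p1")
print(result["author"]["name"], stats["lookups"])
```

Reading one post with its author costs two lookups instead of one. In exchange, the user exists in exactly one place, so a change to the user document is seen by every post that references it.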

Choosing Between Them

If you almost always read the data together and it is bounded, embed.
If the data is shared, grows without limit, or updates on its own, reference.
Many real models mix both, embedding small stable data and referencing large shared data.
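
The rules of thumb above can be written down as a small decision helper. This is only a sketch of the heuristic as stated here: the function name `choose_strategy` and its four boolean parameters are assumptions made for illustration, and falling back to referencing when no rule clearly applies is a choice of this sketch, not a general rule.

```python
def choose_strategy(read_together, bounded, shared, updates_independently):
    """Return "embed" or "reference" for a relationship.

    read_together: the child is almost always read with the parent
    bounded: the child data has a known, small upper size
    shared: more than one parent points at the same child
    updates_independently: the child changes on its own schedule
    """
    # Any of these signals favors keeping the child in its own document.
    if shared or not bounded or updates_independently:
        return "reference"
    # Owned, bounded data that is read with its parent favors embedding.
    if read_together:
        return "embed"
    # Assumption of this sketch: default to referencing when unsure.
    return "reference"


# Order line items: owned by the order, bounded, read together.
print(choose_strategy(True, True, False, False))
# A user referenced by many posts: shared and updated on its own.
print(choose_strategy(True, True, True, True))
```

A mixed model applies the helper per relationship rather than per collection, so one document can embed its small stable fields while referencing its large shared ones.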

Key idea

Embedding keeps related data together for fast single reads, while referencing keeps shared or growing data separate at the cost of extra lookups.

