The Chunking Strategies Deep

Why chunk

Documents are too long to embed as one vector and too long to paste whole into a prompt. Chunking splits each document into smaller pieces that are embedded and retrieved independently. The size and shape of those pieces decide what the retriever can match.

Common strategies

Fixed-size chunking cuts every N characters or tokens. It is simple and fast, but it can slice sentences in half.
Sentence-based chunking splits on sentence boundaries so each chunk reads cleanly.
Structure-aware chunking respects headings, paragraphs, or markdown sections so a chunk stays on one topic.
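
The three strategies above can be sketched in a few lines of Python. This is a minimal illustration, not a production splitter: the function names, the character-based size limits, the regex used to detect sentence ends, and the choice of markdown headings as the structural boundary are all assumptions made for the example.

```python
import re


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Cut every `size` characters, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def sentence_chunks(text: str, max_chars: int) -> list[str]:
    """Pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` becomes its own oversized chunk.
    The sentence detector is a naive assumption: split after '.', '!' or '?'.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def heading_chunks(markdown: str) -> list[str]:
    """Split before each markdown heading line so a chunk covers one section."""
    parts = re.split(r"^(?=#{1,6} )", markdown, flags=re.MULTILINE)
    return [p.strip() for p in parts if p.strip()]


text = "Cats purr. Dogs bark. Birds sing."
print(fixed_size_chunks(text, 12))  # pieces cut mid-sentence
print(sentence_chunks(text, 22))    # pieces end on sentence boundaries
print(heading_chunks("# Intro\nWhy chunk.\n## Sizes\nSmall or large."))
```

On the sample text, the fixed-size splitter cuts through "Dogs" and "Birds", while the sentence-based splitter keeps every sentence whole. That is the difference the descriptions above point at.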

The size tradeoff

Small chunks are precise but lose surrounding context, so an answer may need several of them stitched together. Large chunks carry context but dilute the embedding, mixing several topics into one vector and lowering match quality.
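
The dilution effect can be shown with a toy example. The sketch below uses bag-of-words count vectors and cosine similarity as a stand-in for a real embedding model, which is an assumption made purely for illustration; the query and chunk texts are invented. Real embeddings behave differently in detail, but the direction of the effect is the one described above.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "reset password"

# A small chunk that stays on one topic.
small_chunk = "Reset your password from the login page."

# A large chunk: the same sentence plus unrelated material in one vector.
large_chunk = (
    small_chunk
    + " Billing runs monthly. Invoices are emailed as PDF files."
    + " Refunds take five days."
)

small_score = cosine(bow(query), bow(small_chunk))
large_score = cosine(bow(query), bow(large_chunk))

print(f"small chunk score: {small_score:.3f}")
print(f"large chunk score: {large_score:.3f}")
```

Both chunks contain the answer, but the large chunk scores lower because the unrelated billing sentences add weight to the vector without adding overlap with the query.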

Why it matters

Chunking is often the highest-leverage knob in a RAG system. A bad split scatters an answer across many pieces or buries the key sentence inside a noisy block, and no reranker can fully repair it.

Key idea

Chunking decides the unit of retrieval, and the size tradeoff between precision and context makes it one of the most impactful choices in a RAG pipeline.
