Why chunk
Documents are often too long to embed as one vector and too long to paste whole into a prompt. Chunking splits each document into smaller pieces that are embedded and retrieved independently. The size and shape of those pieces determine what the retriever can match.
Common strategies
- Fixed-size chunking cuts every N characters or tokens. Simple and fast, but it can slice sentences in half.
- Sentence-based chunking splits on sentence boundaries so each chunk reads cleanly.
- Structure-aware chunking respects headings, paragraphs, or markdown sections so a chunk stays on one topic.
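The three strategies can be sketched in a few lines of Python. This is a minimal illustration, not a production splitter: the function names, the character-based size limits, the naive regex sentence splitter, and the rule that any line starting with "#" opens a new section are all assumptions made for the example.

```python
import re


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Cut every `size` characters, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def sentence_chunks(text: str, max_chars: int) -> list[str]:
    """Pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` becomes its own chunk.
    """
    # Naive splitter (assumption): a sentence ends at ., ! or ? plus whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)   # close the chunk at a sentence boundary
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def markdown_section_chunks(text: str) -> list[str]:
    """Start a new chunk at each markdown heading line."""
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        # Assumption: every line starting with "#" is a heading.
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


sample = "Cats purr. Dogs bark loudly. Birds sing."
fixed = fixed_size_chunks(sample, 12)       # cuts "Dogs" off from its sentence
by_sentence = sentence_chunks(sample, 30)   # every chunk ends on a sentence
sections = markdown_section_chunks("# A\nalpha\n\n## B\nbeta")
```

On the sample text, the fixed-size split cuts through the middle of a sentence, while the sentence-based split keeps each sentence whole. A real pipeline would usually count tokens rather than characters and would need to handle cases this sketch ignores, such as "#" lines inside code fences.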
The size tradeoff
Small chunks are precise but lose surrounding context, so an answer may need several of them stitched together. Large chunks carry context but dilute the embedding, mixing several topics into one vector and lowering match quality.
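The dilution effect can be illustrated with a toy example. The sketch below uses a bag-of-words count vector as a stand-in for a real embedding model, which is a deliberate simplification: learned embeddings behave differently in detail, and the sample sentences and helper names are invented for the illustration. It shows only the general mechanism, that adding unrelated topics to a chunk tends to pull its vector away from a focused query.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding' (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "how do cats purr"

# A small chunk that stays on one topic.
small_chunk = "Cats purr when they are relaxed."
# A large chunk that contains the same sentence plus two unrelated topics.
large_chunk = small_chunk + " Dogs bark to signal alarm. Birds sing to mark territory."

small_score = cosine(bag_of_words(query), bag_of_words(small_chunk))
large_score = cosine(bag_of_words(query), bag_of_words(large_chunk))

print(f"small chunk: {small_score:.3f}")
print(f"large chunk: {large_score:.3f}")
```

The large chunk still contains the relevant sentence, but the extra topics lower its similarity to the query. The small chunk scores higher, at the cost of carrying none of the surrounding context.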
Why it matters
Chunking is often the highest-leverage knob in a RAG system. A bad split scatters an answer across many pieces or buries the key sentence inside a noisy block, and no reranker can fully repair it.
Key idea
Chunking decides the unit of retrieval, and the size tradeoff between precision and context makes it one of the most impactful choices in a RAG pipeline.