What it is
Chunking splits long documents into smaller pieces before embedding them for retrieval. Chunk size shapes what a vector search can find and how much context an LLM receives, so it strongly affects answer quality.
The size trade-off
- Tiny chunks give precise matches but may lack enough context to answer.
- Huge chunks carry full context but dilute the embedding, so the relevant sentence gets averaged away and recall drops.
The sweet spot is a chunk that covers roughly one coherent idea.
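The dilution effect can be illustrated with a toy example. This is only a sketch: word counts stand in for a real embedding model, and the query and chunk texts are made up for illustration. The point is just that padding a relevant sentence with unrelated text lowers its similarity to the query.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts (an assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical query and chunks, invented for this example.
query = "refund window days"
small_chunk = "the refund window is 30 days"
large_chunk = (
    small_chunk
    + " shipping takes a week and support is open on weekdays"
    + " our office is in berlin"
)

small_score = cosine(bow(query), bow(small_chunk))
large_score = cosine(bow(query), bow(large_chunk))

# The same relevant sentence scores lower once unrelated text surrounds it.
print(f"small chunk: {small_score:.3f}, large chunk: {large_score:.3f}")
```

A real embedding model behaves less mechanically than word counts, but the same tendency is why very large chunks can rank below smaller, focused ones.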
Strategies
- Fixed size with overlap: split every N tokens and repeat the last few tokens of each chunk at the start of the next, so a fact that spans a boundary is more likely to appear whole in at least one chunk.
- Semantic or structural: split on natural units like paragraphs, headings, or sentences, which keeps ideas intact.
- Parent document: embed small chunks for matching but return the larger parent section for context.
Pick the unit that matches your data: code by function, prose by paragraph, tables by row group.
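The first two strategies can be sketched in a few lines. This is a minimal illustration, not a production splitter: it assumes whitespace-separated words as "tokens" (a real pipeline would normally use the embedding model's tokenizer) and blank lines as paragraph boundaries. The function names chunk_fixed and chunk_paragraphs are made up for this example.

```python
import re


def chunk_fixed(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


def chunk_paragraphs(text: str) -> list[str]:
    """Structural split: one chunk per paragraph, assuming blank lines separate paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


# Assumption: whitespace words stand in for model tokens.
words = "a b c d e f g h i j".split()
fixed_chunks = chunk_fixed(words, size=4, overlap=1)
para_chunks = chunk_paragraphs("One.\n\nTwo.\n\n\nThree.")
```

With size=4 and overlap=1, each window starts three tokens after the previous one, so the last token of one chunk is repeated as the first token of the next. Larger overlaps cost more storage and embedding calls but make boundary cuts less likely.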
Key idea
Chunking balances precision against context: chunks built around one coherent idea, combined with overlap or parent retrieval, tend to retrieve better than arbitrary large blocks.
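Parent document retrieval can be sketched as follows. Everything here is an assumption made for illustration: the Child record, the helper names, the example sections, and especially the scoring, which counts shared words instead of comparing embedding vectors. The shape of the idea is what matters: match on small chunks, return the section they came from.

```python
from dataclasses import dataclass


@dataclass
class Child:
    text: str       # small chunk used for matching
    parent_id: int  # index of the section this chunk was cut from


def build_children(sections: list[str], child_size: int = 8) -> list[Child]:
    """Cut every section into small word windows that remember their parent."""
    children = []
    for pid, section in enumerate(sections):
        words = section.split()
        for i in range(0, len(words), child_size):
            children.append(Child(" ".join(words[i:i + child_size]), pid))
    return children


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: shared lowercase words (stand-in for vector similarity)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_parent(query: str, sections: list[str], children: list[Child]) -> str:
    """Match against the small chunks, then return the whole parent section."""
    best = max(children, key=lambda c: overlap_score(query, c.text))
    return sections[best.parent_id]


# Hypothetical sections, invented for this example.
sections = [
    "Refunds are accepted within 30 days of purchase. "
    "Contact support with your order number to start one.",
    "Shipping takes about a week. "
    "Tracking links are emailed once the parcel leaves the warehouse.",
]
children = build_children(sections)
result = retrieve_parent("refunds 30 days", sections, children)
```

The query matches only the first eight-word window of the refund section, yet the caller receives the full section, including the instructions on how to start a refund. Note that this sketch assumes at least one section exists; max raises a ValueError on an empty list of children.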