A budget to spend
The generator can only read so many tokens at once. After retrieval and reranking you usually have more passages than will fit, so context packing decides which passages enter the prompt and in what order, spending a fixed token budget wisely.
What to consider
- Order matters. Models attend unevenly, often paying most attention to the start and end of the context, a pattern often called "lost in the middle". Placing the most relevant passages at the edges can help.
- Diversity matters. Filling the budget with near-duplicate chunks wastes space; keeping distinct passages covers more of the answer.
- Headroom matters. Leave room for the question, instructions, and the model's reply.
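The headroom and ordering points above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names `passage_budget` and `order_for_edges`, and the numbers in the example (an 8192-token window, 300 prompt tokens, 1024 tokens reserved for the reply), are assumptions chosen for the example.

```python
def passage_budget(context_window: int, prompt_tokens: int, reply_reserve: int) -> int:
    """Tokens left for passages after the question, instructions, and reply.

    All three inputs are hypothetical token counts supplied by the caller.
    """
    return max(0, context_window - prompt_tokens - reply_reserve)


def order_for_edges(ranked: list[str]) -> list[str]:
    """Put the strongest passages at the start and end, the weakest in the middle.

    `ranked` is assumed to be sorted best first. Even ranks fill the front,
    odd ranks fill the back in reverse, so ranks 0 and 1 land on the two edges.
    """
    front = ranked[0::2]
    back = ranked[1::2]
    return front + back[::-1]


# Illustrative numbers only.
budget = passage_budget(context_window=8192, prompt_tokens=300, reply_reserve=1024)
ordered = order_for_edges(["p1", "p2", "p3", "p4", "p5"])
print(budget)   # 6868
print(ordered)  # ['p1', 'p3', 'p5', 'p4', 'p2']
```

Here the best passage (`p1`) opens the context, the second best (`p2`) closes it, and the weakest (`p5`) sits in the middle. Whether edge placement helps a given model is worth checking empirically.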
Practical packing
A common recipe takes the reranked list, drops near-duplicates, trims each passage to its relevant span, and arranges the strongest passages where the model attends most. Some systems add short source labels so the model can cite cleanly.
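One way this recipe might look in code is sketched below. It rests on several assumptions: token counts are approximated by whitespace-separated words rather than a real tokenizer, near-duplicates are detected with word-set Jaccard similarity at an arbitrary 0.8 threshold, and the trimming step is omitted because it depends on how relevant spans are identified. The names `count_tokens`, `jaccard`, and `pack` are hypothetical.

```python
import re


def count_tokens(text: str) -> int:
    # Assumption: whitespace words stand in for real tokenizer counts.
    return len(text.split())


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two passages, from 0.0 to 1.0."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def pack(reranked: list[tuple[str, str]], budget: int, dup_threshold: float = 0.8) -> str:
    """Pack (source, text) pairs, sorted best first, into a labeled context."""
    kept: list[str] = []        # labeled passages, best first
    kept_texts: list[str] = []  # raw texts, used for duplicate checks
    used = 0
    for source, text in reranked:
        if any(jaccard(text, seen) >= dup_threshold for seen in kept_texts):
            continue  # near-duplicate of a stronger passage already kept
        labeled = f"[{source}] {text}"
        cost = count_tokens(labeled)  # the label counts against the budget too
        if used + cost > budget:
            continue  # too large; a shorter passage further down may still fit
        kept.append(labeled)
        kept_texts.append(text)
        used += cost
    # Strongest passages at the edges, weakest in the middle.
    ordered = kept[0::2] + kept[1::2][::-1]
    return "\n\n".join(ordered)


passages = [
    ("doc1", "Paris is the capital of France."),
    ("doc2", "Paris is the capital of France!"),
    ("doc3", "The Eiffel Tower was completed in 1889."),
    ("doc4", "France uses the euro as its currency."),
]
packed = pack(passages, budget=20)
print(packed)
```

With a budget of 20, `doc2` is dropped as a near-duplicate of `doc1`, and `doc4` is skipped because it would exceed the budget, leaving `doc1` and `doc3` in the prompt. A production system would likely swap in the generator's own tokenizer and an embedding-based similarity check.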
Why it matters
Stuffing every passage in does not help; beyond a point, extra context distracts the model and dilutes the key evidence. Deliberate packing keeps the strongest support where the model will actually read it.
Key idea
Context packing spends a fixed token budget by deduplicating, trimming, and ordering passages so the strongest evidence sits where the model attends most rather than getting lost in the middle.