Machine Learning

Context Window Packing

Fit retrieved passages into a limited prompt without burying the key one.

A budget to spend

The generator can only read so many tokens at once. After retrieval and reranking you usually have more passages than will fit, so context packing decides which passages enter the prompt and in what order, spending a fixed token budget wisely.
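The budget idea can be sketched as a greedy fill: walk the reranked list in order and keep each passage that still fits. This is a minimal illustration, not a standard algorithm from any library. The names `count_tokens` and `pack_greedy` are hypothetical, and counting whitespace-separated words is only a stand-in for the generator's real tokenizer.

```python
def count_tokens(text: str) -> int:
    # Assumption: a whitespace word count approximates the token count.
    # A real system would use the generator model's own tokenizer.
    return len(text.split())


def pack_greedy(ranked_passages: list[str], budget: int) -> list[str]:
    """Keep passages in rank order, skipping any that would exceed the budget."""
    chosen: list[str] = []
    used = 0
    for passage in ranked_passages:
        cost = count_tokens(passage)
        if used + cost <= budget:
            chosen.append(passage)
            used += cost
    return chosen


ranked = ["a b c", "d e f g", "h i"]  # best passage first
packed = pack_greedy(ranked, budget=5)
# "a b c" costs 3, "d e f g" would overshoot, "h i" fits the remaining 2.
```

Skipping an oversized passage instead of stopping lets a shorter, lower-ranked passage use the leftover budget. Whether that trade is worthwhile depends on how much you trust the lower ranks.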

What to consider

  • Order matters. Models attend unevenly, often paying most attention to the start and end of the context, a pattern called "lost in the middle". Placing the most relevant passages at the edges, rather than in the middle, helps.
  • Diversity matters. Filling the budget with near-duplicate chunks wastes space; keeping distinct passages covers more of the answer.
  • Headroom matters. Leave room for the question, instructions, and the model's reply.
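Two of the points above reduce to a few lines of code. The sketch below reserves headroom by subtracting the non-passage parts of the prompt from the context window, then arranges a ranked list so the strongest passages sit at the start and end. The function names and the example numbers are illustrative assumptions, and alternating front and back is just one simple way to put strong passages at the edges.

```python
def passage_budget(context_window: int, question_tokens: int,
                   instruction_tokens: int, reply_tokens: int) -> int:
    """Tokens left for passages after reserving room for everything else."""
    reserved = question_tokens + instruction_tokens + reply_tokens
    return max(0, context_window - reserved)


def edge_order(ranked: list[str]) -> list[str]:
    """Place the best passage first, the second best last, and so on inward.

    `ranked` is assumed to be sorted from most to least relevant.
    """
    front: list[str] = []
    back: list[str] = []
    for i, passage in enumerate(ranked):
        if i % 2 == 0:
            front.append(passage)
        else:
            back.append(passage)
    # Reverse the back half so the second-best passage ends up last.
    return front + back[::-1]


budget = passage_budget(context_window=8000, question_tokens=50,
                        instruction_tokens=200, reply_tokens=1000)
order = edge_order(["p1", "p2", "p3", "p4", "p5"])
# budget is 6750; order is ["p1", "p3", "p5", "p4", "p2"],
# so the weakest passage (p5) lands in the middle.
```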

Practical packing

A common recipe takes the reranked list, drops near-duplicates, trims each passage to its relevant span, and arranges the strongest passages at the start and end of the context, where the model attends most. Some systems add short source labels so the model can cite cleanly.
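The deduplication, trimming, and labeling steps might look like the sketch below. Everything here is an assumption made for illustration: word-set Jaccard overlap with a 0.8 threshold stands in for whatever similarity measure a real system uses, trimming keeps only sentences that share a word with the query, and the `[n]` label format is arbitrary.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, from 0.0 to 1.0."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def drop_near_duplicates(ranked: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a passage only if it is not too similar to a higher-ranked one."""
    kept: list[str] = []
    for passage in ranked:
        if all(jaccard(passage, k) < threshold for k in kept):
            kept.append(passage)
    return kept


def trim_to_query(passage: str, query: str) -> str:
    """Keep sentences sharing a word with the query; else return the passage."""
    query_words = set(query.lower().split())
    sentences = [s.strip() for s in passage.split(".") if s.strip()]
    hits = [s for s in sentences if query_words & set(s.lower().split())]
    return ". ".join(hits) + "." if hits else passage


def add_labels(passages: list[str]) -> str:
    """Prefix each passage with a short numeric source label."""
    return "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))


ranked = [
    "the cat sat on the mat",
    "the cat sat on the mat today",  # near-duplicate of the first
    "dogs bark loudly",
]
unique = drop_near_duplicates(ranked)
prompt_block = add_labels(unique)
```

Processing in rank order means that when two passages overlap, the higher-ranked copy survives.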

Why it matters

Stuffing every passage in does not help; beyond a point, extra context distracts the model and dilutes the key evidence. Deliberate packing keeps the strongest support where the model will actually read it.

Key idea

Context packing spends a fixed token budget by deduplicating, trimming, and ordering passages so the strongest evidence sits where the model attends most rather than getting lost in the middle.

Check yourself

1. What is the "lost in the middle" effect?

2. Why not just stuff every retrieved passage into the prompt?