Why memory is hard
A context window is finite, so an agent cannot keep everything in view. Memory systems decide what to keep in context, what to push to external storage, and how to bring relevant facts back when needed.
Layers of memory
- Working memory: the current context window, recent turns, and the active task
- Short-term buffer: a summary of the running session
- Long-term store: facts saved across sessions, often in a vector or key-value store
- Retrieval: pulling the right memories back into context on demand
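One way to picture these layers is as fields on a single object. This is a minimal sketch, not a reference design: the `AgentMemory` class, its field names, and the keyword-matching rule in `retrieve` are all hypothetical, and a plain dict stands in for a real vector or key-value store.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    # Working memory: recent turns and the active task (what sits in the context window).
    working: list[str] = field(default_factory=list)
    # Short-term buffer: a running summary of the session.
    session_summary: str = ""
    # Long-term store: facts kept across sessions (a dict stands in for a key-value store).
    long_term: dict[str, str] = field(default_factory=dict)

    def retrieve(self, query: str) -> list[str]:
        """Return long-term facts whose key appears as a word in the query (toy rule)."""
        words = set(query.lower().split())
        return [fact for key, fact in self.long_term.items() if key.lower() in words]


memory = AgentMemory()
memory.working.append("user: book a table for Friday")
memory.session_summary = "User is planning a dinner."
memory.long_term["allergy"] = "User is allergic to peanuts."
memory.long_term["city"] = "User lives in Lisbon."

# Only the fact keyed by a word in the query comes back into context.
recalled = memory.retrieve("check allergy before choosing a restaurant")
```

A production system would likely replace the keyword match with embedding similarity, but the split into three stores plus a retrieval step stays the same.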
The flow
When the context fills, older turns are summarized or written to the long-term store. Before each step, the agent retrieves the memories relevant to the current goal.
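The flow can be sketched as two small functions, one that evicts on overflow and one that retrieves before a step. Everything here is an assumption made for illustration: `MAX_TURNS` counts turns rather than tokens, `summarize` is a truncating placeholder where a real agent would probably call a model, and relevance is a simple shared-word test.

```python
MAX_TURNS = 4  # hypothetical budget; a real agent would count tokens


def summarize(turns: list[str]) -> str:
    """Placeholder summarizer: keeps the first 30 characters of each turn."""
    return " | ".join(t[:30] for t in turns)


def add_turn(context: list[str], store: list[str], turn: str) -> None:
    """Append a turn; on overflow, summarize the oldest half into the store."""
    context.append(turn)
    if len(context) > MAX_TURNS:
        cut = len(context) // 2
        store.append(summarize(context[:cut]))
        del context[:cut]


def retrieve(store: list[str], goal: str) -> list[str]:
    """Return stored summaries that share at least one word with the goal."""
    goal_words = set(goal.lower().split())
    return [s for s in store if goal_words & set(s.lower().split())]


context: list[str] = []
store: list[str] = []
turns = [
    "plan trip to Oslo",
    "budget is 900 euros",
    "prefers trains",
    "no early flights",
    "needs a hotel",
]
for turn in turns:
    add_turn(context, store, turn)

# Before the next step, pull back only what matches the current goal.
relevant = retrieve(store, "what budget do we have")
```

After the fifth turn the two oldest turns leave the context as one summary, and the budget question brings that summary back.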
Design tension
Too little memory and the agent repeats itself or forgets goals. Too much and the context fills with noise, raising cost and confusing the model. Good systems retrieve selectively and summarize aggressively.
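Selective retrieval can be approximated with a relevance threshold plus a size budget. The sketch below is one possible approach under stated assumptions: `select_memories`, `min_score`, and `word_budget` are hypothetical names, Jaccard word overlap stands in for embedding similarity, and words stand in for tokens.

```python
def overlap(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def select_memories(memories, goal, min_score=0.2, word_budget=12):
    """Pick the most relevant memories that clear the threshold and fit the budget."""
    scored = sorted(((overlap(m, goal), m) for m in memories), reverse=True)
    chosen, used = [], 0
    for score, text in scored:
        if score < min_score:
            break  # everything after this is less relevant still
        cost = len(text.split())
        if used + cost > word_budget:
            continue  # too large for the remaining budget; try a smaller one
        chosen.append(text)
        used += cost
    return chosen


memories = [
    "user prefers window seats on trains",
    "user likes jazz",
    "train tickets must be refundable",
]
selected = select_memories(memories, "book refundable train tickets")
```

Raising `min_score` or lowering `word_budget` trades recall for a cleaner context, which is the tension described above.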
Key idea
Memory systems extend an agent past its context window by summarizing, storing externally, and retrieving only what the current step needs.