Memory for Agents: Short and Long Term
Agents need to remember, but the context window is finite. Practical systems split memory into two kinds that serve different time horizons.
Short term memory
- This is the running context window holding the current task's messages and observations.
- It is fast and fully visible to the model but capped in size.
- When it fills, old turns must be summarized or dropped, so it cannot hold everything forever.
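The eviction behavior described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the class name ShortTermMemory, the word-count stand-in for a tokenizer, and the "omitted" note are all assumptions made for the example. A real agent would likely use the model's tokenizer and replace the note with a model-written summary.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


class ShortTermMemory:
    """Keeps recent turns under a fixed token budget (hypothetical helper)."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.turns = deque()
        self.dropped = 0  # number of old turns evicted so far

    def _used(self) -> int:
        return sum(count_tokens(t) for t in self.turns)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        # Evict the oldest turns until the budget is met, always keeping the newest.
        while self._used() > self.max_tokens and len(self.turns) > 1:
            self.turns.popleft()
            self.dropped += 1

    def render(self) -> list[str]:
        # The note is a placeholder for a summary and is not counted in the budget.
        note = [f"[{self.dropped} earlier turns omitted]"] if self.dropped else []
        return note + list(self.turns)


mem = ShortTermMemory(max_tokens=6)
mem.add("user: book a flight")
mem.add("agent: which city")
mem.add("user: Lisbon")
print(mem.render())
```

Dropping from the oldest end is only one policy. Summarizing the evicted turns keeps more information at the cost of an extra model call.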
Long term memory
Long term memory lives outside the context, usually in a database or a vector store. The agent writes durable facts there, like user preferences or earlier conclusions, and retrieves only the relevant pieces back into the context when needed. This lets knowledge persist across sessions without bloating every prompt.
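As a rough sketch of the write and retrieve cycle, the example below keeps facts in a plain Python list and ranks them by word overlap with the query. Both choices are simplifying assumptions: a production system would typically persist facts in a database or vector store and rank them by embedding similarity. The LongTermMemory class and its method names are hypothetical.

```python
def tokenize(text: str) -> set[str]:
    # Lowercased word set; a stand-in for an embedding of the text.
    return set(text.lower().split())


class LongTermMemory:
    """In-memory stand-in for an external store (hypothetical helper)."""

    def __init__(self):
        self.facts: list[str] = []

    def write(self, fact: str) -> None:
        # Skip exact duplicates so the store does not fill with repeats.
        if fact not in self.facts:
            self.facts.append(fact)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Score each fact by shared words, keep only those with any overlap,
        # and return the k best. Ties keep insertion order (sort is stable).
        q = tokenize(query)
        scored = [(len(q & tokenize(f)), f) for f in self.facts]
        relevant = [(s, f) for s, f in scored if s > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [f for _, f in relevant[:k]]


store = LongTermMemory()
store.write("user prefers window seats")
store.write("user is vegetarian")
store.write("project deadline is friday")
print(store.retrieve("which seats does the user like", k=1))
```

Only the facts that score above zero come back, so an unrelated query adds nothing to the prompt.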
The retrieval bridge
The two layers connect through retrieval. At each step the agent can pull a small, relevant slice of long term memory into short term context, use it, then let it fall away. The art is deciding what is worth writing down and what to recall, because storing everything makes retrieval noisy and recalling too much wastes the window you were trying to protect.
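One way to picture the bridge is a function that assembles the context for a single step: it ranks stored facts against the current task, admits them only while a recall budget lasts, and places them alongside the recent turns. Everything here is illustrative. The name build_context, the word-overlap score, and the word-count budget are assumptions, not a prescribed design.

```python
def overlap(a: str, b: str) -> int:
    # Number of shared lowercase words; a stand-in for a similarity score.
    return len(set(a.lower().split()) & set(b.lower().split()))


def build_context(task, recent_turns, facts, recall_budget):
    """Assemble one step's context from recalled facts, recent turns, and the task."""
    ranked = sorted(facts, key=lambda f: overlap(task, f), reverse=True)
    recalled, used = [], 0
    for fact in ranked:
        cost = len(fact.split())  # word count as a rough token cost
        # Skip facts that are irrelevant or would exceed the recall budget.
        if overlap(task, fact) == 0 or used + cost > recall_budget:
            continue
        recalled.append(fact)
        used += cost
    # Recalled facts live only in this step's context; the store is unchanged.
    return [f"memory: {f}" for f in recalled] + recent_turns + [f"task: {task}"]


facts = [
    "user prefers window seats",
    "user is vegetarian",
    "project deadline is friday",
]
context = build_context(
    task="pick seats for the user",
    recent_turns=["agent: flight found"],
    facts=facts,
    recall_budget=5,
)
print(context)
```

The budget makes the trade-off explicit: raising it recalls more facts but spends more of the window, while lowering it keeps the prompt lean at the risk of missing something useful.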
Key idea
Short term memory is the live context window, while long term memory is an external store whose contents are retrieved into context only when relevant.