The Experience Replay Buffer
The experience replay buffer is a memory of past transitions that the agent samples from when training. It is a cornerstone of stable deep reinforcement learning.
What it stores
Each environment step produces a transition: state, action, reward, next state, and a flag marking whether the episode ended. These tuples are pushed into a fixed-size buffer, usually a first-in, first-out queue that drops the oldest entries when full.
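As a minimal sketch of the storage described above, the buffer can be built on Python's `collections.deque` with a `maxlen`, which discards the oldest entry automatically once it is full. The names `Transition`, `ReplayBuffer`, and `push` are illustrative choices for this example, not taken from any particular library.

```python
from collections import deque
from typing import Any, NamedTuple


class Transition(NamedTuple):
    """One step of experience (field names are illustrative)."""
    state: Any
    action: Any
    reward: float
    next_state: Any
    done: bool


class ReplayBuffer:
    """Fixed-size FIFO store of transitions."""

    def __init__(self, capacity: int):
        # A deque with maxlen drops the oldest item when a new one
        # is appended to a full buffer.
        self.storage = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.storage.append(Transition(state, action, reward, next_state, done))

    def __len__(self):
        return len(self.storage)


# Push 5 transitions into a buffer that holds only 3.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

remaining_states = [tr.state for tr in buffer.storage]  # the 3 newest survive
```

Real implementations often preallocate arrays and overwrite entries with a wrapping write index instead, which avoids per-transition Python objects; the eviction behavior is the same.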
Why it helps
Training a network on the stream of transitions in order has two problems that replay solves:
- Correlation: consecutive steps are very similar, so training on them in order violates the assumption of independent, identically distributed samples that stochastic gradient descent relies on. Random sampling breaks this correlation.
- Sample efficiency: each transition can be reused many times instead of being seen once and discarded, squeezing more learning from costly experience.
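Both points can be illustrated with a small uniform-sampling sketch. It assumes transitions are stored as 5-tuples in the order (state, action, reward, next state, done); the helper name `sample_uniform` and the toy data are hypothetical.

```python
import random


def sample_uniform(transitions, batch_size, rng=random):
    """Draw batch_size distinct transitions uniformly at random."""
    batch = rng.sample(list(transitions), batch_size)
    # Turn a list of 5-tuples into five parallel tuples (one per field).
    states, actions, rewards, next_states, dones = zip(*batch)
    return states, actions, rewards, next_states, dones


# Toy ordered stream: state t leads to state t + 1.
stream = [(t, 0, 1.0, t + 1, t == 9) for t in range(10)]

rng = random.Random(0)  # seeded only to make the example repeatable
states, actions, rewards, next_states, dones = sample_uniform(stream, 4, rng)

# Sampling again draws from the same stored data, so a transition
# may be reused across many training batches.
states_again, *_ = sample_uniform(stream, 4, rng)
```

Each batch mixes steps from different points in the stream rather than presenting them in temporal order, and nothing is consumed by sampling.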
Prioritized replay
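Plain replay samples uniformly. Prioritized experience replay samples transitions with large temporal-difference (TD) error more often, focusing learning on surprising experiences where the model is most wrong. Because this non-uniform sampling biases the updates, it corrects the bias with importance-sampling weights that scale down the contribution of over-sampled transitions.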
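One common formulation, sketched here under stated assumptions, is proportional prioritization: transition i is drawn with probability P(i) = p_i^alpha / sum_k p_k^alpha, where p_i = |TD error| + eps, and its update is scaled by w_i = (N * P(i))^(-beta). The function name `sample_prioritized` and the values of `alpha`, `beta`, and `eps` are illustrative, and normalizing by the largest weight in the batch is one of several conventions in use.

```python
import random


def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4,
                       eps=1e-6, rng=random):
    """Sample indices in proportion to |TD error|**alpha.

    Returns the sampled indices, their importance-sampling weights,
    and the full sampling distribution.
    """
    # eps keeps transitions with zero TD error sampleable.
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    probs = [p / total for p in priorities]

    # Sampling with replacement, weighted by priority.
    indices = rng.choices(range(len(probs)), weights=probs, k=batch_size)

    # Importance-sampling weights: frequently sampled items get smaller weights.
    n = len(probs)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    weights = [w / max_w for w in weights]  # scale so the largest weight is 1
    return indices, weights, probs


# Toy example: the third transition has a much larger TD error.
td_errors = [0.1, 0.1, 5.0, 0.1]
indices, weights, probs = sample_prioritized(td_errors, batch_size=8,
                                             rng=random.Random(0))
```

In training, each sampled transition's loss would be multiplied by its weight before averaging. This sketch recomputes the distribution in linear time on every call; large buffers typically use a sum-tree or similar structure so that sampling and priority updates stay logarithmic.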
Practical notes
The buffer size trades freshness against diversity. Too small, and the buffer holds only recent transitions, so samples stay correlated; too large, and stale data from an old policy lingers. Capacities of hundreds of thousands to millions of transitions are common.
Key idea
The replay buffer stores past transitions and samples them randomly, decorrelating training data and reusing experience for greater stability and sample efficiency.