The Experience Replay Buffer

The experience replay buffer is a memory of past transitions that the agent samples from during training. It is a core component of off-policy deep reinforcement learning methods such as DQN, where it is central to keeping training stable.

What it stores

Each environment step produces a transition: the state, the action taken, the reward received, the next state, and a flag marking whether the episode ended. These tuples are pushed into a fixed-size buffer, usually a first-in, first-out queue that discards the oldest entries once it is full.
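A minimal sketch of such a buffer in Python, assuming each transition is stored as a plain named tuple. The names `Transition`, `ReplayBuffer`, and `push` are hypothetical, not taken from any particular library. A `collections.deque` created with `maxlen` provides the drop-oldest behavior directly.

```python
from collections import deque, namedtuple

# One environment step: (s, a, r, s', done). Field names are illustrative.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Hypothetical fixed-capacity FIFO buffer of transitions."""

    def __init__(self, capacity):
        # A deque with maxlen discards from the left (oldest) once full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def __len__(self):
        return len(self.memory)


# Pushing 5 transitions into a capacity-3 buffer keeps only the newest 3.
buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(t, 0, 1.0, t + 1, False)
print(len(buf), [tr.state for tr in buf.memory])  # 3 [2, 3, 4]
```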

Why it helps

Training a network on transitions in the order they arrive causes two problems that replay addresses:

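Correlation: consecutive steps are highly similar, so training on them in order violates the assumption behind stochastic gradient descent that training samples are roughly independent and identically distributed. Sampling at random from the buffer breaks up this correlation.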
Sample efficiency: each transition can be reused many times instead of being seen once and discarded, squeezing more learning from costly experience.
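Both benefits come from how minibatches are drawn. The sketch below assumes the memory is any sequence of `(state, action, reward, next_state, done)` tuples; `sample_minibatch` is a hypothetical helper name. It draws distinct transitions uniformly without replacement, so one minibatch mixes transitions from widely separated time steps, and nothing is removed, so every stored transition remains eligible for later draws.

```python
import random


def sample_minibatch(memory, batch_size, rng=random):
    """Hypothetical helper: draw batch_size distinct transitions uniformly.

    memory: a sequence of (state, action, reward, next_state, done) tuples.
    Returns the minibatch regrouped column-wise:
    (states, actions, rewards, next_states, dones).
    """
    if batch_size > len(memory):
        raise ValueError("not enough transitions stored yet")
    # Uniform sampling without replacement; the buffer itself is not modified.
    batch = rng.sample(list(memory), batch_size)
    states, actions, rewards, next_states, dones = zip(*batch)
    return states, actions, rewards, next_states, dones


# 100 fake transitions; a seeded generator makes the draw reproducible.
memory = [(t, t % 2, float(t), t + 1, t == 99) for t in range(100)]
states, actions, rewards, next_states, dones = sample_minibatch(
    memory, 8, random.Random(0)
)
print(len(states))  # 8
```

Copying the memory into a list costs O(N) per call; a buffer backed by preallocated arrays can sample indices directly and avoid that copy.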

Prioritized replay

Plain replay samples uniformly. Prioritized experience replay instead samples transitions with large temporal-difference (TD) error more often, focusing learning on surprising transitions where the model's predictions are furthest off. Because this non-uniform sampling biases the updates, it corrects for the bias with importance-sampling weights.
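The proportional variant of prioritized replay can be sketched as follows. Each transition gets priority p_i = |δ_i| + ε, is sampled with probability P(i) = p_i^α / Σ_k p_k^α, and its update is scaled by the importance-sampling weight w_i = (N · P(i))^(-β). This is a simplified sketch: the class name `PrioritizedReplay` is hypothetical, eviction is omitted, sampling is O(N) over a plain list (efficient implementations use a sum tree), weights are normalized by the largest weight in the sampled batch, and the defaults α = 0.6, β = 0.4 are illustrative; β is usually annealed toward 1 over training.

```python
import random


class PrioritizedReplay:
    """Hypothetical proportional prioritized replay (list-based sketch)."""

    def __init__(self, alpha=0.6, eps=1e-6):
        self.alpha = alpha  # 0 -> uniform sampling, 1 -> fully proportional
        self.eps = eps      # keeps transitions with zero TD error sampleable
        self.data, self.priorities = [], []  # priorities store p_i ** alpha

    def push(self, transition):
        # New transitions get the largest priority seen so far, so they are
        # favored for replay before their TD error has been measured.
        self.data.append(transition)
        self.priorities.append(max(self.priorities, default=1.0))

    def sample(self, batch_size, beta=0.4, rng=random):
        n = len(self.data)
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idx = rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights undo the bias of non-uniform sampling.
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # only ever scale updates down
        return [self.data[i] for i in idx], idx, weights

    def update(self, idx, td_errors):
        # After a learning step, refresh priorities with the new TD errors.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```

In a training loop, each sampled transition's loss would be multiplied by its weight before backpropagation, and `update` would then be called with the batch's freshly computed TD errors.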

Practical notes

The buffer size trades freshness against diversity. Too small, and samples stay correlated; too large, and stale data generated by much older policies lingers. Capacities of hundreds of thousands to millions of transitions are common.
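A back-of-the-envelope calculation makes the trade-off concrete. Assume, as a simplification, a full FIFO buffer of capacity C, one new transition pushed per environment step, and one gradient update per step on a uniformly sampled minibatch of size B:

```latex
\begin{aligned}
\mathbb{E}[\text{draws of transition } i \text{ per update}] &= \frac{B}{C}, \\
\text{lifetime of } i \text{ in the buffer} &\approx C \ \text{updates}, \\
\mathbb{E}[\text{total replays of } i] &\approx C \cdot \frac{B}{C} = B, \\
\mathbb{E}[\text{age of a sampled transition}] &= \frac{1}{C}\sum_{k=0}^{C-1} k = \frac{C-1}{2} \ \text{steps}.
\end{aligned}
```

Under these assumptions, how often each transition is reused depends on B and the update-to-step ratio rather than on C, while C mainly sets how old the replayed data is: with C = 10^6, the average sampled transition is roughly half a million steps old.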

Key idea

The replay buffer stores past transitions and samples them randomly, decorrelating training data and reusing experience for greater stability and sample efficiency.
