The Experience Replay Buffer

Storing and reusing past transitions to stabilize learning.

The experience replay buffer is a memory of past transitions that the agent samples from when training. It is a cornerstone of stable deep reinforcement learning.

What it stores

Each environment step produces a transition: the state, the action taken, the reward received, the next state, and a flag marking whether the episode ended. These tuples are pushed into a fixed-size buffer, usually a first-in, first-out queue that drops the oldest entries once it is full.
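
The storage described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names Transition and ReplayBuffer, the field types, and the tiny capacity of 3 are assumptions chosen for the example.

```python
from collections import deque
from typing import NamedTuple


class Transition(NamedTuple):
    """One step of experience (field types are illustrative assumptions)."""
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool


class ReplayBuffer:
    """Fixed-size FIFO memory of transitions."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest item when a new
        # one is appended to a full buffer.
        self.storage = deque(maxlen=capacity)

    def push(self, transition: Transition) -> None:
        self.storage.append(transition)

    def __len__(self) -> int:
        return len(self.storage)


# Push five transitions into a buffer that holds only three.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(Transition((step,), 0, 1.0, (step + 1,), done=(step == 4)))

oldest = buffer.storage[0]   # transition from step 2; steps 0 and 1 were dropped
newest = buffer.storage[-1]  # transition from step 4
```

Using a bounded deque keeps the eviction rule implicit. Larger implementations often use a preallocated array with a write index instead, which behaves the same way from the outside.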

Why it helps

Training a network on the stream of transitions in order has two problems that replay solves:

  • Correlation: consecutive steps are very similar, so training on them in order violates the assumption of independent samples that stochastic gradient descent relies on. Random sampling breaks this correlation.
  • Sample efficiency: each transition can be reused many times instead of being seen once and discarded, squeezing more learning from costly experience.
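
Both points can be seen in a small sketch. It assumes a toy buffer of 1000 placeholder transitions tagged with their time step; the helper sample_batch, the batch size of 32, and the 100 training updates are hypothetical values picked for illustration.

```python
import random
from collections import Counter

# Toy buffer: each transition is (step, action, reward, next_step, done).
buffer = [(t, 0, 0.0, t + 1, False) for t in range(1000)]


def sample_batch(buffer, batch_size, rng):
    """Draw a minibatch uniformly at random, without repeats inside the batch."""
    return rng.sample(buffer, batch_size)


rng = random.Random(0)

# Decorrelation: the sampled steps are spread over the buffer rather than
# forming one consecutive run.
batch = sample_batch(buffer, 32, rng)
steps = sorted(tr[0] for tr in batch)

# Reuse: across many updates the same transition is drawn more than once.
counts = Counter()
for _ in range(100):
    for tr in sample_batch(buffer, 32, rng):
        counts[tr[0]] += 1

total_draws = sum(counts.values())      # 100 batches * 32 samples
most_reused = max(counts.values())      # at least one transition is reused
```

With 3200 draws from 1000 stored transitions, each transition is used about three times on average, whereas training on the raw stream would use each exactly once.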

Prioritized replay

Plain replay samples uniformly. Prioritized experience replay instead samples transitions with a large temporal-difference (TD) error more often, focusing learning on surprising experiences where the model is most wrong. Because this non-uniform sampling biases the updates, it corrects for the bias with importance-sampling weights.
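
A rough sketch of the proportional variant follows. It assumes priorities of the form (|TD error| + eps) raised to a power alpha, and weights of the form (N * P(i)) raised to -beta. The function name sample_prioritized, the default values of alpha, beta, and eps, and the choice to normalize weights by the largest weight in the batch are all assumptions made for this example, and it samples with replacement from a plain list rather than using the tree structures that efficient implementations rely on.

```python
import random


def sample_prioritized(td_errors, batch_size, alpha=0.6, beta=0.4,
                       eps=1e-6, rng=random):
    """Sample indices in proportion to priority and return correction weights."""
    # eps keeps every transition sampleable even if its TD error is zero.
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    probs = [p / total for p in priorities]

    # Sampling with replacement, weighted by priority.
    indices = rng.choices(range(len(td_errors)), weights=probs, k=batch_size)

    # Importance weights shrink the update for transitions that are
    # sampled more often than uniform sampling would pick them.
    n = len(td_errors)
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(raw)
    weights = [w / max_w for w in raw]  # scaled so the largest weight is 1
    return indices, weights, probs


# Transition 2 has a much larger TD error than the others.
td_errors = [0.1, 0.1, 5.0, 0.1]
indices, weights, probs = sample_prioritized(
    td_errors, batch_size=8, rng=random.Random(0)
)
```

In a training loop, each sampled transition's loss would be multiplied by its weight before the gradient step, and the stored TD errors would be refreshed after the update.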

Practical notes

The buffer size trades freshness against diversity. If the buffer is too small, it holds only recent, similar steps and samples stay correlated; if it is too large, stale data from a much older policy lingers. Capacities of hundreds of thousands to millions of transitions are common.
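
The trade-off can be made concrete by checking which time steps survive in buffers of different capacity. This is only an illustrative sketch: the helper stored_step_range and the numbers used (10,000 steps, capacities of 100 and 1,000,000) are assumptions for the example.

```python
from collections import deque


def stored_step_range(capacity, total_steps):
    """Return (oldest, newest) time step held after total_steps pushes."""
    buf = deque(maxlen=capacity)
    for t in range(total_steps):
        buf.append(t)
    return buf[0], buf[-1]


# A small buffer keeps only the latest 100 steps: fresh, but all of the
# samples come from one short, highly similar stretch of experience.
small = stored_step_range(capacity=100, total_steps=10_000)

# A large buffer still holds step 0, collected by the earliest policy.
large = stored_step_range(capacity=1_000_000, total_steps=10_000)
```

Here small covers steps 9900 to 9999, while large covers steps 0 to 9999, so a uniform sample from the large buffer is as likely to come from the very first policy as from the current one.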

Key idea

The replay buffer stores past transitions and samples them randomly, decorrelating training data and reusing experience for greater stability and sample efficiency.

Check yourself

1. What two problems does an experience replay buffer address?

2. How does prioritized experience replay choose samples?