Machine Learning

Prioritized Experience Replay

Sampling surprising transitions more often to learn faster from a replay buffer.

6 min read · core

Beyond uniform replay

A replay buffer stores past transitions for reuse. Plain DQN samples them uniformly, but not all transitions are equally informative. Prioritized experience replay samples transitions with large TD error more often, focusing learning on surprising experiences the agent has not yet mastered.

Priorities

Each transition i gets a priority p_i = |δ_i| + ε, where δ_i is its most recent TD error and ε is a small positive constant that keeps every transition reachable so none starves. Sampling probability is proportional to p_i^α: the tunable exponent α controls how aggressively prioritization is applied, and α = 0 recovers uniform sampling.
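
The priority rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production buffer: the function names, the default alpha=0.6, and eps=1e-6 are assumptions chosen for the example, and a real implementation would typically use a sum tree instead of recomputing every probability on each draw.

```python
import random

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Map TD errors to sampling probabilities proportional to priority ** alpha.

    alpha and eps are illustrative defaults (assumptions), not values from the lesson.
    """
    # Priority = TD error magnitude plus a small constant, so no transition
    # ends up with zero probability.
    priorities = [abs(d) + eps for d in td_errors]
    # alpha = 0 makes every scaled priority 1.0, i.e. uniform sampling.
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample_indices(probs, batch_size, rng=random):
    """Draw buffer indices with replacement according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)

# Usage: three stored transitions with TD errors 0.0, 1.0 and 2.0.
probs = sampling_probabilities([0.0, 1.0, 2.0])
batch = sample_indices(probs, batch_size=4, rng=random.Random(0))
```

The transition with zero TD error still gets a small nonzero probability because of eps, while the largest error is drawn most often.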

Correcting the bias

Sampling nonuniformly changes the distribution that updates are averaged over, so the expected update no longer matches the one uniform replay would give. Prioritized replay corrects this bias with importance sampling weights that downweight frequently drawn transitions:

  • High priority transitions are sampled often but counted less per sample.
  • The correction starts out partial and is annealed toward full strength over training, because unbiased updates matter most near convergence.
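
The correction can be sketched as follows. In the usual formulation each sampled transition gets a weight w_i = (N · P(i))^(-β), where N is the buffer size, P(i) is its sampling probability, and β sets the strength of the correction (β = 1 is full correction). The function names, the linear schedule, the starting value beta_start=0.4, and the choice to normalize by the largest weight in the batch are assumptions made for this example.

```python
def importance_weights(probs, indices, beta):
    """Importance sampling weights for the sampled indices.

    probs must be strictly positive for every sampled index.
    """
    n = len(probs)
    # Frequently sampled transitions (large probs[i]) get smaller weights.
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    # Normalize by the largest weight in the batch (an assumption here) so
    # weights only ever scale updates down.
    max_w = max(raw)
    return [w / max_w for w in raw]

def beta_schedule(step, total_steps, beta_start=0.4):
    """Linearly anneal beta from beta_start to 1.0 (full correction)."""
    frac = min(1.0, step / total_steps)
    return beta_start + frac * (1.0 - beta_start)

# Usage: weights for a batch that sampled all three transitions once.
probs = [0.1, 0.3, 0.6]
beta = beta_schedule(step=100, total_steps=100)  # end of training
weights = importance_weights(probs, [0, 1, 2], beta)
```

Each weight then multiplies the corresponding transition's TD error (or loss term) in the update, so the rarely sampled transition counts fully and the frequently sampled one counts least.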

Key idea

Prioritized experience replay draws high TD error transitions more often to accelerate learning, then applies importance sampling weights to correct the bias that nonuniform sampling introduces.

Check yourself

1. How does prioritized replay choose which transitions to sample?

2. Why are importance sampling weights needed?