Prioritized Experience Replay

Sampling surprising transitions more often to learn faster from a replay buffer.

Beyond uniform replay

A replay buffer stores past transitions for reuse. Plain DQN samples them uniformly, but not all transitions are equally informative. Prioritized experience replay samples transitions with large TD error more often, focusing learning on surprising experiences the agent has not yet mastered.

Priorities

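Each transition gets a priority equal to the magnitude of its most recent TD error plus a small positive constant. The constant keeps every priority above zero, so a transition whose error happens to be zero can still be drawn and none starves. The probability of sampling a transition is proportional to its priority raised to a tunable exponent, usually written alpha, that controls how aggressively prioritization is applied: alpha = 0 recovers uniform sampling, and larger values concentrate sampling on high-error transitions.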
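
The rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the default alpha of 0.6, and the constant 1e-6 are assumptions chosen for the example, and the probabilities are recomputed over the whole buffer on every call, which is fine for a demonstration but too slow for a large buffer.

```python
import random

def sampling_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Return P(i) = p_i**alpha / sum_k p_k**alpha for each transition.

    alpha and eps are illustrative defaults (assumptions), not prescribed values.
    """
    # Priority is the TD-error magnitude plus a small constant, so a
    # transition with zero error keeps a nonzero chance of being drawn.
    priorities = [abs(d) + eps for d in td_errors]
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]

def sample_indices(probs, batch_size, rng=random):
    """Draw batch_size buffer indices, with replacement, according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)

# Hypothetical TD errors for a four-transition buffer.
td_errors = [0.0, 0.5, 2.0, -4.0]
probs = sampling_probabilities(td_errors, alpha=0.6)
uniform = sampling_probabilities(td_errors, alpha=0.0)  # alpha = 0 gives uniform sampling
batch = sample_indices(probs, batch_size=8, rng=random.Random(0))
```

Only the magnitude of the error matters, so the transition with error -4.0 receives the highest probability, while the zero-error transition keeps a small but nonzero one.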

Correcting the bias

Sampling nonuniformly changes the distribution that updates are averaged over, so the expected update no longer matches the one uniform replay would produce. Prioritized replay corrects this bias with importance-sampling weights that downweight frequently drawn transitions:

High-priority transitions are sampled often, but each sample counts for less in the update.
The correction starts out partial and is annealed toward full strength over training, because unbiased updates matter most near convergence.
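
The two points above can be illustrated with a short sketch. It assumes the common form of the weight, (N * P(i)) ** (-beta), where N is the buffer size and beta is the correction strength, with beta = 1 meaning full correction. The function names, the starting value of 0.4, the linear schedule, and the choice to normalize by the largest weight in the batch are all assumptions made for this example.

```python
def importance_weights(probs, indices, beta):
    """Weights w_i = (N * P(i)) ** (-beta), scaled so the largest is 1.

    Normalizing by the batch maximum is an assumption of this sketch; it
    keeps the weights in (0, 1] so they only ever shrink an update.
    """
    n = len(probs)
    raw = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(raw)
    return [w / max_w for w in raw]

def annealed_beta(step, total_steps, beta_start=0.4):
    """Linearly increase beta from beta_start to 1.0 over total_steps."""
    frac = min(1.0, step / total_steps)
    return beta_start + frac * (1.0 - beta_start)

# Hypothetical sampling probabilities for a three-transition buffer,
# and a batch in which the high-priority transition was drawn twice.
probs = [0.1, 0.2, 0.7]
indices = [0, 2, 2]

full = importance_weights(probs, indices, beta=1.0)   # full correction
none = importance_weights(probs, indices, beta=0.0)   # no correction
beta_mid = annealed_beta(step=50, total_steps=100)    # halfway through training
```

In a training loop, each sampled transition's loss would be multiplied by its weight before averaging, so the frequently drawn transition at index 2 contributes less per sample than the rarely drawn one at index 0.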

Key idea

Prioritized experience replay draws transitions with high TD error more often to accelerate learning, then applies importance-sampling weights to correct the bias that nonuniform sampling introduces.
