Deep Q Networks

A deep Q network, or DQN, scales Q-learning to problems where a table of values is infeasible because there are far too many states to enumerate, such as learning from raw game pixels. A neural network approximates the action-value function Q(s, a).

The function approximator

Instead of storing one entry per state-action pair, the network takes a state as input and outputs a Q value for every action in a single forward pass. Because similar inputs produce similar outputs, it can generalize to states it has never seen exactly, which is essential for huge or continuous state spaces.
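A minimal sketch of such a network, written in plain Python so it has no dependencies. The layer sizes, the single ReLU hidden layer, and the names `init_q_network` and `q_values` are illustrative assumptions; a DQN that learns from pixels would use a convolutional network in a deep learning framework instead.

```python
import math
import random

def init_q_network(state_dim, hidden_dim, n_actions, seed=0):
    """Randomly initialise a one-hidden-layer network stored as nested lists (hypothetical helper)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        scale = 1.0 / math.sqrt(cols)
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

    return {
        "W1": matrix(hidden_dim, state_dim), "b1": [0.0] * hidden_dim,
        "W2": matrix(n_actions, hidden_dim), "b2": [0.0] * n_actions,
    }

def q_values(params, state):
    """Forward pass: one state vector in, one Q value per action out."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, state)) + b)  # ReLU activation
              for row, b in zip(params["W1"], params["b1"])]
    return [sum(w * h for w, h in zip(row, hidden)) + b               # linear output layer
            for row, b in zip(params["W2"], params["b2"])]

params = init_q_network(state_dim=4, hidden_dim=16, n_actions=2)
q = q_values(params, [0.1, -0.2, 0.05, 0.3])       # works for any state, seen or not
greedy_action = max(range(len(q)), key=lambda a: q[a])
```

The output layer has one unit per action, so choosing the greedy action needs only one forward pass rather than one evaluation per state-action pair.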

Why naive DQN is unstable

Plugging a network directly into the Q-learning update tends to be unstable, for two reasons:

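Consecutive samples are highly correlated, which breaks the assumption of independent, identically distributed data that stochastic gradient descent relies on.
The bootstrap target is computed with the same network being trained, so it shifts every time the network updates and the model chases a moving target.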
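To make the moving-target problem concrete, here is one semi-gradient update on a hypothetical linear Q function, Q(s, a) = w[a] · s, with two actions and made-up numbers. After a single step toward the target, recomputing the target with the updated weights gives a different value.

```python
# Hypothetical linear Q function for a scalar state: Q(s, a) = w[a] * s.
gamma, lr = 0.9, 0.5
w = [1.0, 0.5]                          # one weight per action
s, a, r, s_next = 1.0, 0, 0.0, 1.0      # a single made-up transition (s, a, r, s')

def td_target(weights):
    # The bootstrap target uses the very weights that are being trained.
    return r + gamma * max(weights[b] * s_next for b in range(len(weights)))

target_before = td_target(w)            # 0.9 * max(1.0, 0.5) = 0.9
td_error = target_before - w[a] * s     # about -0.1
w[a] += lr * td_error * s               # semi-gradient step on 0.5 * td_error**2, target held constant
target_after = td_target(w)             # about 0.855: the target moved with the weights
```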

The two key fixes

DQN adds two ingredients that make training much more stable:

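Experience replay stores transitions in a buffer and trains on minibatches sampled uniformly at random, which decorrelates the data and lets each transition be reused many times.
A target network, a copy of the weights that is frozen and only periodically synchronized with the online network, provides targets that stay fixed between synchronizations.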
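Both ingredients take only a few lines. The sketch below assumes network weights live in an ordinary Python container (such as the dict of lists used for a hand-written network); the names `ReplayBuffer`, `Transition`, and `sync_target` are illustrative, not taken from any particular library.

```python
import copy
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")

class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling (hypothetical sketch)."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # once full, the oldest transition is evicted
        self.rng = random.Random(seed)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up temporally correlated runs.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

def sync_target(online_params):
    """Hard update: the target network becomes an independent, frozen copy of the online weights."""
    return copy.deepcopy(online_params)

buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.push(t, 0, 0.0, t + 1, False)  # (state, action, reward, next_state, done)
batch = buffer.sample(2)                  # drawn from the 3 most recent transitions only
```

Calling `sync_target` only every fixed number of steps is what keeps the targets still in between; the interval is a hyperparameter.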

The training loop

The agent acts with epsilon-greedy exploration, stores each transition in the replay buffer, samples a random minibatch, and minimizes the squared difference between the predicted Q(s, a) and the bootstrapped target r + γ max over a' of Q(s', a'), evaluated with the target network. This combination let DQN reach human-level play on many Atari games.
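A compact, dependency-free sketch of this loop, under loudly stated assumptions: the environment is a hypothetical five-state chain (`env_step`), the "network" is a linear function of one-hot state features (so it reduces to a table), and every hyperparameter is illustrative. Each step follows the description above: pick an epsilon-greedy action, store the transition, sample a minibatch, move Q(s, a; θ) toward y = r + γ max over a' of Q(s', a'; θ⁻) computed with the target weights θ⁻ (with y = r at episode end), and copy the online weights into the target every `SYNC_EVERY` steps.

```python
import copy
import random
from collections import deque

N_STATES, N_ACTIONS = 5, 2          # hypothetical chain: action 0 = left, 1 = right
GAMMA, LR, EPSILON = 0.9, 0.1, 0.2  # illustrative hyperparameters
BATCH_SIZE, SYNC_EVERY, MAX_STEPS = 8, 20, 50

def env_step(state, action):
    """Toy environment: reward 1 and episode end on reaching the rightmost state."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_values(w, state):
    # Linear in one-hot features, so w[a][state] is the Q value; stands in for a network.
    return [w[a][state] for a in range(N_ACTIONS)]

def train(episodes=200, seed=0):
    rng = random.Random(seed)
    online = [[0.0] * N_STATES for _ in range(N_ACTIONS)]
    target = copy.deepcopy(online)            # frozen target weights
    replay = deque(maxlen=1000)               # experience replay buffer
    steps = 0
    for _ in range(episodes):
        state, done, t = 0, False, 0
        while not done and t < MAX_STEPS:
            # Epsilon-greedy action selection, breaking ties among greedy actions at random.
            q = q_values(online, state)
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(q)
                action = rng.choice([a for a in range(N_ACTIONS) if q[a] == best])
            next_state, reward, done = env_step(state, action)
            replay.append((state, action, reward, next_state, done))
            state, t, steps = next_state, t + 1, steps + 1

            if len(replay) >= BATCH_SIZE:
                # Average the gradient of 0.5 * (y - Q)**2 over a random minibatch.
                grads = [[0.0] * N_STATES for _ in range(N_ACTIONS)]
                for s, a, r, s2, d in rng.sample(list(replay), BATCH_SIZE):
                    y = r if d else r + GAMMA * max(q_values(target, s2))
                    grads[a][s] += (y - online[a][s]) / BATCH_SIZE
                for a in range(N_ACTIONS):
                    for s in range(N_STATES):
                        online[a][s] += LR * grads[a][s]
            if steps % SYNC_EVERY == 0:
                target = copy.deepcopy(online)  # periodic hard sync of the target network
    return online

weights = train()
```

Replacing `q_values` with a neural network and the hand-written update with an optimizer step gives the overall structure of a full DQN; the replay buffer and target sync stay the same.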

Key idea

A deep Q network approximates action values with a neural network, using experience replay and a target network to make Q-learning stable at scale.
