Deep Q Networks

Replacing the Q table with a neural network for large state spaces.

A deep Q network, or DQN, scales Q learning to problems where a table of values is impractical because there are far too many states to enumerate, such as learning from raw game pixels. Instead of looking values up, a neural network approximates the action value function Q(s, a).

The function approximator

Instead of one entry per state action pair, the network takes a state as input and outputs a Q value for each action. This lets it generalize across similar states it has never seen exactly, which is essential for huge or continuous state spaces.
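As a rough illustration of "state in, one Q value per action out", here is a minimal NumPy sketch. The class name QNetwork, the layer sizes, and the 4-dimensional state are assumptions made for this example, not part of any particular DQN implementation, and the weights are random rather than trained.

```python
import numpy as np


class QNetwork:
    """Tiny two-layer MLP (hypothetical): state vector in, one Q value per action out."""

    def __init__(self, state_dim, n_actions, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        # Small random weights; the sizes here are illustrative only.
        self.w1 = rng.normal(0.0, 0.1, size=(state_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, size=(hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def forward(self, states):
        """Map states of shape (batch, state_dim) to Q values of shape (batch, n_actions)."""
        h = np.maximum(0.0, states @ self.w1 + self.b1)  # ReLU hidden layer
        return h @ self.w2 + self.b2


net = QNetwork(state_dim=4, n_actions=2)
state = np.array([[0.1, -0.2, 0.05, 0.0]])   # a batch containing one state
q_values = net.forward(state)                # shape (1, 2): one Q value per action
greedy_action = int(np.argmax(q_values[0]))  # index of the highest-valued action
```

Because the same weights are shared across all inputs, two nearby states produce nearby Q values, which is where the generalization comes from. A single forward pass also scores every action at once, so picking the greedy action is one argmax.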

Why naive DQN is unstable

Plugging a network into the Q learning update is unstable for two reasons:

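  • Consecutive samples come from the same trajectory and are highly correlated, breaking the assumption of independent data that gradient based training relies on.
  • The bootstrap target is computed by the same network being trained, so it shifts every time the weights update and the model chases a moving target.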
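The moving target is easiest to see when the naive loss is written out. The notation below is an assumption for illustration: θ denotes the network weights, γ the discount factor, and (s, a, r, s') a single transition.

```latex
L(\theta) = \Big( \underbrace{r + \gamma \max_{a'} Q(s', a'; \theta)}_{\text{target}} \; - \; \underbrace{Q(s, a; \theta)}_{\text{prediction}} \Big)^{2}
```

Both the prediction and the target depend on the same θ, so a gradient step that moves the prediction toward the target also moves the target.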

The two key fixes

DQN adds two ingredients that make training stable:

  • Experience replay stores transitions and samples them randomly, decorrelating the data.
  • A target network, a copy of the network whose weights are held fixed and refreshed only periodically, provides stable targets that change slowly.
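Both ingredients are small pieces of bookkeeping. The sketch below uses only the Python standard library; the names ReplayBuffer, sync_target, and SYNC_EVERY are hypothetical, and the dictionary of weights stands in for real network parameters.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest are evicted when full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up consecutive, correlated steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def sync_target(online_params):
    """Return an independent copy of the online weights to use as the target network."""
    return copy.deepcopy(online_params)


# Replay: push five transitions into a buffer that holds three.
buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
minibatch = buf.sample(2)

# Target network: refresh the copy only every SYNC_EVERY steps.
SYNC_EVERY = 3  # assumed value for illustration
online = {"w": [0.0, 0.0]}
target = sync_target(online)
for step in range(1, 6):
    online["w"][0] += 1.0  # stand-in for a gradient update
    if step % SYNC_EVERY == 0:
        target = sync_target(online)
```

After the loop the online weight has been updated five times, while the target copy still holds the value from step 3. That lag is what keeps the targets from shifting on every update.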

The training loop

The agent acts with epsilon greedy exploration, stores each transition, samples a minibatch from replay, and minimizes the error between its predicted Q value for the action taken and a bootstrapped target: the reward plus the discounted maximum Q value of the next state, as computed by the target network. This combination let DQN reach human level play on many Atari games.
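The two computations at the heart of that loop can be sketched in a few lines of NumPy. The function names, the mean squared error loss, and the masking of terminal transitions with a done flag are assumptions made for this example. The numbers are made up, and the gradient step itself is omitted.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def td_targets(rewards, next_q_target, dones, gamma):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps."""
    max_next_q = next_q_target.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q


rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.2, 0.7, 0.1]), epsilon=0.0, rng=rng)  # greedy: 1

# A minibatch of two transitions; the second one ends its episode.
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
next_q_target = np.array([[0.5, 2.0],   # target network's Q values for each next state
                          [3.0, 1.0]])
targets = td_targets(rewards, next_q_target, dones, gamma=0.9)  # [2.8, 0.0]

q_pred = np.array([2.0, 0.5])            # online network's Q(s, a) for the actions taken
loss = np.mean((q_pred - targets) ** 2)  # the quantity a gradient step would reduce
```

The targets come only from the target network's outputs, so they can be treated as constants during the update. Gradients then flow through q_pred alone.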

Key idea

A deep Q network approximates action values with a neural network, using experience replay and a target network to make Q learning stable at scale.

Check yourself

1. Why does a deep Q network use a neural network instead of a table?

2. What does experience replay fix?

3. What is the role of the target network?