The Dueling DQN Architecture

Value and advantage

The dueling architecture rethinks the network that outputs Q values. After shared feature layers it splits into two streams:

A value stream estimating how good the state is, independent of action.
An advantage stream estimating how much better each action is than average.

The two streams recombine into Q values. The insight is that in many states the choice of action barely matters, so learning the state value once is far more efficient than learning a separate value for every action.
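As a rough illustration of the split, the sketch below runs a batch of states through one shared layer, a value stream, and an advantage stream, then recombines them by mean-centering (the aggregation described in the next section). It uses NumPy with random, untrained weights. The layer sizes and the names `init_linear` and `dueling_forward` are assumptions made for this example, not part of any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear(n_in, n_out):
    """Small random weights and a zero bias for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def dueling_forward(states, params):
    """Map states of shape (batch, state_dim) to Q values of shape (batch, n_actions)."""
    (w_f, b_f), (w_v, b_v), (w_a, b_a) = params
    features = np.maximum(0.0, states @ w_f + b_f)  # shared feature layer (ReLU)
    value = features @ w_v + b_v                    # value stream: (batch, 1)
    advantage = features @ w_a + b_a                # advantage stream: (batch, n_actions)
    # Recombine: state value plus mean-centered advantage.
    return value + advantage - advantage.mean(axis=1, keepdims=True)

# Hypothetical sizes, chosen only for illustration.
state_dim, hidden, n_actions = 4, 16, 3
params = [
    init_linear(state_dim, hidden),  # shared features
    init_linear(hidden, 1),          # value head: one number per state
    init_linear(hidden, n_actions),  # advantage head: one number per action
]
q_values = dueling_forward(rng.normal(size=(5, state_dim)), params)
print(q_values.shape)  # (5, 3)
```

The value head outputs a single number per state while the advantage head outputs one per action; broadcasting adds the state value to every action's centered advantage.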

The aggregation

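Naively adding value and advantage leaves the decomposition unidentifiable: adding a constant to the value and subtracting it from every advantage produces exactly the same Q values, so the network cannot tell the two apart. Dueling DQN fixes this by subtracting the mean advantage before recombining, giving Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). This pins the value to the average Q value over actions, which anchors the decomposition and keeps training stable.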
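The identifiability problem can be checked with a few numbers. In the sketch below, the helper names `naive_q` and `dueling_q` and the values are made up for illustration. Moving a constant from the advantages into the value leaves the naive sum unchanged, while the mean-centered version ties the value to the average Q value.

```python
def naive_q(value, advantages):
    """Plain sum: Q(a) = V + A(a)."""
    return [value + a for a in advantages]

def dueling_q(value, advantages):
    """Mean-centered sum: Q(a) = V + A(a) - mean(A)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

value, advantages = 2.0, [1.0, -1.0, 3.0]

# Shift a constant from the advantages into the value.
c = 5.0
shifted_value = value + c
shifted_advantages = [a - c for a in advantages]

# Naive aggregation cannot distinguish the two decompositions.
print(naive_q(value, advantages))                  # [3.0, 1.0, 5.0]
print(naive_q(shifted_value, shifted_advantages))  # [3.0, 1.0, 5.0]

# With mean-centering, the average Q value equals the value estimate,
# so the constant can no longer hide in the value stream.
q = dueling_q(value, advantages)
print(q)                # [2.0, 0.0, 4.0]
print(sum(q) / len(q))  # 2.0
```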

Why it helps

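Because every action's Q value shares the same value estimate, each update improves the value stream no matter which action was taken, whereas a standard Q network updates only the output for that one action. The network therefore learns the worth of states from less data, improving sample efficiency. Since the change is confined to the network architecture, it pairs naturally with Double DQN and prioritized replay.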
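A toy calculation makes the sharing concrete. The sketch below is deliberately simplified: it treats the value as a single scalar, updates only that scalar, and ignores the advantage stream's own update. The target, learning rate, and starting numbers are hypothetical.

```python
def dueling_q(value, advantages):
    """Mean-centered aggregation: Q(a) = V + A(a) - mean(A)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

value, advantages = 0.0, [0.5, -0.5, 0.0, 0.0]
action, target, lr = 0, 1.0, 0.5  # hypothetical: action taken, TD target, step size

before = dueling_q(value, advantages)
td_error = target - before[action]

# dQ(a)/dV = 1 for every action, so one gradient step on
# 0.5 * td_error**2 moves the shared value by lr * td_error.
value += lr * td_error
after = dueling_q(value, advantages)

print(before)  # [0.5, -0.5, 0.0, 0.0]
print(after)   # [0.75, -0.25, 0.25, 0.25]
```

Only action 0 was observed, yet the estimates for all four actions moved by the same amount, because the correction went through the shared value.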

Key idea

Dueling DQN factors Q values into a shared state value plus a mean-centered advantage, letting the network learn state worth efficiently and improving sample efficiency, especially in states where the choice of action matters little.

