Value and advantage
The dueling architecture rethinks the network that outputs Q values. After shared feature layers it splits into two streams:
- A value stream estimating how good the state is, independent of action.
- An advantage stream estimating how much better each action is than average.
The two streams recombine into Q values. The insight is that in many states the choice of action barely matters, so learning the state value once is far more efficient than learning a separate value for every action.
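The two-stream layout can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the class name `DuelingHead`, the helpers `linear`, `relu`, and `init_layer`, the layer sizes, and the random initialization are all assumptions made for the example. The streams are recombined with the mean-subtracted rule described under "The aggregation".

```python
import random

def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    """Hypothetical initializer: small uniform weights, zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

class DuelingHead:
    """Shared features feeding a value stream and an advantage stream."""

    def __init__(self, n_obs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_obs, n_hidden, rng)           # shared feature layer
        self.value = init_layer(n_hidden, 1, rng)                # one scalar per state
        self.advantage = init_layer(n_hidden, n_actions, rng)    # one output per action

    def streams(self, obs):
        features = relu(linear(obs, *self.shared))
        v = linear(features, *self.value)[0]
        adv = linear(features, *self.advantage)
        return v, adv

    def q_values(self, obs):
        v, adv = self.streams(obs)
        mean_adv = sum(adv) / len(adv)
        # Recombine: state value plus mean-centered advantage.
        return [v + a - mean_adv for a in adv]

head = DuelingHead(n_obs=4, n_hidden=8, n_actions=3)
obs = [0.1, -0.2, 0.3, 0.4]
q = head.q_values(obs)   # one Q value per action
```

Note that the value stream has a single output regardless of the number of actions, so whatever it learns about a state is shared by every action's Q value.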
The aggregation
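If value and advantage are simply added, the decomposition is not identifiable: a constant can be added to the value and subtracted from every advantage without changing any Q value, so the network cannot tell which stream should carry it. Dueling DQN fixes this by subtracting the mean advantage before recombining, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). This anchors the decomposition and keeps training stable.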
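A small numeric check makes the identifiability problem concrete. The function names `naive_q` and `dueling_q` and the numbers are made up for illustration. Moving a constant `c` from the advantages into the value leaves the naive sum unchanged, while the mean-subtracted form responds to the shift, so the value stream's output is pinned down.

```python
def naive_q(v, adv):
    """Plain sum: Q(s, a) = V(s) + A(s, a)."""
    return [v + a for a in adv]

def dueling_q(v, adv):
    """Mean-subtracted sum: Q(s, a) = V(s) + A(s, a) - mean(A(s, .))."""
    mean_adv = sum(adv) / len(adv)
    return [v + a - mean_adv for a in adv]

v, adv = 2.0, [1.0, 0.0, -1.0]

# Shift a constant from the advantage stream into the value stream.
c = 5.0
v_shifted = v + c
adv_shifted = [a - c for a in adv]

# Naive sum: both decompositions give the same Q values,
# so training has no way to prefer one over the other.
print(naive_q(v, adv))                  # [3.0, 2.0, 1.0]
print(naive_q(v_shifted, adv_shifted))  # [3.0, 2.0, 1.0]

# Mean-subtracted sum: the shift in V shows up in Q,
# and the mean of the Q values always equals V.
print(dueling_q(v, adv))                  # [3.0, 2.0, 1.0]
print(dueling_q(v_shifted, adv_shifted))  # [8.0, 7.0, 6.0]
```

With the mean subtracted, the centered advantages sum to zero, which is why the average Q value over actions equals the value stream's output.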
Why it helps
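Because the value estimate is shared across actions, every update to any action's Q value also updates the value stream. The network therefore learns the worth of states from less data, which improves sample efficiency. The change is confined to the network architecture, so it pairs naturally with Double DQN and prioritized replay.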
Key idea
Dueling DQN factors Q values into a shared state value plus a mean-centered advantage. This lets the network learn state worth efficiently and improves sample efficiency, especially in states where the choice of action matters little.