Machine Learning

The Bellman Optimality Equation

The recursive consistency condition that an optimal value function must satisfy.

5 min read · intro

Value functions

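The state value under a policy is the expected discounted return from starting in that state and following the policy thereafter. The optimal value of a state is the best value achievable from that state over all policies. Once you know the optimal values, acting greedily with respect to them (picking, in each state, the action with the highest expected reward plus discounted optimal next-state value) is optimal.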
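In symbols, the two definitions above can be sketched as follows. The notation is an assumption not used in the lesson itself: π is a policy, γ is the discount factor, and r_{t+1} is the reward received after step t.

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s\right],
\qquad
V^{*}(s) = \max_{\pi} V^{\pi}(s)
```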

The recursion

The Bellman optimality equation says the optimal value of a state equals the value of its best action. That best action's value is the expected immediate reward plus the discounted expected optimal value of the next state:

  • Consider every action available in the state.
  • For each, take the expected reward plus gamma (the discount factor) times the expected optimal value of the next state.
  • The optimal value of the state is the maximum over those actions.
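
Written as a single equation, the three steps above look roughly like this. The symbols are assumed notation rather than the lesson's own: R(s, a) is the expected immediate reward for taking action a in state s, P(s' | s, a) is the probability of landing in state s', and γ is the discount factor (the "gamma" in the list).

```latex
V^{*}(s) = \max_{a} \Big[ R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s') \Big]
```

The bracketed term is the value of one action; the outer max picks the best one.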

Why it matters

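This equation is a fixed-point condition: applying its right-hand side to the optimal value function returns the same function. With a discount factor below 1 and bounded rewards, the optimal value function is its unique solution, and many planning and learning algorithms, such as value iteration and Q-learning, are different ways of solving it. The max over actions is what makes it nonlinear and distinguishes it from the Bellman expectation equation for a fixed policy, which is linear in the values.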
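
One way to see the fixed point in action is value iteration, which repeatedly applies the right-hand side of the equation until the values stop changing. The sketch below is illustrative only: the two-state MDP, its rewards, the discount of 0.5, and every name in the code (TRANSITIONS, action_value, bellman_backup, value_iteration) are assumptions invented for this example, not part of the lesson.

```python
# Hypothetical two-state MDP, invented for illustration.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    0: {"stay": [(1.0, 0, 0.0)], "move": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "move": [(1.0, 0, 0.0)]},
}
GAMMA = 0.5  # assumed discount factor


def action_value(values, state, action):
    """Expected reward plus discounted expected value of the next state."""
    return sum(
        prob * (reward + GAMMA * values[next_state])
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def bellman_backup(values):
    """Apply the right-hand side of the optimality equation to every state."""
    return {
        state: max(action_value(values, state, action) for action in actions)
        for state, actions in TRANSITIONS.items()
    }


def value_iteration(tol=1e-10, max_sweeps=10_000):
    """Repeat the backup until the largest change falls below tol."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(max_sweeps):
        new_values = bellman_backup(values)
        change = max(abs(new_values[s] - values[s]) for s in TRANSITIONS)
        values = new_values
        if change < tol:
            break
    return values


optimal_values = value_iteration()

# Acting greedily with respect to the optimal values gives an optimal policy.
greedy_policy = {
    state: max(actions, key=lambda a: action_value(optimal_values, state, a))
    for state, actions in TRANSITIONS.items()
}

print(optimal_values)  # approximately {0: 3.0, 1: 4.0}
print(greedy_policy)   # {0: 'move', 1: 'stay'}
```

The result can be checked by hand: staying in state 1 forever gives 2 + 0.5 · 4 = 4, and moving from state 0 gives 1 + 0.5 · 4 = 3, so applying the backup to these values leaves them unchanged.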

Key idea

The Bellman optimality equation expresses each optimal state value as the maximum over actions of the expected immediate reward plus the discounted expected optimal value of the successor state, a fixed point that defines optimal behavior.

Check yourself

1. What operation makes the Bellman optimality equation nonlinear?

2. What does the optimal value of a state equal?