The Policy and Value Function

How an agent decides what to do and how good a state is.

Two central objects in reinforcement learning are the policy, which says what to do, and the value function, which says how good a situation is.

The policy

A policy maps states to actions. It can be deterministic, always choosing the same action in a given state, or stochastic, giving a probability distribution over actions. The goal of learning is to find a policy that maximizes expected return.
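To make the two kinds of policy concrete, here is a minimal Python sketch. The states, actions, and probabilities are invented for illustration and do not come from any particular environment.

```python
import random

# Hypothetical toy example: state and action names are illustrative only.

# Deterministic policy: exactly one action per state.
deterministic_policy = {
    "low_battery": "recharge",
    "high_battery": "search",
}

# Stochastic policy: a probability distribution over actions per state.
stochastic_policy = {
    "low_battery": {"recharge": 0.9, "search": 0.1},
    "high_battery": {"recharge": 0.2, "search": 0.8},
}


def act_deterministic(policy, state):
    """Look up the single action the policy assigns to this state."""
    return policy[state]


def act_stochastic(policy, state, rng):
    """Sample an action according to the policy's probabilities."""
    actions = list(policy[state].keys())
    weights = list(policy[state].values())
    return rng.choices(actions, weights=weights, k=1)[0]
```

The deterministic version returns the same action every time it sees a state, while the stochastic version can return different actions on different calls.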

The value function

A value function estimates expected future return, that is, the cumulative reward the agent can expect to collect from this point on.

  • The state value is the expected return starting from a state and following the policy.
  • The action value, often called Q, is the expected return after taking a specific action in a state and then following the policy.

Values let the agent compare situations and choices without simulating the whole future every time.
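The state value and the action value are linked: the value of a state is the average of its action values, weighted by how likely the policy is to pick each action. A small sketch of that relationship follows, with made-up numbers for a single state.

```python
# Hypothetical numbers for one state; both the Q values and the policy are assumed.
q_values = {"recharge": 1.0, "search": 3.0}        # Q(s, a) for each action
policy_probs = {"recharge": 0.25, "search": 0.75}  # probability of each action in s


def state_value(policy_probs, q_values):
    """V(s) = sum over actions of policy(a | s) * Q(s, a)."""
    return sum(policy_probs[a] * q_values[a] for a in policy_probs)


v = state_value(policy_probs, q_values)  # 0.25 * 1.0 + 0.75 * 3.0 = 2.5
best_action = max(q_values, key=q_values.get)  # "search" has the higher Q
```

Comparing entries of q_values is what lets the agent rank its choices without rolling out the future each time.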

How they connect

Given a value function, the agent can improve its policy by preferring high-value actions. Given a policy, the agent can estimate the values that policy produces. This back and forth between evaluating and improving is the engine behind most RL methods.
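One classic form of this loop is policy iteration. The sketch below runs it on a tiny two-state problem; the transition table, rewards, and discount factor of 0.9 are all assumptions chosen to keep the example small, and the evaluation step simply repeats a fixed number of sweeps rather than checking convergence.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical environment: TRANSITIONS[state][action] is a list of
# (probability, next_state, reward) tuples.
TRANSITIONS = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}


def q_value(values, state, action):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(
        prob * (reward + GAMMA * values[next_state])
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def evaluate(policy, sweeps=500):
    """Policy evaluation: estimate state values under a fixed policy."""
    values = {s: 0.0 for s in TRANSITIONS}
    for _ in range(sweeps):
        values = {s: q_value(values, s, policy[s]) for s in TRANSITIONS}
    return values


def improve(values):
    """Policy improvement: act greedily with respect to the current values."""
    return {
        s: max(TRANSITIONS[s], key=lambda a: q_value(values, s, a))
        for s in TRANSITIONS
    }


def policy_iteration(policy):
    """Alternate evaluation and improvement until the policy stops changing."""
    while True:
        values = evaluate(policy)
        new_policy = improve(values)
        if new_policy == policy:
            return policy, values
        policy = new_policy


final_policy, final_values = policy_iteration({"A": "stay", "B": "go"})
```

Starting from a poor policy, the loop here settles on going from A to B and then staying in B, which is where the repeated reward of 2 is collected.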

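A policy that is greedy with respect to the optimal action values is itself optimal, since it always picks the action with the highest achievable expected return. Being greedy with respect to the action values of some other policy only guarantees a policy at least as good as that one, which is why evaluation and improvement are repeated.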

Key idea

The policy decides actions while the value function scores expected return, and the two refine each other toward optimal behavior.

Check yourself

1. What does a policy map?

2. What does the action value Q estimate?