States, Actions, and Rewards

At every step of a reinforcement learning problem, the agent observes a state, picks an action, and receives a reward. These three signals form the loop on which all learning is built.
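The loop can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the CorridorEnv class, its reset and step methods, and the always_right policy are all hypothetical names invented for this example.

```python
class CorridorEnv:
    """Hypothetical toy environment: cells 0..4 in a row, goal at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end and return the initial state.
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and record (state, action, reward) at every step."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                       # pick an action
        next_state, reward, done = env.step(action)  # receive reward, observe next state
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def always_right(state):
    # Hypothetical fixed policy: ignore the state and always move right.
    return 1


trajectory = run_episode(CorridorEnv(), always_right)
```

Here trajectory holds one (state, action, reward) triple per step, which is the raw material every learning method in this setting works from.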

States

A state is the agent's view of the world at a given moment. It can be a board position, a robot's joint angles, or a vector of sensor readings. A good state representation includes every piece of information that affects which action is best; anything left out is something the agent cannot react to.

Actions

An action is a choice the agent makes that affects the environment. Actions can be discrete, like moving left or right, or continuous, like setting a motor torque. The set of legal actions may depend on the state.
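As a small sketch of these distinctions, the two functions below show a discrete action set that depends on the state and a continuous action restricted to a valid range. The corridor layout, the function names, and the torque limit of 2.0 are assumptions made up for illustration.

```python
def legal_actions(position, length=5):
    """Discrete case: the legal moves in a corridor depend on where the agent stands."""
    actions = []
    if position > 0:
        actions.append("left")
    if position < length - 1:
        actions.append("right")
    return actions


def clip_torque(torque, max_torque=2.0):
    """Continuous case: any real-valued torque is allowed, limited to a valid range."""
    return max(-max_torque, min(max_torque, torque))
```

At the left wall legal_actions(0) offers only "right", while in the middle both moves are available; clip_torque(3.5) is limited to the assumed maximum of 2.0.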

Rewards

A reward is a single number the environment returns after each action. It encodes the goal: higher is better. Designing rewards is delicate because the agent optimizes exactly what you reward, which is not always what you intended.

Sparse rewards arrive only at rare events like winning.
Dense rewards give frequent hints but can be gamed.
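The contrast can be made concrete with a toy corridor whose goal is cell 4. The sketch below is an assumption-laden illustration: both reward functions and both paths are invented here, and the dense reward is deliberately flawed so that it shows one way a dense signal can be gamed.

```python
GOAL = 4  # hypothetical goal cell in a corridor of cells 0..4


def sparse_reward(prev_pos, new_pos):
    # Pays only at the rare event of reaching the goal.
    return 1.0 if new_pos == GOAL else 0.0


def dense_reward(prev_pos, new_pos):
    # Pays for every rightward step, but never penalizes stepping back.
    return 1.0 if new_pos > prev_pos else 0.0


def total_reward(path, reward_fn):
    # Sum the reward over each consecutive pair of positions in the path.
    return sum(reward_fn(a, b) for a, b in zip(path, path[1:]))


direct = [0, 1, 2, 3, 4]                        # walks straight to the goal
oscillating = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # steps right, steps back, repeats

sparse_direct = total_reward(direct, sparse_reward)
sparse_oscillating = total_reward(oscillating, sparse_reward)
dense_direct = total_reward(direct, dense_reward)
dense_oscillating = total_reward(oscillating, dense_reward)
```

Under the sparse reward only the direct path earns anything. Under this dense reward the oscillating path collects more than the direct one without ever reaching the goal, because each rightward step is paid again after every retreat.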

The return

The agent maximizes not any single reward but the return, the total reward accumulated over time. This forces it to look beyond the immediate step and to value actions that pay off later.
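A minimal sketch of the computation follows. With the default gamma of 1.0 the function returns the plain total described above; the gamma parameter is an addition not mentioned in the text, included only because discounting later rewards is a common variant. The reward sequences are made up for illustration.

```python
def compute_return(rewards, gamma=1.0):
    """Total reward over a sequence; gamma < 1 down-weights later rewards (an assumed extension)."""
    g = 0.0
    # Work backwards so each step's return is its reward plus the discounted future return.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


grab_now = [1.0, 0.0, 0.0]    # small reward immediately, nothing afterwards
wait_for_it = [0.0, 0.0, 5.0]  # nothing at first, larger reward later

return_grab_now = compute_return(grab_now)
return_wait = compute_return(wait_for_it)
discounted_wait = compute_return([0.0, 0.0, 0.0, 1.0], gamma=0.5)
```

An agent that compared only the first reward would prefer grab_now, while comparing returns favors wait_for_it, which is the sense in which the return rewards patience.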

Key idea

The agent loops through observing states, choosing actions, and collecting rewards, aiming to maximize the total return.
