What Is Reinforcement Learning

Reinforcement learning teaches an agent to make a sequence of decisions by interacting with an environment. There is no labeled answer for each step. Instead the agent receives a reward signal that tells it how good its recent actions were.

The loop is simple to state. The agent observes a state, chooses an action, and the environment returns a new state plus a reward. Over many episodes the agent learns a policy, a strategy for choosing actions that maximizes total reward over time.
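The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Corridor` environment, its `reset`/`step` methods, and the two example policies are hypothetical names invented here, and the reward of 1.0 at the goal is an assumption.

```python
import random


class Corridor:
    """Hypothetical toy environment: start in cell 0, the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Begin every episode in the leftmost cell.
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, any other action moves left; walls clamp the move.
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.length - 1, self.state + move))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # assumed reward: paid only at the goal
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode of the observe, act, receive-reward loop."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    # Ignores the state and picks left or right at random.
    return random.choice([0, 1])


def always_right(state):
    # A hand-written policy that happens to be optimal in this corridor.
    return 1
```

Calling `run_episode(Corridor(), always_right)` reaches the goal and collects the single reward, while `random_policy` may or may not get there within the step limit. Learning a policy means finding something like `always_right` from experience instead of writing it by hand.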

Two ideas make this hard:

Delayed reward means a good move may pay off only many steps later.
Exploration versus exploitation forces a balance between trying new actions and repeating known good ones.
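One common way to strike the exploration and exploitation balance is an epsilon-greedy rule: with a small probability pick a random action, otherwise pick the action with the best current estimate. The sketch below applies it to a simple multi-armed bandit. The function names, the Gaussian reward model, and the default values for `epsilon` and `steps` are assumptions chosen for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniformly random action
    # exploit: index of the highest estimated value (first one on ties)
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Estimate each arm's value from noisy rewards (assumed Gaussian, std 1)."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)    # running value estimate per arm
    counts = [0] * len(true_means)  # number of pulls per arm
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]  # incremental sample mean
    return q, counts
```

Setting `epsilon` to 0 gives a purely greedy agent that can lock onto a mediocre arm early, while setting it to 1 gives an agent that never uses what it has learned. Values in between trade some reward now for better estimates later.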

Because feedback is sparse and delayed, the agent must learn to assign credit across long action chains. This is why reinforcement learning differs sharply from supervised learning, where the correct answer arrives immediately.
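A standard device for spreading credit back along an action chain is the discounted return: each step is credited with its own reward plus a discounted share of everything that follows. The sketch below assumes a discount factor `gamma` between 0 and 1, which is a conventional parameter not defined in this lesson; the function name is also illustrative.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A single reward at the end is shared with the earlier steps that led to it:
# discounted_returns([0, 0, 0, 1]) is approximately [0.729, 0.81, 0.9, 1.0]
```

Steps closer to the reward receive more credit, and steps far in the past receive less, which is one simple way to connect a delayed payoff to the moves that earned it.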

Reinforcement learning shines in games, robotics, and control problems where the right action depends on consequences that unfold over time.

Key idea

Reinforcement learning trains an agent to choose actions that maximize cumulative reward through trial and error in an environment.
