What Is Reinforcement Learning
Reinforcement learning teaches an agent to make a sequence of decisions by interacting with an environment. There is no labeled answer for each step. Instead, the agent receives a reward signal that tells it how good its recent actions were.
The loop is simple to state. The agent observes a state, chooses an action, and the environment returns a new state plus a reward. Over many episodes the agent learns a policy, a strategy for choosing actions that maximizes total reward over time.
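The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `CorridorEnv` class, its `reset` and `step` methods, and the two policies are hypothetical names invented for this example, and the simplified `step` return value does not follow any particular library's API.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: reach the right end of a corridor."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start every episode at the left end and return the initial state.
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves right, any other action moves left.
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.length - 1, self.position + move))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # reward only arrives at the goal
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one observe -> act -> receive-reward loop and return total reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # agent chooses an action
        state, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    # An untrained agent: ignores the state and picks a direction at random.
    return random.choice([0, 1])


def always_right(state):
    # A hand-written policy that happens to be optimal for this corridor.
    return 1


if __name__ == "__main__":
    print("always_right:", run_episode(CorridorEnv(), always_right))
    print("random_policy:", run_episode(CorridorEnv(), random_policy))
```

Here the policy is just a function from state to action. Learning, in this picture, means replacing `random_policy` with something closer to `always_right` based on the rewards observed.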
Two ideas make this hard:
- Delayed reward means a good move may only pay off many steps later.
- Exploration versus exploitation forces a balance between trying new actions and repeating known good ones.
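One common way to strike the exploration and exploitation balance is an epsilon-greedy rule, sketched below. The text above does not prescribe this method. It is shown only as an illustration, and the function name, the `value_estimates` list, and the `epsilon` parameter are assumptions made for the example.

```python
import random


def epsilon_greedy(value_estimates, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability epsilon, explore by choosing a uniformly random action.
    Otherwise, exploit by choosing the action with the highest estimate.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(value_estimates))  # explore
    # Exploit: index of the largest estimate (first one wins on ties).
    return max(range(len(value_estimates)), key=lambda a: value_estimates[a])


if __name__ == "__main__":
    estimates = [0.1, 0.9, 0.3]  # hypothetical value estimates for 3 actions
    picks = [epsilon_greedy(estimates, epsilon=0.1) for _ in range(1000)]
    print("fraction choosing best action:", picks.count(1) / len(picks))
```

Setting `epsilon` to 0 gives a purely greedy agent that never tries anything new, while setting it to 1 gives an agent that never uses what it has learned. Practical settings usually sit between the two.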
Because feedback is sparse and delayed, the agent must learn to assign credit across long action chains. This is why reinforcement learning differs sharply from supervised learning, where the correct answer arrives immediately.
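To make credit assignment concrete, the sketch below computes a discounted return for each step of an episode, which is one standard way to pass a late reward back to the earlier actions that led to it. The discount factor `gamma` is an assumption introduced for this example and is not discussed in the text above.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return, for each time step, the discounted sum of rewards from there on.

    rewards: list of per-step rewards from one episode.
    gamma:   assumed discount factor in [0, 1]; lower values favor near-term reward.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step's return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Only the final step is rewarded, yet every earlier step receives credit.
    print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))
```

With a single reward at the end of the episode, earlier steps still receive a nonzero return, scaled down by how far they sit from the payoff.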
Reinforcement learning shines in games, robotics, and control problems where the right action depends on consequences that unfold over time.
Key idea
Reinforcement learning trains an agent to choose actions that maximize cumulative reward through trial and error in an environment.