Why guardrails
An agent that can act in the world can also act wrongly. Guardrails are checks that sit around the agent, validating inputs, constraining actions, and screening outputs before anything reaches a user or a system.
Layers of defense
- Input filters catch prompt-injection attempts and unsafe requests before the model sees them
- Action allowlists restrict which tools and arguments are permitted
- Output checks screen results for leaks, toxicity, or policy violations
- Confirmation gates require approval for high-impact actions
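
As a rough illustration of the action-allowlist layer, the sketch below maps each permitted tool to a function that validates its arguments. The tool names (`search_docs`, `read_file`), the `/workspace/` prefix, and the length limit are hypothetical choices made for this example, not part of any particular framework.

```python
from typing import Any, Callable, Dict

ArgCheck = Callable[[Dict[str, Any]], bool]


def _check_search(args: Dict[str, Any]) -> bool:
    # Exactly one argument, a reasonably short string query.
    if set(args) != {"query"}:
        return False
    query = args["query"]
    return isinstance(query, str) and len(query) <= 200


def _check_read_file(args: Dict[str, Any]) -> bool:
    # Exactly one argument, a path confined to a (hypothetical) sandbox directory.
    if set(args) != {"path"}:
        return False
    path = args["path"]
    return (
        isinstance(path, str)
        and path.startswith("/workspace/")
        and ".." not in path
    )


# Hypothetical allowlist: any tool not listed here is denied by default.
ALLOWED_TOOLS: Dict[str, ArgCheck] = {
    "search_docs": _check_search,
    "read_file": _check_read_file,
}


def is_allowed(tool: str, args: Dict[str, Any]) -> bool:
    """Return True only if the tool is listed and its arguments pass validation."""
    check = ALLOWED_TOOLS.get(tool)
    if check is None:
        return False
    return check(args)
```

The design choice worth noting is deny-by-default: a tool the policy has never heard of is rejected, so adding a new capability requires an explicit policy change.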
Where checks sit
Guardrails wrap the loop. Each tool call passes through a policy check, and each final output is screened before it leaves.
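
One way to picture this wrapping is the minimal sketch below. It assumes the agent's decisions arrive as a sequence of steps (scripted here; in a real agent they would come from the model), and that `policy`, `tools`, and `screen` are supplied by the caller. The names `guarded_run` and `redact_secrets`, and the `sk-` key pattern, are hypothetical and exist only for this example.

```python
import re
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

# A step is either ("tool", (name, args)) or ("final", text).
Step = Tuple[str, Any]


def redact_secrets(text: str) -> str:
    # Hypothetical output screen: mask anything shaped like an "sk-..." key.
    return re.sub(r"sk-[A-Za-z0-9]+", "[redacted]", text)


def guarded_run(
    steps: Iterable[Step],
    policy: Callable[[str, Dict[str, Any]], bool],
    tools: Dict[str, Callable[..., Any]],
    screen: Callable[[str], str],
) -> Tuple[Optional[str], List[Any]]:
    """Run the steps, checking every tool call and screening the final output."""
    observations: List[Any] = []
    for kind, payload in steps:
        if kind == "tool":
            name, args = payload
            if not policy(name, args):
                # The tool is never executed; the refusal is recorded instead.
                observations.append(f"blocked: {name}")
                continue
            observations.append(tools[name](**args))
        elif kind == "final":
            # Nothing leaves the loop without passing the output screen.
            return screen(payload), observations
    return None, observations
```

In a real loop the "blocked" observation would typically be fed back to the model so it can choose a different action, rather than silently dropped.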
Practical guidance
Do not rely on the model to police itself; enforce limits in code outside the model. Require explicit confirmation for irreversible actions such as deletes or payments. Log every blocked action so you can tune the policy over time.
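
The confirmation and logging advice could be sketched as follows, using Python's standard `logging` module. The set of irreversible tool names, the `confirm` callback (standing in for a human approval step), and the `run` callback are assumptions made for illustration.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("guardrails")

# Hypothetical names for actions that cannot be undone.
IRREVERSIBLE = {"delete_record", "send_payment"}


def execute_with_gate(
    tool: str,
    args: Dict[str, Any],
    run: Callable[[str, Dict[str, Any]], Any],
    confirm: Callable[[str, Dict[str, Any]], bool],
) -> Dict[str, Any]:
    """Run a tool, but require explicit approval first if it is irreversible."""
    if tool in IRREVERSIBLE and not confirm(tool, args):
        # Record the block so the policy can be reviewed and tuned later.
        logger.warning("blocked action tool=%s reason=not_confirmed", tool)
        return {"status": "blocked", "reason": "not_confirmed"}
    return {"status": "ok", "result": run(tool, args)}
```

The log line here deliberately omits the arguments, since they may contain sensitive data; whether to record them is a policy decision for each deployment.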
Key idea
Guardrails enforce safety in code around the agent, filtering inputs, constraining actions, and screening outputs so an autonomous loop cannot do harm unchecked.