What it is
Output guardrails are checks applied to a model's response before it reaches a user or a downstream system. They catch unsafe, off-topic, or malformed output that a model can still produce occasionally, even with careful prompting.
Kinds of guardrails
- Format validation: confirm the output matches a schema, like valid JSON with required fields.
- Content safety: scan for toxicity, personal data, or policy violations.
- Grounding checks: verify that claims are supported by the retrieved sources, which reduces hallucination.
- Business rules: enforce limits such as never quoting a price above a cap.
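The four kinds of checks can be sketched as small Python functions. This is a minimal sketch under stated assumptions, not a production implementation: the JSON fields (`answer`, `price`), the `PRICE_CAP` value, the email-only personal-data regex, and the word-overlap grounding heuristic are all hypothetical stand-ins. A real system would more likely use a schema validator, a dedicated PII or moderation classifier, and an entailment- or citation-based grounding check.

```python
from __future__ import annotations

import json
import re

# Hypothetical schema and business rule, for illustration only.
REQUIRED_FIELDS = {"answer", "price"}
PRICE_CAP = 500.0

# Crude personal-data detector (email addresses only). A real deployment would
# call a dedicated PII or moderation classifier instead.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def check_format(raw: str) -> tuple[bool, dict | None]:
    """Format validation: the reply must be a JSON object with the required fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, None
    if not isinstance(data, dict) or not REQUIRED_FIELDS.issubset(data):
        return False, None
    return True, data


def check_safety(text: str) -> bool:
    """Content safety (simplified): reject text that contains an email address."""
    return EMAIL_RE.search(text) is None


def check_grounding(text: str, sources: list[str], min_overlap: float = 0.5) -> bool:
    """Grounding (crude proxy): every sentence must share at least `min_overlap`
    of its words with some retrieved source."""
    source_words = [set(re.findall(r"\w+", s.lower())) for s in sources]
    # Split only on sentence punctuation followed by whitespace, so "40.5" stays intact.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"\w+", sentence.lower()))
        if not words:
            continue
        best = max((len(words & sw) / len(words) for sw in source_words), default=0.0)
        if best < min_overlap:
            return False
    return True


def check_price_cap(data: dict) -> bool:
    """Business rule: the quoted price must be a number no greater than PRICE_CAP."""
    price = data.get("price")
    return isinstance(price, (int, float)) and price <= PRICE_CAP
```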
Handling failures
When a guardrail trips, choose a failure policy that fits the kind of failure:
- Retry with a corrective instruction; this suits format errors.
- Repair the output programmatically when the fix is mechanical.
- Block and return a safe fallback for genuine safety violations.
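One way to wire the three policies together is sketched below. It assumes a hypothetical `generate(prompt) -> str` callable for the model and a toy keyword blocklist in place of a real safety classifier. The flow is: parse the reply, attempt a mechanical repair (here, trimming chatter around the JSON object), retry with a corrective instruction if the format is still wrong, and block with a fallback on a safety violation. Names such as `guarded_generate`, `repair`, `BLOCKLIST`, and `FALLBACK` are illustrative only.

```python
from __future__ import annotations

import json
from typing import Callable

# Hypothetical safe fallback and toy safety policy, for illustration only.
FALLBACK = {"answer": "Sorry, I can't help with that request.", "price": None}
BLOCKLIST = ("password", "social security number")


def parse_json(raw: str) -> dict | None:
    """Return the reply as a dict, or None if it is not a JSON object."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def repair(raw: str) -> str:
    """Mechanical repair: keep only the span from the first '{' to the last '}'."""
    start, end = raw.find("{"), raw.rfind("}")
    return raw[start:end + 1] if 0 <= start < end else raw


def is_unsafe(data: dict) -> bool:
    """Toy safety check: flag answers that mention a blocklisted term."""
    text = str(data.get("answer", "")).lower()
    return any(term in text for term in BLOCKLIST)


def guarded_generate(generate: Callable[[str], str], prompt: str, max_retries: int = 2) -> dict:
    current_prompt = prompt
    for _ in range(max_retries + 1):
        raw = generate(current_prompt)
        data = parse_json(raw)
        if data is None:
            data = parse_json(repair(raw))  # repair: the fix is mechanical
        if data is None:
            # Retry: still malformed, so re-ask with a corrective instruction.
            current_prompt = (
                prompt + "\n\nYour last reply was not valid JSON. "
                "Reply with a single JSON object only."
            )
            continue
        if is_unsafe(data):
            return dict(FALLBACK)  # block: never pass a safety violation through
        return data
    return dict(FALLBACK)  # retries exhausted
```

In this sketch, safety failures are not retried, on the reasoning that re-prompting after a policy violation may only produce a differently worded violation; whether to allow a retry there is a per-application decision.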
Guardrails add latency and cost, so reserve the strictest checks for outputs where a failure would do the most damage, that is, where the blast radius is large.
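A hedged sketch of that trade-off: each route is assigned a hypothetical risk tier, cheap checks run first, and evaluation stops at the first failure, so a slower check only runs once the cheap ones pass. The tier names, the 2,000-character budget, and `no_email` (standing in for a slower PII or moderation call) are assumptions for illustration.

```python
import re
from typing import Callable, Dict, List, Optional

Check = Callable[[str], bool]


def non_empty(text: str) -> bool:
    """Cheap: the output is not blank."""
    return bool(text.strip())


def within_length(text: str) -> bool:
    """Cheap: the output fits a hypothetical 2,000-character budget."""
    return len(text) <= 2000


def no_email(text: str) -> bool:
    """Stand-in for a slower, stricter check such as a PII or moderation model call."""
    return re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text) is None


# Hypothetical risk tiers: low-impact routes get only cheap checks; routes with a
# large blast radius also pay for the strict ones. Cheap checks are listed first.
GUARDRAILS_BY_TIER: Dict[str, List[Check]] = {
    "internal_draft": [non_empty],
    "customer_facing": [non_empty, within_length, no_email],
}


def first_failure(text: str, tier: str) -> Optional[str]:
    """Run the tier's checks in order; return the first failing check's name, or None."""
    for check in GUARDRAILS_BY_TIER[tier]:
        if not check(text):
            return check.__name__
    return None
```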
Key idea
Output guardrails validate format, safety, grounding, and business rules after generation, then retry, repair, or block so that bad output never reaches users or downstream systems.