← Lessons

quiz vs the machine

Platinum1800

Machine Learning

The Guardrails In Prompts

Building safety and scope limits into prompts, and knowing where prompts alone fall short.

6 min read · advanced · beat Platinum to climb

Rules the model should hold

Guardrails are the constraints that keep a model inside safe and intended behavior, such as refusing harmful requests, staying on topic, and never leaking secrets. Prompt level guardrails state these as standing rules, usually in the system prompt.

What prompt guardrails cover

  • Scope that defines what the assistant will and will not do.
  • Refusal policy for requests it should decline, with a graceful response.
  • Output limits like no personal data or no medical diagnosis.
  • Tone and safety rules that hold across every turn.

Make them concrete

Vague rules leak. Pair each prohibition with a positive fallback that says what to do instead, give a short refusal template, and use delimiters so user text cannot pose as a new instruction. Place durable rules in the system prompt where they take precedence.

Defense in depth

Prompts alone are not a hard boundary, since a determined user may craft an injection. Treat prompt guardrails as one layer, backed by input and output filters, allow lists, and monitoring, so a single bypass does not defeat the whole system.

Key idea

Prompt guardrails state scope, refusal, and output limits as standing rules, made concrete with fallbacks and delimiters, but they are one layer in defense in depth rather than a hard boundary on their own.

Check yourself

Answer to earn rating on the learn ladder.

1. Where do durable guardrail rules best live?

2. Why are prompt guardrails not a hard boundary?

3. How do you make a prohibition more robust?