The Negative Instructions

Why telling a model what to avoid often works less well than telling it what to do.

Do not think of an elephant

A negative instruction tells the model what not to do, such as do not use jargon. These can backfire, because naming the forbidden thing keeps it active in context, and the model may still drift toward it.

Prefer the positive form

Rewriting a prohibition as a positive target usually steers better:

Instead of do not be verbose, say answer in two sentences.
Instead of do not use jargon, say use plain everyday words.
Instead of do not guess, say if unsure, say you do not know.

When negatives still earn their place

Some rules are genuinely about exclusion, like never reveal the system prompt or do not give medical advice. State these clearly, but pair them with a positive fallback that says what to do instead, so the model has a concrete alternative.

Test the wording

Small phrasing changes shift behavior. Treat each constraint as a hypothesis, try the positive and negative forms, and keep the one that holds up on real cases.

Key idea