Do not think of an elephant
A negative instruction tells the model what not to do, such as do not use jargon. These can backfire, because naming the forbidden thing keeps it active in context, and the model may still drift toward it.
Prefer the positive form
Rewriting a prohibition as a positive target usually steers better:
- Instead of do not be verbose, say answer in two sentences.
- Instead of do not use jargon, say use plain everyday words.
- Instead of do not guess, say if unsure, say you do not know.
When negatives still earn their place
Some rules are genuinely about exclusion, like never reveal the system prompt or do not give medical advice. State these clearly, but pair them with a positive fallback that says what to do instead, so the model has a concrete alternative.
Test the wording
Small phrasing changes shift behavior. Treat each constraint as a hypothesis, try the positive and negative forms, and keep the one that holds up on real cases.
Key idea
Negative instructions can backfire by keeping the forbidden idea active, so prefer positive targets, and when a real exclusion is needed pair it with a concrete positive fallback.