Machine Learning

The Watermarking of Generated Text

How a hidden statistical signal can mark text as machine generated.

Marking AI output

Watermarking embeds a subtle statistical signal in generated text so a detector with a secret key can later tell that a model produced it, without changing the visible meaning.

A common scheme

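  • At each step, a pseudorandom function seeded by a secret key (and, typically, the preceding token) splits the vocabulary into a green list and a red list.
  • Before sampling, a small bias is added to the logits of green tokens, nudging the model toward them without forbidding red tokens.
  • A detector, which needs the key but not the model, recomputes the green list at each position and counts how many tokens fall in it. A count far above the share expected by chance indicates a watermark.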
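The three steps above can be sketched as a toy Python program. This is a minimal illustration, not any library's API: it assumes the green list is seeded by hashing the key with the previous token, uses a stand-in "model" whose logits are all zero, and the names `green_list`, `sample_watermarked`, `detect_z`, `gamma` (green-list fraction) and `delta` (logit bias) are hypothetical.

```python
import hashlib
import math
import random


def green_list(prev_token: int, vocab_size: int, key: bytes, gamma: float) -> set[int]:
    """Green list for one step: a pseudorandom gamma-fraction of the vocabulary,
    seeded by the secret key and the previous token (hypothetical seeding rule)."""
    seed = hashlib.sha256(key + prev_token.to_bytes(4, "big")).digest()
    ids = list(range(vocab_size))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def sample_watermarked(logits: list[float], prev_token: int, key: bytes,
                       gamma: float, delta: float, rng: random.Random) -> int:
    """Add delta to every green logit, then sample from the softmax."""
    green = green_list(prev_token, len(logits), key, gamma)
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    top = max(biased)  # subtract the max for numerical stability
    weights = [math.exp(x - top) for x in biased]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def detect_z(tokens: list[int], vocab_size: int, key: bytes, gamma: float) -> float:
    """Score each token against its predecessor's green list and return the
    z-score of the green count versus the gamma rate expected by chance."""
    t = len(tokens) - 1
    hits = sum(tok in green_list(prev, vocab_size, key, gamma)
               for prev, tok in zip(tokens, tokens[1:]))
    return (hits - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


# Demo with a stand-in "model" whose logits are all zero (uniform).
VOCAB, GAMMA, DELTA, KEY = 1000, 0.25, 2.0, b"secret-key"
rng = random.Random(0)
marked = [0]
for _ in range(200):
    marked.append(sample_watermarked([0.0] * VOCAB, marked[-1], KEY, GAMMA, DELTA, rng))
plain = [0] + [rng.randrange(VOCAB) for _ in range(200)]

z_marked = detect_z(marked, VOCAB, KEY, GAMMA)
z_plain = detect_z(plain, VOCAB, KEY, GAMMA)
print(f"watermarked z = {z_marked:.1f}, unwatermarked z = {z_plain:.1f}")
```

Note that the detector only needs the key and the token ids; it never calls the model, which is why a leaked key is so damaging.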

Why it is useful

  • Helps platforms flag synthetic content for provenance and abuse detection.
  • The signal is statistical: each token adds a little evidence, so short texts are unreliable to classify while long texts can be detected with high confidence.
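Why length matters can be shown with a one-proportion z-test, one common way to turn the green count into a decision. The notation here is assumed rather than taken from the lesson: T is the number of scored tokens, |s|_G is how many of them are green, γ is the green-list fraction, and p is the green rate of watermarked text. Unwatermarked text is green about γT times by chance, so:

```latex
z = \frac{|s|_G - \gamma T}{\sqrt{T\,\gamma(1-\gamma)}},
\qquad
\mathbb{E}[z] \approx \frac{(p - \gamma)\sqrt{T}}{\sqrt{\gamma(1-\gamma)}}

\text{Example } (\gamma = 0.25,\ p = 0.5):\quad
\frac{0.25}{\sqrt{0.1875}} \approx 0.577
\;\Rightarrow\;
\mathbb{E}[z] \approx 0.577\sqrt{25} \approx 2.9 \ (T = 25),
\qquad
\mathbb{E}[z] \approx 0.577\sqrt{400} \approx 11.5 \ (T = 400)
```

The expected score grows with √T, so against a threshold such as z > 4 the 25-token text is inconclusive while the 400-token text is flagged clearly. The value p = 0.5 is an illustrative assumption, not a measured rate.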

Limits and attacks

  • Paraphrasing or heavy editing can wash out the signal.
  • Mixing human and model text dilutes detectability.
  • A leaked key lets attackers forge or remove watermarks.
  • There is a trade-off between robustness and text quality, since the stronger nudging needed for a robust signal distorts the output more.
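To put numbers on dilution and the robustness/quality trade-off, the sketch below computes expected values instead of simulating. It assumes a toy setting where all of the model's logits start equal and green logits receive a bias `delta`; the names `green_prob`, `expected_z` and `distortion_kl` are hypothetical, and using KL divergence from the unbiased distribution as a stand-in for quality loss is an assumption of this sketch, not a standard quality metric.

```python
import math


def green_prob(gamma: float, delta: float) -> float:
    """Chance of sampling a green token when every logit starts equal
    (simplifying assumption) and green logits get +delta."""
    boosted = gamma * math.exp(delta)
    return boosted / (boosted + (1 - gamma))


def expected_z(tokens: int, gamma: float, delta: float, marked_fraction: float = 1.0) -> float:
    """Expected detection z-score when only marked_fraction of the tokens came
    from the watermarked model and the rest are green at the chance rate gamma."""
    p = marked_fraction * green_prob(gamma, delta) + (1 - marked_fraction) * gamma
    return (p - gamma) * math.sqrt(tokens / (gamma * (1 - gamma)))


def distortion_kl(gamma: float, delta: float) -> float:
    """KL divergence (nats) of the biased distribution from the equal-logit
    original: a rough, assumed proxy for how much the bias distorts output."""
    p = green_prob(gamma, delta)
    return p * math.log(p / gamma) + (1 - p) * math.log((1 - p) / (1 - gamma))


for delta in (0.5, 1.0, 2.0, 4.0):
    print(f"delta={delta}: z(200 tok)={expected_z(200, 0.25, delta):.1f}, "
          f"z(half human)={expected_z(200, 0.25, delta, 0.5):.1f}, "
          f"KL={distortion_kl(0.25, delta):.3f}")
```

Because mixed-in human tokens add no excess green, the expected z-score scales linearly with the watermarked fraction: diluting the text by half halves the evidence. Raising `delta` buys a larger z-score, but the divergence from the model's original distribution grows with it.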

Key idea

Text watermarking biases generation toward a key-defined green list of tokens so that a keyed detector can spot machine text statistically, but paraphrasing, mixing, and key leakage limit its robustness, and stronger watermarks cost text quality.

Check yourself

1. How does a green-list watermark mark generated text?

2. Which attack most easily weakens a text watermark?

3. What trade-off does stronger watermarking face?