The Watermarking of Generated Text

Marking AI output

Watermarking embeds a subtle statistical signal in generated text so a detector with a secret key can later tell that a model produced it, without changing the visible meaning.

A common scheme

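At each generation step, a pseudorandom function seeded by the secret key (and, typically, the preceding token) splits the vocabulary into a green list and a red list.
The model is nudged toward green tokens: a small bias is added to their logits before sampling, which slightly raises their probability.
A detector recomputes the lists with the same key and counts the fraction of green tokens. A fraction far above what chance predicts (the green list's share of the vocabulary) indicates a watermark.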
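The scheme above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation. The toy vocabulary size, the green fraction `GAMMA`, the bias `DELTA`, seeding each list with SHA-256 of the key plus the previous token, and the hypothetical `generate` helper (whose "model" is just Gaussian noise) are all choices made here for illustration. Detection uses a one-proportion z-score: without a watermark each token is green with probability `GAMMA`, so the observed green count is compared with `GAMMA * T`.

```python
from __future__ import annotations

import hashlib
import math
import random

VOCAB_SIZE = 1000  # toy vocabulary: token ids 0..999 (assumption)
GAMMA = 0.5        # fraction of the vocabulary on each green list (assumption)
DELTA = 2.0        # logit bias added to green tokens (assumption)


def green_list(key: bytes, prev_token: int) -> set[int]:
    """Pseudorandomly select GAMMA of the vocabulary, seeded by the key and previous token."""
    seed = hashlib.sha256(key + prev_token.to_bytes(4, "big")).digest()
    ids = list(range(VOCAB_SIZE))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(GAMMA * VOCAB_SIZE)])


def sample(logits: list[float], rng: random.Random, greens: set[int] | None = None) -> int:
    """Softmax-sample one token id; if a green list is given, add DELTA to its logits first."""
    if greens is not None:
        logits = [x + DELTA if i in greens else x for i, x in enumerate(logits)]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(n_tokens: int, rng: random.Random, key: bytes | None = None) -> list[int]:
    """Generate from a hypothetical stand-in 'model' whose logits are Gaussian noise."""
    tokens = [0]  # arbitrary start token; it only seeds the first green list
    for _ in range(n_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
        greens = green_list(key, tokens[-1]) if key is not None else None
        tokens.append(sample(logits, rng, greens))
    return tokens


def z_score(key: bytes, tokens: list[int]) -> float:
    """Compare the green-token count with the GAMMA * T expected from unwatermarked text."""
    t = len(tokens) - 1  # the start token is not scored
    hits = sum(tok in green_list(key, prev) for prev, tok in zip(tokens, tokens[1:]))
    return (hits - GAMMA * t) / math.sqrt(t * GAMMA * (1 - GAMMA))


rng = random.Random(42)
secret = b"demo-key"  # hypothetical secret key
marked = generate(200, rng, key=secret)
plain = generate(200, rng)
print(f"watermarked, right key: z = {z_score(secret, marked):.1f}")
print(f"unwatermarked:          z = {z_score(secret, plain):.1f}")
print(f"watermarked, wrong key: z = {z_score(b'other-key', marked):.1f}")
```

With the right key, the watermarked sequence should score far above a typical detection threshold such as 4, while unwatermarked text, or watermarked text checked with the wrong key, should stay near 0. Note that the detector needs only the key and the tokens, not the model.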

Why it is useful

It helps platforms flag synthetic content, supporting provenance tracking and abuse detection.
Because the signal is statistical, evidence accumulates with length: short texts give unreliable verdicts, while long texts can be detected with high confidence.
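To see why length matters, hold the observed green fraction fixed and vary the number of tokens. The sketch below assumes the same one-proportion z-test and a green fraction of 0.5 (an illustrative choice). With 60% green tokens the statistic works out to 0.2 times the square root of n, so 25 tokens give z = 1, which is consistent with chance, while 400 tokens give z = 4.

```python
import math

GAMMA = 0.5  # green-list fraction (assumption)


def z_for(length: int, green_fraction: float, gamma: float = GAMMA) -> float:
    """One-proportion z-statistic when green_fraction of `length` tokens are green."""
    hits = green_fraction * length
    return (hits - gamma * length) / math.sqrt(length * gamma * (1 - gamma))


for n in (25, 100, 400):
    print(n, round(z_for(n, 0.6), 2))
```

The same per-token bias that is invisible in a tweet-length text becomes overwhelming evidence over a few hundred tokens.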

Limits and attacks

Paraphrasing or heavy editing can wash out the signal.
Mixing human and model text dilutes detectability.
A leaked key lets attackers forge or remove watermarks.
There is a trade-off between robustness and text quality: stronger nudging makes the signal easier to detect but distorts the output more.
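Two of these limits can be quantified with back-of-the-envelope formulas. The sketch below assumes an idealized model whose next-token distribution is flat (the most favorable case for the watermark; peaked, low-entropy distributions yield fewer green tokens), the same one-proportion z-test, and a green fraction of 0.5. The hypothetical `green_rate` gives the expected green share for a logit bias `delta`. Under the flat assumption, the probability mass shifted onto green tokens (the total variation distance from the unbiased distribution) is exactly `rate - GAMMA`, the same quantity the detector measures, so a stronger signal necessarily means more distortion. The hypothetical `diluted_rate` models mixing or editing: tokens not produced by the watermarked model are green only by chance.

```python
import math

GAMMA = 0.5  # green-list fraction (assumption)


def green_rate(delta: float, gamma: float = GAMMA) -> float:
    """Expected green share if the model's distribution were flat and green logits get +delta."""
    boosted = gamma * math.exp(delta)
    return boosted / (boosted + (1 - gamma))


def expected_z(length: int, rate: float, gamma: float = GAMMA) -> float:
    """Expected detection z-score for `length` tokens that are green with probability `rate`."""
    return (rate - gamma) * length / math.sqrt(length * gamma * (1 - gamma))


def diluted_rate(rate: float, model_share: float, gamma: float = GAMMA) -> float:
    """Green share after mixing: tokens not written by the watermarked model hit green at chance."""
    return model_share * rate + (1 - model_share) * gamma


# Trade-off: a larger bias raises the signal and the distortion by the same amount.
for delta in (0.5, 1.0, 2.0, 4.0):
    rate = green_rate(delta)
    print(f"delta={delta}: mass shifted {rate - GAMMA:.3f}, "
          f"expected z at 200 tokens {expected_z(200, rate):.1f}")

# Dilution: mixing in human-written or edited tokens pulls the score back toward 0.
rate = green_rate(2.0)
for share in (1.0, 0.5, 0.2):
    print(f"model share {share}: expected z at 200 tokens "
          f"{expected_z(200, diluted_rate(rate, share)):.1f}")
```

Raising `delta` increases the expected z-score only by moving more probability mass away from what the model would otherwise choose. Shrinking the model-written share pulls the green rate back toward `GAMMA` and the z-score toward 0, which is why mixed or heavily edited text is harder to flag.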

Key idea

Text watermarking biases generation toward a green token list defined by a secret key, so a detector holding that key can spot machine text statistically. Paraphrasing, mixing with human text, and key leakage limit its robustness, and stronger watermarks cost text quality.
