Marking AI output
Watermarking embeds a subtle statistical signal in generated text so a detector with a secret key can later tell that a model produced it, without changing the visible meaning.
A common scheme
- At each step, a key-based pseudorandom function splits the vocabulary into a green list and a red list.
- The model is nudged to prefer green tokens, slightly raising their sampling probability.
- A detector recomputes the lists with the key and counts the fraction of green tokens. A fraction well above what chance would produce indicates a watermark.
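The three steps above can be sketched in Python. This is a minimal illustration, not a reference implementation. The names green_list, bias_logits, and green_fraction, the parameters gamma (green share of the vocabulary) and delta (logit bias), and the choice to seed the split from the previous token id are all assumptions made for the example.

```python
import hashlib
import hmac
import random

def green_list(key, prev_token, vocab_size, gamma=0.5):
    """Keyed pseudorandom split of token ids; returns the green set.

    Assumption: the split is re-seeded at each step from the previous token id.
    """
    digest = hmac.new(key, str(prev_token).encode(), hashlib.sha256).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def bias_logits(logits, green, delta=2.0):
    """Nudge generation: add delta to the logit of every green token."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]

def green_fraction(key, tokens, vocab_size, gamma=0.5):
    """Detector: fraction of tokens that land in their step's green list."""
    hits = sum(
        tokens[i] in green_list(key, tokens[i - 1], vocab_size, gamma)
        for i in range(1, len(tokens))
    )
    return hits / (len(tokens) - 1)
```

Under these assumptions, text written without the key should score close to gamma, while text sampled from the biased logits should score noticeably higher.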
Why it is useful
- Helps platforms flag synthetic content for provenance and abuse detection.
- The signal is statistical, so detection is unreliable on short texts and becomes more confident as the text gets longer.
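One way to see the length effect is a standard one-proportion z-test. If unwatermarked tokens are green with probability gamma, the green count over T tokens has mean gamma * T and standard deviation sqrt(T * gamma * (1 - gamma)). The sketch below assumes gamma = 0.5 and an illustrative 70% green rate; both numbers are chosen for the example, not taken from any particular system.

```python
import math

def z_score(green_count, total, gamma=0.5):
    """Standard deviations by which the green count exceeds chance."""
    expected = gamma * total
    std = math.sqrt(total * gamma * (1 - gamma))
    return (green_count - expected) / std

# Same 70% green rate, three text lengths.
for total, green in ((10, 7), (50, 35), (200, 140)):
    print(total, round(z_score(green, total), 2))
# 10 1.26
# 50 2.83
# 200 5.66
```

The green rate is identical in all three cases, but only the longer texts move far enough from chance to support a confident call.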
Limits and attacks
- Paraphrasing or heavy editing can wash out the signal.
- Mixing human and model text dilutes detectability.
- A leaked key lets attackers forge or remove watermarks.
- There is a trade-off between robustness and text quality, since stronger nudging distorts the output.
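The robustness and quality trade-off can be illustrated numerically. The sketch below assumes the nudge is a constant delta added to green-token logits and uses KL divergence from the original distribution as a rough stand-in for quality loss. Both choices, and the helper names softmax and green_mass_and_kl, are assumptions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def green_mass_and_kl(logits, green, delta):
    """Return (probability mass on green tokens, KL(biased || original))."""
    p = softmax(logits)
    q = softmax([x + delta if i in green else x for i, x in enumerate(logits)])
    green_mass = sum(q[i] for i in green)
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))
    return green_mass, kl

# Toy 4-token vocabulary with uniform logits; tokens 0 and 1 are green.
for delta in (0.0, 1.0, 2.0, 4.0):
    mass, kl = green_mass_and_kl([0.0, 0.0, 0.0, 0.0], {0, 1}, delta)
    print(delta, round(mass, 3), round(kl, 3))
```

In this toy setup a larger delta puts more probability on green tokens, which strengthens the detectable signal, and also moves the sampling distribution further from what the model would have produced on its own.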
Key idea
Text watermarking biases generation toward a key-defined green list of tokens so a keyed detector can spot machine text statistically, but paraphrasing, mixing, and key leakage limit its robustness, and a stronger watermark costs text quality.