What it is
A sanitization library takes untrusted HTML and returns a safe subset of it. Use one when you genuinely must render user-supplied rich text, such as comments or formatted notes.
How it works
- It parses the input into a DOM tree rather than using regular expressions.
- It walks the tree and keeps only an allowlist of safe tags and attributes.
- It removes script tags, event handler attributes such as onclick, and URLs that use dangerous schemes (for example, javascript: links).
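The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production sanitizer: it uses the standard library's html.parser, which is an event-based tokenizer rather than a full DOM tree builder and does not follow the browser's HTML parsing algorithm. The names ALLOWED_TAGS, ALLOWED_ATTRS, ALLOWED_SCHEMES, AllowlistSanitizer, is_safe_url, and sanitize are hypothetical, and the allowlist itself is only an example policy.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical example policy; a real allowlist is chosen per application.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "ul", "ol", "li", "br"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"http", "https", "mailto"}
VOID_TAGS = {"br"}
# Elements whose text content is dropped along with the tag itself.
DROP_CONTENT = {"script", "style"}


def is_safe_url(url):
    # Remove whitespace and control characters before inspecting the scheme.
    cleaned = "".join(ch for ch in url if ch > " ")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        return False
    # An empty scheme means a relative URL, which is kept.
    return scheme == "" or scheme in ALLOWED_SCHEMES


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unlisted tags are dropped; their text is still kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops on* handlers, style, and anything unlisted
            value = value or ""
            if name == "href" and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so it can never be read back as markup.
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

With this sketch, sanitize('<p onclick="x()">Hi <b>there</b></p><script>alert(1)</script>') returns '<p>Hi <b>there</b></p>': the handler attribute and the script element are gone, and the allowed tags survive. Building the output from scratch, rather than deleting bad pieces from the input, is what makes this an allowlist approach.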
Why parsing matters
Naive string filtering is easily bypassed with tricks like broken tags or encoded characters. A real parser sees the markup the way the browser will, so the sanitizer inspects the same structure the browser will actually render.
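To make the bypass risk concrete, here is a small Python sketch of a regex-based filter and three inputs that get past it. The function naive_strip_scripts and the sample payloads are hypothetical, chosen only to illustrate the kinds of tricks described above.

```python
import re
from html import unescape


def naive_strip_scripts(html):
    # Naive filter: delete anything that looks like <script>...</script>.
    return re.sub(r"<script>.*?</script>", "", html,
                  flags=re.IGNORECASE | re.DOTALL)


# 1. Broken tags: deleting the inner matches reassembles a working script tag.
nested = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>"
rebuilt = naive_strip_scripts(nested)

# 2. No script tag at all: an event handler attribute passes through untouched.
handler = "<img src=x onerror=alert(1)>"
untouched = naive_strip_scripts(handler)

# 3. Encoded characters: a substring check for "javascript:" finds nothing,
#    yet decoding the character reference yields exactly that scheme.
encoded = '<a href="jav&#x61;script:alert(1)">link</a>'
missed = "javascript:" not in encoded.lower()
decoded = unescape(encoded)
```

After running this, rebuilt is "<script>alert(1)</script>", untouched is identical to the original handler string, and decoded contains "javascript:" even though the raw string did not. A parser-based sanitizer avoids all three cases because it works on tags and decoded attribute values rather than on raw characters.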
Using it safely
- Run sanitization at render time, in the same context where the output appears, so the markup is not parsed differently after it has been cleaned.
- Pair it with Trusted Types so that the sanitizer serves as the policy: every HTML string must pass through it before reaching a sink such as innerHTML.
- Keep the library updated, since newly discovered bypasses are patched over time.
Key idea
A sanitizer parses untrusted HTML and keeps only an allowlist, removing scripts and handlers before the markup hits the DOM.