Sanitization Libraries

How tools such as an HTML sanitizer parse untrusted markup and strip dangerous elements and attributes before the markup reaches the DOM.

What it is

A sanitization library takes untrusted HTML and returns a safe subset of it. Use one when you genuinely must render user-supplied rich text, such as comments or formatted notes.

How it works

It parses the input into a DOM tree rather than using regular expressions.
It walks the tree and keeps only the tags and attributes that appear on an allowlist.
It removes script tags, event handler attributes such as onclick, and dangerous URLs such as javascript: links.
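
The steps above can be sketched in a few lines of Python. This is only an illustration of the allowlist idea, not a usable sanitizer: it assumes the standard library's html.parser as a stand-in for the browser-grade parser a real library would use, and the allowlists, the is_safe_url helper, and the AllowlistSanitizer class are hypothetical names invented for this example.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for illustration; a real library ships vetted ones.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "ul", "ol", "li", "br"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"http", "https", "mailto"}
VOID_TAGS = {"br"}
# Elements whose text content is dropped together with the tag.
DROP_CONTENT = {"script", "style"}


def is_safe_url(value):
    # Remove whitespace and control characters, which can hide a scheme.
    cleaned = "".join(ch for ch in value if ch > " ").lower()
    head = re.split(r"[/?#]", cleaned, maxsplit=1)[0]
    if ":" not in head:
        return True  # relative URL, no scheme present
    return head.split(":", 1)[0] in ALLOWED_SCHEMES


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text is kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, onerror, style, and so on
            if name == "href" and not is_safe_url(value):
                continue  # drops javascript: and other unlisted schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="x()">Hi <script>alert(1)</script><a href="javascript:alert(1)">link</a></p>'
print(sanitize(dirty))  # <p>Hi <a>link</a></p>
```

The output is rebuilt from escaped pieces rather than edited in place, so anything not explicitly allowed never reaches the result. In production, prefer a maintained library over a hand-rolled walker like this one.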

Why parsing matters

Naive string filtering is easily bypassed with tricks such as broken or nested tags and encoded characters. A real parser sees the markup the way the browser will, so the cleaned output matches what the browser actually renders.
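
A small Python example shows how a string filter can fail. The naive_strip_scripts function is a hypothetical filter written for this illustration; it removes complete script tag pairs in a single regular expression pass, which is roughly what many hand-rolled filters do.

```python
import re


def naive_strip_scripts(html):
    # Hypothetical filter: deletes <script>...</script> pairs in one pass.
    return re.sub(r"<script.*?>.*?</script>", "", html, flags=re.I | re.S)


# Nested tags: deleting the inner pairs reassembles a working outer tag.
nested = "<scr<script></script>ipt>alert(1)</scr<script></script>ipt>"
print(naive_strip_scripts(nested))  # <script>alert(1)</script>

# No script tag at all: the event handler passes through untouched.
handler = "<img src=x onerror=alert(1)>"
print(naive_strip_scripts(handler))  # <img src=x onerror=alert(1)>
```

Both inputs leave executable markup in the output. A parser-based sanitizer avoids this class of problem because it decides on parsed elements and attributes, not on substrings.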

Using it safely

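Run sanitization at render time, in the same context where the output is inserted.
Pair it with Trusted Types so that the sanitizer acts as the policy: when enforcement is on, only markup that has passed through it can reach sinks such as innerHTML.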
Keep the library updated since bypasses are patched over time.

Key idea

A sanitizer parses untrusted HTML and keeps only an allowlist, removing scripts and handlers before the markup hits the DOM.
