Frontend

Sanitization Libraries

How HTML sanitizers parse untrusted markup and strip dangerous elements before it reaches the DOM.

5 min read · core

What it is

A sanitization library takes untrusted HTML and returns a safe subset of it. Use one only when you genuinely must render user-supplied rich text, such as comments or formatted notes.

How it works

  • It parses the input into a DOM tree rather than using regular expressions.
  • It walks the tree and keeps only the tags and attributes that appear on an allowlist of known-safe names.
  • It removes script elements, event handler attributes such as onclick and onerror, and dangerous URLs such as javascript: links.
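The three steps above can be sketched as a recursive walk. This is a minimal illustration, not any real library's code: it assumes the markup has already been parsed into a hypothetical HtmlNode tree (standing in for the DOM a real parser builds), and the allowlists, isSafeUrl, and sanitize are all names invented for this example.

```typescript
// Hypothetical minimal tree, standing in for the DOM a real HTML parser would build.
type HtmlNode =
  | { kind: "text"; text: string }
  | { kind: "element"; tag: string; attrs: Record<string, string>; children: HtmlNode[] };

// Illustrative allowlists; a real library ships carefully reviewed defaults.
const ALLOWED_TAGS = new Set(["p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"]);
const ALLOWED_ATTRS: Record<string, Set<string>> = { a: new Set(["href", "title"]) };
// Elements removed together with everything inside them.
const DROP_WITH_CONTENT = new Set(["script", "style"]);
const SAFE_SCHEMES = new Set(["http", "https", "mailto"]);

// Attribute values in a parsed tree are already entity-decoded, so "javascript&#58;"
// arrives here as "javascript:". All whitespace and control characters are stripped
// before the check, which is stricter than what browsers ignore inside URLs.
function isSafeUrl(value: string): boolean {
  const normalized = value.replace(/[\u0000-\u0020]/g, "").toLowerCase();
  const scheme = /^([a-z][a-z0-9+.-]*):/.exec(normalized);
  // No scheme means a relative URL, which this sketch allows.
  return scheme === null || SAFE_SCHEMES.has(scheme[1]);
}

// Walk the tree and rebuild it from allowlisted parts only.
function sanitize(nodes: HtmlNode[]): HtmlNode[] {
  const out: HtmlNode[] = [];
  for (const node of nodes) {
    if (node.kind === "text") {
      out.push(node); // text nodes carry no markup; a serializer must escape them on output
      continue;
    }
    const tag = node.tag.toLowerCase();
    if (DROP_WITH_CONTENT.has(tag)) continue;
    const children = sanitize(node.children);
    if (!ALLOWED_TAGS.has(tag)) {
      out.push(...children); // unwrap unknown elements but keep their cleaned content
      continue;
    }
    const allowedAttrs = ALLOWED_ATTRS[tag] ?? new Set<string>();
    const attrs: Record<string, string> = {};
    for (const [name, value] of Object.entries(node.attrs)) {
      const lower = name.toLowerCase();
      if (!allowedAttrs.has(lower)) continue; // drops onclick, onerror, style, ...
      if (lower === "href" && !isSafeUrl(value)) continue; // drops javascript: and similar
      attrs[lower] = value;
    }
    out.push({ kind: "element", tag, attrs, children });
  }
  return out;
}
```

For example, given the tree for `<p onclick="steal()">hi <a href="java&#9;script:alert(1)" title="x">link</a><script>alert(1)</script></p>`, this sketch keeps the paragraph, the link text, and the title, but drops the onclick handler, the href, and the whole script element. Unwrapping unknown elements instead of deleting them is a design choice here: the text a user typed survives even when the tag around it does not.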

Why parsing matters

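Naive string filtering is easily bypassed with tricks such as nested or malformed tags and encoded characters, for example &#58; in place of a colon. A real parser interprets the markup the way the browser will, so the sanitizer makes its decisions on the same structure the browser will act on.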
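To make the bypasses concrete, here is a deliberately naive regex filter (a hypothetical naiveStripScripts, written only to be broken) and three inputs that defeat it. It is a sketch of the anti-pattern, not something to use.

```typescript
// The approach this section warns against: delete <script>...</script> with a regex.
function naiveStripScripts(input: string): string {
  return input.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "");
}

// Nested tags: removing the inner block splices the outer fragments back together.
const nested = "<scr<script></script>ipt>alert(1)</script>";
console.log(naiveStripScripts(nested)); // → <script>alert(1)</script>

// No script element at all: the payload rides on an event handler attribute.
const handler = '<img src=x onerror="alert(1)">';
console.log(naiveStripScripts(handler)); // → unchanged

// Encoded characters: a string check for "javascript:" misses &#58;,
// but the browser decodes it to ":" when it parses the attribute.
const encoded = '<a href="javascript&#58;alert(1)">x</a>';
console.log(/javascript:/i.test(encoded)); // → false
```

A parser-based sanitizer sidesteps all three: the nested fragments become plain text, onerror is just another attribute missing from the allowlist, and the href value is already decoded before it is checked.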

Using it safely

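  • Sanitize at render time, in the context where the output is inserted, rather than once when the input is stored.
  • Pair it with Trusted Types and make the sanitizer the policy, so HTML sinks such as innerHTML accept only sanitized values.
  • Keep the library updated, since new bypasses are discovered and patched over time.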
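A minimal sketch of the Trusted Types pairing, assuming a browser that supports `trustedTypes.createPolicy`. The interfaces, the policy name "sanitizer", and the helpers sanitizeHtml and toSafeHtml are hypothetical; sanitizeHtml is a deliberately crude stand-in that escapes every character of markup, where a real app would call its sanitization library instead.

```typescript
// Loose typing for the browser-only Trusted Types API, so this compiles without DOM typings.
interface TrustedTypePolicyLike {
  createHTML(input: string): unknown; // returns a TrustedHTML object in browsers
}
interface TrustedTypeFactoryLike {
  createPolicy(
    name: string,
    rules: { createHTML: (input: string) => string },
  ): TrustedTypePolicyLike;
}

// Hypothetical stand-in for a sanitization library: escapes all markup,
// so nothing in the input survives as HTML.
function sanitizeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Register the sanitizer as the policy's createHTML rule, when the API exists.
const factory = (globalThis as unknown as { trustedTypes?: TrustedTypeFactoryLike }).trustedTypes;
const policy = factory?.createPolicy("sanitizer", { createHTML: sanitizeHtml });

// Hypothetical helper: produce the value to assign to an HTML sink such as innerHTML.
// Sanitization happens here, at render time, not when the text was stored.
function toSafeHtml(untrusted: string): unknown {
  return policy ? policy.createHTML(untrusted) : sanitizeHtml(untrusted);
}
```

With `require-trusted-types-for 'script'` in the page's Content Security Policy, assigning a plain string to innerHTML throws, so the only route to the sink is through the policy, and the policy always runs the sanitizer. Outside a browser (or without the API) this sketch falls back to calling the sanitizer directly.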

Key idea

A sanitizer parses untrusted HTML and keeps only an allowlist, removing scripts and handlers before the markup hits the DOM.

Check yourself

1. Why do good sanitizers parse instead of using regular expressions?

2. When should sanitization run?