Frontend

Sanitizing User HTML

Allow rich user markup safely by stripping dangerous tags and attributes with a vetted sanitizer.

When you must render HTML

Sometimes you genuinely need to render user-supplied HTML, such as comments with formatting. Encoding it as plain text would destroy that formatting, so instead you sanitize it: parse the markup and remove anything that could execute. The goal is to keep safe tags while dropping scripts and dangerous attributes.

  • Parse the input into a document tree first.
  • Keep an allow list of safe tags and attributes.
  • Remove scripts, event handlers, and risky URLs.
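
To make these steps concrete, here is a minimal TypeScript sketch of the allow-list filtering step. It assumes the input has already been parsed into a tree by a real HTML parser. The HtmlNode type, the sanitizeTree function, and the specific tag and attribute lists are hypothetical and for illustration only. This sketch is not a substitute for a vetted library.

```typescript
// Hypothetical tree shape standing in for a real HTML parser's output.
type HtmlNode =
  | { kind: "text"; text: string }
  | { kind: "element"; tag: string; attrs: Record<string, string>; children: HtmlNode[] };

// Illustrative allow lists; a real policy depends on what the feature needs.
const ALLOWED_TAGS = new Set(["p", "b", "i", "em", "strong", "ul", "ol", "li", "br"]);
const ALLOWED_ATTRS = new Set(["title"]);
// Elements removed together with everything inside them.
const DROP_WITH_CONTENT = new Set(["script", "style"]);

function sanitizeNode(node: HtmlNode): HtmlNode[] {
  // Text nodes are inert as long as the tree is later serialized with escaping.
  if (node.kind === "text") return [node];

  const tag = node.tag.toLowerCase();
  if (DROP_WITH_CONTENT.has(tag)) return [];

  // Sanitize children first so nested payloads are handled too.
  const children = sanitizeTree(node.children);

  // Unknown tags are unwrapped: their markup goes, their safe content stays.
  if (!ALLOWED_TAGS.has(tag)) return children;

  // Keep only allow-listed attributes; onclick, onerror, style, etc. are dropped.
  const attrs: Record<string, string> = {};
  for (const name of Object.keys(node.attrs)) {
    const lower = name.toLowerCase();
    if (ALLOWED_ATTRS.has(lower)) attrs[lower] = node.attrs[name];
  }
  return [{ kind: "element", tag, attrs, children }];
}

function sanitizeTree(nodes: HtmlNode[]): HtmlNode[] {
  const out: HtmlNode[] = [];
  for (const node of nodes) out.push(...sanitizeNode(node));
  return out;
}
```

Unwrapping unknown tags instead of deleting them keeps the user's text visible while still discarding the markup around it. Only script and style lose their content entirely.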

Use a vetted library

Writing a sanitizer with regular expressions is a known trap, because attackers hide payloads in malformed markup and unusual encodings. A maintained sanitizer accounts for the parsing quirks browsers apply and closes gaps you would likely miss.
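
To see why, consider a deliberately naive regex-based filter. The naiveStrip function below is hypothetical and shown only as an anti-pattern. Two classic payloads get past it. The first is a nested tag that reassembles into a working script after a single pass. The second is an event handler that needs no script tag at all.

```typescript
// Anti-pattern: deletes <script>...</script> blocks with one regex pass.
function naiveStrip(html: string): string {
  return html.replace(/<script[\s\S]*?<\/script>/gi, "");
}

// Payload 1: removing the inner pair glues "<scr" and "ipt>" back together.
const nested = "<scr<script></script>ipt>alert(1)</script>";
// Payload 2: no script tag, so the regex never matches; onerror still runs.
const handler = '<img src="x" onerror="alert(1)">';

console.log(naiveStrip(nested)); // <script>alert(1)</script>
console.log(naiveStrip(handler)); // <img src="x" onerror="alert(1)">
```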

  • Prefer an allow list over a block list approach.
  • Strip on* event handler attributes (such as onclick) and URLs that use the javascript: scheme.
  • Run the sanitizer on exactly the data the browser will render, with no changes afterward.
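
For URL attributes, checking the scheme against an allow list is sturdier than searching for the string javascript:. The sketch below assumes a runtime with the WHATWG URL class, which modern browsers and Node.js provide. The names isSafeUrl and SAFE_SCHEMES and the placeholder base URL are illustrative choices, not part of any particular library.

```typescript
// Illustrative allow list of schemes permitted in href/src values.
const SAFE_SCHEMES = new Set(["http:", "https:", "mailto:"]);

// Placeholder base, used only so relative links like "/profile" can resolve.
const BASE = "https://example.invalid/";

function isSafeUrl(value: string): boolean {
  try {
    // The WHATWG URL parser trims leading/trailing spaces and control
    // characters, removes embedded tabs and newlines, and lowercases the
    // scheme, so obfuscated javascript: URLs still parse as javascript:.
    const url = new URL(value, BASE);
    return SAFE_SCHEMES.has(url.protocol);
  } catch {
    // Values the parser rejects are treated as unsafe.
    return false;
  }
}
```

Because the check runs on the parsed scheme, values like JaVaScRiPt:alert(1), or javascript: split by an embedded tab, are still rejected. Schemes the list does not name, such as data:, are rejected as well.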

Sanitizing pairs well with a Content Security Policy (CSP), so that a tag that slips past the sanitizer still cannot load attacker script.

Key idea

To render user HTML safely, parse and sanitize it with a vetted, allow-list-based library rather than block-listing tags with regular expressions.

Check yourself

1. Why prefer a vetted sanitizer over regular expressions?

2. Which approach is safest for choosing what to keep?