HTML Entity Encoder & Decoder — Named, Decimal, Hex

Guide

About HTML Entity Encoder

Encode reserved characters as named, numeric (decimal), or hex entities. Decode handles all three forms cleanly. Optional non-ASCII-only encoding for compact output that still survives legacy charsets. Useful when embedding user content in HTML, when parsing scraped pages, and when sanitizing strings against XSS.

Why HTML entities still exist

HTML conflates text content with markup using a small set of reserved characters: <, >, &, ", '. Any of those in your data must be escaped or the parser will read it as part of the markup. Entity encoding is the bridge: < becomes &lt; and parses as the literal less-than character.
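As a minimal sketch of that round trip, Python's standard-library html module can stand in for an encoder. The sample string is made up for illustration, and this tool's own output may choose different entity forms for quotes.

```python
import html

# Sample input containing reserved characters (illustrative only).
raw = '<a href="x">Tom & Jerry</a>'

# html.escape replaces &, <, > and, with the default quote=True, both quote characters.
escaped = html.escape(raw)
print(escaped)  # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# Decoding returns the original text unchanged.
print(html.unescape(escaped) == raw)  # True
```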

Early HTML was limited to narrow character sets such as ASCII and Latin-1; entities also let you embed any Unicode codepoint (&#128512; for 😀) when the page’s encoding could not represent it directly. Modern UTF-8 pages rarely need this, but the option survives for legacy systems.

Encoding modes

Named (&amp;, &lt;, &gt;): readable, the default for hand-written HTML.
Decimal (&#38;, &#60;): universal, every parser since 1995 reads it.
Hex (&#x26;, &#x3C;): preferred by XML, used in some security configs.
Non-ASCII only: leave ASCII unchanged, encode only characters above 127. Compact, safe.
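The four modes can be sketched in a few lines of Python. The function name encode_entities and the mode strings are hypothetical, chosen for illustration; they are not this tool's actual interface.

```python
# Hypothetical helper: names and mode strings are illustrative assumptions.
RESERVED = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}
MODES = ("named", "decimal", "hex", "non_ascii")

def encode_entities(text: str, mode: str = "named") -> str:
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    out = []
    for ch in text:
        cp = ord(ch)
        if mode == "non_ascii":
            # Leave ASCII (including reserved characters) alone; encode the rest.
            out.append(ch if cp < 128 else f"&#{cp};")
        elif ch in RESERVED:
            if mode == "named":
                out.append(RESERVED[ch])
            elif mode == "decimal":
                out.append(f"&#{cp};")
            else:  # hex
                out.append(f"&#x{cp:X};")
        else:
            out.append(ch)
    return "".join(out)

print(encode_entities("a < b & c", "named"))     # a &lt; b &amp; c
print(encode_entities("a < b & c", "decimal"))   # a &#60; b &#38; c
print(encode_entities("a < b & c", "hex"))       # a &#x3C; b &#x26; c
print(encode_entities("café <b>", "non_ascii"))  # caf&#233; <b>
```

In this sketch the non-ASCII-only mode passes < and & through untouched, so on its own it does not escape markup.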

Common workflows

Sanitize user input for HTML output. Encode user-supplied strings before injecting into a template. Modern frameworks do this for you; for raw HTML generation, this tool is a backstop.

Decode scraped content. Pages scraped through HTTP clients arrive with entities intact. Decode here to get the plain text — useful for analysis or NLP preprocessing.
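For scripted decoding, one option is Python's html.unescape, which accepts named, decimal, and hex references in the same string. The scraped text below is invented for the example.

```python
import html

# Invented scraped text mixing named, decimal, and hex entities.
scraped = "Fish &amp; Chips &#60;b&#62; &#x3C;i&#x3E; caf&eacute;"

decoded = html.unescape(scraped)
print(decoded)  # Fish & Chips <b> <i> café
```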

Verify your escaping. Paste suspected XSS input, see exactly what the encoder outputs. If <script> survives, your encoder is wrong.

Embed code snippets in HTML. Pasting <div> into an HTML page would render it as an element. The encoded form, &lt;div&gt;, shows as text.

Why entities defeat XSS

XSS attacks rely on user content being interpreted as markup. If the user’s <script>alert(1)</script> becomes &lt;script&gt;alert(1)&lt;/script&gt; in the HTML, the browser sees text, not code. Escape consistently and the attack surface evaporates. Forget once and a single field becomes injectable. The mechanism is simple; discipline is everything.
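A small sketch of that mechanism, assuming a hand-rolled template function (render_comment is a hypothetical name) and Python's html.escape as the encoder:

```python
import html

def render_comment(user_text: str) -> str:
    # Escape at the point of output, immediately before the text enters markup.
    return f'<p class="comment">{html.escape(user_text)}</p>'

payload = "<script>alert(1)</script>"
page = render_comment(payload)
print(page)  # <p class="comment">&lt;script&gt;alert(1)&lt;/script&gt;</p>

# The check from "Verify your escaping": no literal script tag may survive.
print("<script" in page)  # False
```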

Frequently asked questions

Which characters must be encoded?

Five, as a safe default: & as &amp;, < as &lt;, > as &gt;, " as &quot;, ' as &#39;. Inside attribute values, quote handling is mandatory; in regular text content only & and < strictly need escaping, so >, " and ' are optional there.

Named, decimal, or hex?

Named (&amp;) is most readable. Decimal (&#38;) is the most compatible: every parser since the 1990s reads it. Hex (&#x26;) is what XML output prefers. Pick by audience.

Is this enough to prevent XSS?

For text content, yes — escaping the five reserved characters blocks injection. For attributes, you also need to quote attribute values consistently and avoid building inline JavaScript or CSS via concatenation. Use a templating engine that escapes by default.
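The attribute case can be illustrated with a short sketch. It assumes a hypothetical title_attr helper that always wraps the value in double quotes and escapes quotes inside it.

```python
import html

def title_attr(value: str) -> str:
    # Always quote the attribute, and escape quotes inside the value.
    return f'<span title="{html.escape(value, quote=True)}">hover</span>'

# An input that tries to close the attribute and add an event handler.
attack = '" onmouseover="alert(1)'
safe = title_attr(attack)
print(safe)  # <span title="&quot; onmouseover=&quot;alert(1)">hover</span>
```

Only the two delimiting quotes remain as literal double quotes, so the payload stays inside the title value. This covers HTML attribute context only; values placed inside inline scripts, styles, or URLs need escaping rules of their own.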

How do I encode emoji?

Emoji and other non-BMP characters encode as numeric entities with the full codepoint: &#128512; for 😀. Most modern parsers also accept the raw UTF-8 byte sequence directly.
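In Python, a string is a sequence of code points, so ord() already returns the full value for a non-BMP character. The helper name to_numeric_entity is an assumption made for this sketch.

```python
def to_numeric_entity(ch: str, hex_form: bool = False) -> str:
    # ord() returns the full code point (U+1F600 here), not a surrogate pair.
    cp = ord(ch)
    return f"&#x{cp:X};" if hex_form else f"&#{cp};"

print(to_numeric_entity("😀"))                  # &#128512;
print(to_numeric_entity("😀", hex_form=True))   # &#x1F600;

# The built-in 'xmlcharrefreplace' error handler does the same for whole strings.
ascii_only = "hi 😀".encode("ascii", "xmlcharrefreplace").decode("ascii")
print(ascii_only)  # hi &#128512;
```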

Why does &nbsp; matter?

Non-breaking space — keeps two words on the same line. Common in typography ("Mr. Smith" should not break). Your editor probably shows it identically to a regular space, which is what makes it a debugging headache.
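One way to make the invisible character visible while debugging is to decode, compare, and re-mark it. This is a sketch; show_nbsp is a hypothetical helper name.

```python
import html

# &nbsp; decodes to U+00A0, which looks like a space but is a different character.
text = html.unescape("Mr.&nbsp;Smith")
print(text == "Mr. Smith")        # False: the regular-space version does not match
print(text == "Mr.\u00a0Smith")   # True

def show_nbsp(s: str) -> str:
    # Re-mark non-breaking spaces so they stand out in logs and diffs.
    return s.replace("\u00a0", "&nbsp;")

print(show_nbsp(text))  # Mr.&nbsp;Smith
```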

Can I decode whole HTML pages?

Yes. Paste an entire page and decode mode converts its entities to plain characters. Useful when extracting content from scraped sources where entities slipped past a converter.

Last updated: 2025-01-15