What are HTML Entities?
HTML entities are special codes used to represent characters that have special meaning in HTML or that cannot be easily typed on a keyboard. They begin with an ampersand (&) and end with a semicolon (;), with either a name or number in between.
For example, the less-than sign (<) cannot be written directly in HTML because browsers would interpret it as the start of a tag. Instead, we use &lt; to display a literal less-than sign.
Why HTML Entity Encoding Matters
HTML entity encoding serves several critical purposes in web development:
Security (XSS Prevention)
Cross-Site Scripting (XSS) attacks occur when malicious scripts are injected into web pages. By encoding special characters like < and > in user input, you prevent attackers from injecting executable HTML or JavaScript code.
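As a minimal sketch of this idea, the Python standard library's html.escape can be applied to untrusted text before it is placed into markup. The render_comment helper and the sample payload are hypothetical names chosen for illustration, not part of any real application.

```python
import html

def render_comment(user_text: str) -> str:
    """Wrap untrusted text in a paragraph, escaping it first (hypothetical helper)."""
    # html.escape replaces &, < and >, and by default both quote characters too.
    return '<p class="comment">' + html.escape(user_text) + "</p>"

# Illustrative payload an attacker might submit as a "comment".
malicious = '<img src=x onerror="alert(1)">'
safe_markup = render_comment(malicious)
print(safe_markup)
```

Because the angle brackets and quotes arrive in the page as entities, the browser shows the payload as text instead of building an img element from it.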
Displaying Reserved Characters
HTML reserves certain characters for its syntax. To display these characters as text (like < > & "), you must encode them so browsers don't misinterpret them as HTML markup.
Special Symbols and Typography
HTML entities provide access to symbols not found on standard keyboards: copyright (©), trademark (™), mathematical symbols (±, ×, ÷), currency symbols (€, £, ¥), and typographic marks (for example, the em dash —).
Character Encoding Safety
When you're unsure about the character encoding of a document, numeric entities are a safe choice: they are written using only ASCII characters, so the characters they stand for display correctly regardless of encoding settings.
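One way to sketch this in Python is the built-in "xmlcharrefreplace" error handler, which replaces every character that does not fit in ASCII with a decimal numeric reference. The to_ascii_html name is an assumption made for this example. Note that this step alone does not escape &, <, > or quotes.

```python
def to_ascii_html(text: str) -> str:
    """Replace non-ASCII characters with decimal numeric references (illustrative helper)."""
    # Characters outside ASCII become references such as &#233;.
    # ASCII characters, including & and <, pass through unchanged.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

encoded = to_ascii_html("café costs 3 €")
print(encoded)  # caf&#233; costs 3 &#8364;
```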
Types of HTML Entities
Named Entities
Human-readable names for common characters: &amp; for &, &lt; for <, &copy; for ©. There are over 2,000 named entities defined in HTML5.
Decimal Numeric Entities
Use the decimal Unicode code point: &#60; for < (code point 60 decimal). Works for any Unicode character.
Hexadecimal Numeric Entities
Use the hexadecimal code point: &#x3C; for < (code point 3C hex). Often preferred for Unicode values, which are commonly written in hex.
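The three forms are interchangeable when decoded. A short check with Python's html.unescape, used here only as a convenient HTML5-aware decoder, illustrates this for the less-than sign:

```python
import html

# Named, decimal and hexadecimal references for the same character.
forms = ["&lt;", "&#60;", "&#x3C;"]
decoded = [html.unescape(form) for form in forms]

print(decoded)                      # ['<', '<', '<']
print(ord("<"), format(ord("<"), "X"))  # 60 3C
```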
Essential HTML Entities
The five characters that must always be encoded in HTML content:
- & - Ampersand: &amp; or &#38;
- < - Less than: &lt; or &#60;
- > - Greater than: &gt; or &#62;
- " - Double quote: &quot; or &#34;
- ' - Single quote: &apos; or &#39;
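The list above can be sketched as a small translation table. This is an illustrative hand-rolled encoder, not how any particular tool is implemented; the ESSENTIAL and encode_essential names are assumptions. Using str.translate processes each input character exactly once, so the ampersands it inserts are never re-encoded.

```python
# Map each of the five characters to one of its entity forms.
ESSENTIAL = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
})

def encode_essential(text: str) -> str:
    """Encode only the five essential characters (illustrative helper)."""
    return text.translate(ESSENTIAL)

sample = encode_essential("Tom & Jerry's <\"show\">")
print(sample)  # Tom &amp; Jerry&#39;s &lt;&quot;show&quot;&gt;
```

In practice, a library routine such as Python's html.escape is usually a better choice than a custom table; it covers the same five characters but writes the single quote as &#x27;.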
Common Mistakes
Double Encoding
Encoding an already-encoded string turns &lt; into &amp;lt;, which displays as &lt; instead of <.
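The effect is easy to reproduce. In this small Python sketch, escaping twice leaves a literal entity behind after the browser-style decoding step:

```python
import html

once = html.escape("<")    # first pass: the character becomes an entity
twice = html.escape(once)  # second pass: the entity's ampersand is encoded again

print(once)                  # &lt;
print(twice)                 # &amp;lt;
print(html.unescape(twice))  # &lt;  (what a reader would see on the page)
```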
Missing Semicolons
Entities must end with a semicolon. &copy without the semicolon may not be interpreted correctly.
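How a missing semicolon is handled depends on the entity. Python's html.unescape follows the HTML5 parsing rules, under which a small set of older names is still recognized without the semicolon while most names are not. The sketch below illustrates why relying on that leniency is fragile:

```python
import html

terminated = html.unescape("&copy; 2024")     # well-formed reference
legacy = html.unescape("&copy 2024")          # older name, tolerated without ";"
unterminated = html.unescape("&mdash 2024")   # newer name, left as literal text

print(terminated)    # © 2024
print(legacy)        # © 2024
print(unterminated)  # &mdash 2024
```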
Inconsistent Encoding
Encoding some special characters but not others can still leave security vulnerabilities.
Privacy and Security
All encoding and decoding happens entirely in your browser. Your content never leaves your computer, making this tool safe for processing sensitive HTML content.
Common Use Cases
XSS Prevention
Encode user-generated content before displaying it in HTML to prevent cross-site scripting attacks.
Displaying Code Snippets
Encode HTML code examples so they display as text rather than being rendered as HTML.
Email Template Safety
Encode special characters in email templates to ensure they render correctly across different email clients.
CMS Content Cleanup
Decode entity-encoded content from CMS systems that over-encode text for editing.
Special Symbol Insertion
Convert symbols like copyright, trademark, or currency signs to their HTML entity equivalents.
Character Encoding Debugging
Decode garbled text that contains HTML entities to see the original intended characters.
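For the cleanup and debugging cases above, text that has been encoded more than once may need several decoding passes. The following is a minimal sketch under that assumption; decode_repeatedly and its max_rounds limit are hypothetical names, and repeated decoding can go too far if the content was meant to contain literal entity text.

```python
import html

def decode_repeatedly(text: str, max_rounds: int = 5) -> str:
    """Decode entities until the text stops changing (illustrative helper)."""
    for _ in range(max_rounds):
        decoded = html.unescape(text)
        if decoded == text:
            break  # nothing left to decode
        text = decoded
    return text

over_encoded = "&amp;amp;lt;b&amp;amp;gt;"
print(decode_repeatedly(over_encoded))               # <b>
print(decode_repeatedly(over_encoded, max_rounds=1)) # &amp;lt;b&amp;gt;
```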
Worked Examples
Encode for HTML Display
Input
<script>alert("XSS")</script>
Output
&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
The script tags and quotes are encoded, making this safe to display in HTML. The browser will show the text rather than execute it.
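The same encoding can be reproduced with Python's html.escape, shown here as one possible equivalent rather than this tool's own code. The quote parameter controls whether quote characters are encoded; leaving them unencoded is only reasonable for text that will never be placed inside an attribute value.

```python
import html

source = '<script>alert("XSS")</script>'

with_quotes = html.escape(source)                  # quote=True is the default
without_quotes = html.escape(source, quote=False)  # encodes only &, < and >

print(with_quotes)     # &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;
print(without_quotes)  # &lt;script&gt;alert("XSS")&lt;/script&gt;
```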
Decode Entities
Input
&copy; 2024 &mdash; All Rights Reserved &trade;
Output
© 2024 — All Rights Reserved ™
Named entities are converted back to their character equivalents: copyright symbol, em dash, and trademark.
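A decoding step like this one can be sketched with Python's html.unescape, which resolves named and numeric references alike:

```python
import html

encoded = "&copy; 2024 &mdash; All Rights Reserved &trade;"
decoded = html.unescape(encoded)

print(decoded)
# Code points of the three decoded symbols, in hex: A9, 2014, 2122.
print([format(ord(ch), "X") for ch in decoded if ord(ch) > 127])
```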
Frequently Asked Questions
What is the difference between named and numeric entities?
Named entities use readable names (&copy; for ©) while numeric entities use Unicode code points (&#169; or &#xA9;). Both produce the same result, but named entities are more readable and numeric entities work for any Unicode character.
Should I encode all text for HTML?
Only characters with special HTML meaning need encoding: & < > " '. Regular letters, numbers, and most punctuation are safe. Over-encoding makes content harder to read and maintain.
What about non-breaking spaces?
Non-breaking spaces (&nbsp;) prevent line breaks between words and add visible space. They are useful for formatting but should not be used excessively as they can cause accessibility issues.
How do I choose between named and numeric entities?
Use named entities for common characters (they are more readable). Use numeric entities for characters without named equivalents or when you need guaranteed compatibility across all systems.
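That rule of thumb can be sketched as a small function: use a name when one exists, otherwise fall back to a decimal reference. The to_entity helper is hypothetical. It relies on Python's html.entities.codepoint2name table, which covers only the older HTML 4.01 names, so some characters that do have HTML5 names will still come out as numeric references here.

```python
from html.entities import codepoint2name

def to_entity(ch: str) -> str:
    """Return a named entity if one is known, else a decimal reference (illustrative)."""
    code = ord(ch)
    name = codepoint2name.get(code)
    if name is not None:
        return f"&{name};"
    return f"&#{code};"

print(to_entity("©"))   # &copy;
print(to_entity("<"))   # &lt;
print(to_entity("😀"))  # &#128512;
```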
Is my content sent to any server?
No, all encoding and decoding happens locally in your browser using JavaScript. Your content never leaves your device, making it safe to process sensitive data.
Can this tool help prevent XSS attacks?
Encoding user input is one layer of XSS prevention. Use this tool to encode content before displaying it in HTML. However, proper security requires encoding at the right point in your application, not just as a one-time conversion.
