You want to display a less-than sign on a web page. You type < in your HTML. The browser interprets it as the start of an HTML tag, and your page breaks. This is why HTML entity encoding exists.
Certain characters have special meaning in HTML. The less-than sign (<), greater-than sign (>), ampersand (&), and quotation marks (" and ') are part of HTML's syntax. When you want these characters to appear as text content rather than markup, you need to encode them.
HTML entity encoding replaces these special characters with entity references that browsers render as the intended character. It sounds simple, and for the most part it is. But understanding when encoding is necessary, when it is not, and how it relates to security is important for every web developer.
The Five Essential HTML Entities
These five characters need encoding in HTML to prevent parsing issues. Which of them you must encode depends on the context:
| Character | Entity Name | Entity Number | When to Encode |
|-----------|-------------|---------------|----------------|
| < | `&lt;` | `&#60;` | Always in text content |
| > | `&gt;` | `&#62;` | Always in text content |
| & | `&amp;` | `&#38;` | Always (it starts entity references) |
| " | `&quot;` | `&#34;` | Inside attribute values |
| ' | `&apos;` | `&#39;` | Inside single-quoted attributes |
The ampersand is the trickiest because it is the escape character itself. To display a literal & on a page, you write `&amp;`. To display the text `&amp;` itself, as this article does, you have to write `&amp;amp;`. This recursive nature catches even experienced developers off guard.
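In practice, modern browsers are forgiving about unencoded characters in many contexts. An unencoded > in text content rarely causes issues. An unencoded < is different: when it is followed by a letter, /, ! or ?, the browser starts parsing a tag or comment, so in text such as `x<y` the `<y` is read as the start of a tag and the following text can vanish from the page. An unencoded & can create ambiguous entity references: text such as `&copy` is rendered as © even without a trailing semicolon.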
The safe approach: encode all five characters whenever you insert dynamic text, even where quotes are technically harmless, and let your templating engine or framework do it automatically.
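Python's standard library offers a quick way to see these rules in action. The sketch below uses `html.escape()`, which encodes all five characters by default. Note that Python writes the apostrophe in hexadecimal form, `&#x27;`, which is the same character as `&#39;` in the table. The sample strings are only illustrations.

```python
import html

# With the default quote=True, html.escape() encodes all five characters:
# & -> &amp;   < -> &lt;   > -> &gt;   " -> &quot;   ' -> &#x27;
snippet = """<a title="Tom's">Tom & Jerry</a>"""
print(html.escape(snippet))
# &lt;a title=&quot;Tom&#x27;s&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# The & is escaped first, so escaping an entity escapes its leading "&".
print(html.escape("&amp;"))  # &amp;amp;

# html.unescape() removes exactly one level of encoding.
print(html.unescape("&amp;amp;"))  # &amp;
```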

Named Entities, Numeric Entities, and Unicode
HTML offers three ways to represent special characters:
Named entities use memorable names: `&copy;` for copyright, `&euro;` for the euro sign, `&mdash;` for an em dash. There are about 2,200 named entities in HTML5. They are easy to read in source code but require memorization.
Numeric entities use Unicode code points: `&#169;` (decimal) or `&#xA9;` (hexadecimal) for the copyright symbol. Every Unicode character can be represented this way, even those without named entities.
Direct UTF-8 characters can be used when your document encoding is UTF-8 (which it should always be in 2026). Instead of `&copy;` or `&#169;`, you can simply type the copyright symbol (©) directly in your HTML.
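One way to confirm that the three forms are interchangeable is to decode them with Python's `html.unescape()`. This minimal sketch also shows that a numeric entity is simply the character's code point written in decimal or hexadecimal.

```python
import html

# Named, decimal, and hexadecimal references all decode to U+00A9.
forms = ["&copy;", "&#169;", "&#xA9;"]
decoded = {html.unescape(f) for f in forms}
print(decoded)  # {'©'}

# A numeric entity is built directly from the code point.
ch = "©"
print(f"&#{ord(ch)};", f"&#x{ord(ch):X};")  # &#169; &#xA9;
```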
So when should you use entities vs direct characters?
Use entities for the five essential characters (<, >, &, ", ') and for characters that cannot be typed on your keyboard (mathematical symbols, arrows, special punctuation).
Use direct UTF-8 for accented characters, CJK characters, emoji, and other characters you can type or paste directly. There is no technical reason to encode é as `&eacute;` when your document is UTF-8.
The URL Encoder handles a related but different encoding scheme. URL encoding replaces unsafe URL characters with percent-encoded values (%20 for space, %26 for ampersand). Do not confuse URL encoding with HTML entity encoding. They serve different purposes in different contexts.
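To see the difference concretely, here is the same arbitrary string passed through Python's `urllib.parse.quote()` (URL encoding) and `html.escape()` (HTML entity encoding):

```python
import html
from urllib.parse import quote

text = "Tom & Jerry"

# URL (percent) encoding: for placing data inside a URL.
print(quote(text))        # Tom%20%26%20Jerry

# HTML entity encoding: for placing text inside an HTML document.
print(html.escape(text))  # Tom &amp; Jerry
```

Each output is only correct in its own context: `%26` means nothing to an HTML parser, and `&amp;` inside a raw URL is just five literal characters.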
Security: HTML Encoding and XSS Prevention
HTML entity encoding is not just about displaying characters correctly. It is a critical security defense against cross-site scripting (XSS) attacks.
XSS attacks inject malicious HTML or JavaScript into a web page by exploiting unencoded user input. If a user submits a comment containing `<script>alert('hacked')</script>` and your application renders it without encoding, the browser executes the script.
Proper encoding transforms the attack payload into harmless text:
- Input: `<script>alert('hacked')</script>`
- Encoded output: `&lt;script&gt;alert('hacked')&lt;/script&gt;`
- Displayed: `<script>alert('hacked')</script>` as visible text, not executed
Rule: never insert untrusted data into HTML without encoding.
Modern frameworks handle this automatically:
- React encodes all JSX expressions by default
- Vue encodes template interpolations (double curly braces) by default
- Angular sanitizes all template bindings by default
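The dangerous APIs that bypass encoding (`dangerouslySetInnerHTML` in React, `v-html` in Vue, `innerHTML` in vanilla JS) should only be used with content you control completely, never with user input. When you only need to display text in vanilla JS, assign it to `textContent`, which never parses markup.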
For server-side rendering, use your language's built-in encoding functions: htmlspecialchars() in PHP, html.escape() in Python, ERB::Util.html_escape() in Ruby. Never write your own encoding function.
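As a server-side illustration, the sketch below builds a comment's HTML with Python's `html.escape()`. The `render_comment` function and its markup are hypothetical; the point is that every piece of untrusted input passes through the escaping function before it is placed in the page. Because `html.escape()` also encodes quotes, the apostrophes come out as `&#x27;`, which the browser still displays as '.

```python
import html

def render_comment(author: str, body: str) -> str:
    """Hypothetical helper: build a comment's HTML, escaping all untrusted input."""
    return (
        '<div class="comment">'
        f"<strong>{html.escape(author)}</strong>"
        f"<p>{html.escape(body)}</p>"
        "</div>"
    )

payload = "<script>alert('hacked')</script>"
print(render_comment("mallory", payload))
# <div class="comment"><strong>mallory</strong><p>&lt;script&gt;alert(&#x27;hacked&#x27;)&lt;/script&gt;</p></div>
```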

Common Special Characters for Web Content
Beyond the essential five, here are the characters web developers use most frequently:
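Typography:
- `&hellip;` (…) Horizontal ellipsis
- `&lsquo;` `&rsquo;` (‘ ’) Smart single quotes
- `&ldquo;` `&rdquo;` (“ ”) Smart double quotes
- `&ndash;` (–) En dash (ranges: 2010–2020)
- `&bull;` (•) Bullet point
- `&middot;` (·) Middle dot (separators)
Currency:
- `&euro;` (€) Euro sign
- `&pound;` (£) British pound
- `&yen;` (¥) Japanese yen
- `&cent;` (¢) Cent sign
Math and Science:
- `&times;` (×) Multiplication sign
- `&divide;` (÷) Division sign
- `&plusmn;` (±) Plus-minus
- `&deg;` (°) Degree symbol
- `&sup2;` `&sup3;` (² ³) Superscript 2 and 3
- `&frac12;` `&frac14;` `&frac34;` (½ ¼ ¾) Fractions
Arrows:
- `&larr;` `&rarr;` `&uarr;` `&darr;` (← → ↑ ↓) Directional arrows
Symbols:
- `&copy;` (©) Copyright
- `&reg;` (®) Registered trademark
- `&trade;` (™) Trademark
- `&check;` (✓) Check mark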
You can use the Base64 Encoder when you need to encode binary data (like images) as text for embedding in HTML or transferring through text-based protocols.
Encoding in Different Contexts
The encoding rules change depending on where you are inserting content:
HTML text content (between tags): Encode <, >, and &. Quotes do not need encoding in this context.
HTML attributes: Encode " (if using double-quoted attributes) or ' (if using single-quoted). Also encode <, >, and &.
JavaScript strings embedded in HTML: Use JavaScript string escaping (backslash), not HTML entities. Inside a `<script>` block, `&lt;` in a string is the literal four-character text "&lt;", not a less-than sign.
CSS content property: Use CSS escaping (\00A9 for copyright) rather than HTML entities.
URLs: Use percent-encoding (%20 for space, %26 for ampersand), not HTML entities. But if a URL appears in an HTML attribute (href), the URL itself uses percent-encoding AND the attribute value uses HTML entity encoding for ampersands.
This last point causes real confusion. A URL with query parameters in an HTML link:
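```html
<!-- Correct (example URL): the & in the query string is written as &amp; -->
<a href="/search?q=html&amp;page=2">Next</a>
<!-- Risky: raw & in the attribute value -->
<a href="/search?q=html&page=2">Next</a>
```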
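The browser decodes `&amp;` back to & before following the URL. If you skip the encoding, most links still work, because a name such as `&page` is not an entity. But a parameter whose name begins with an entity name (for example `&copy` or `&not`) can be misread as an entity reference, depending on the characters that follow it. Encoding every & in attribute values removes the ambiguity.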
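The per-context rules can be sketched as small Python helpers. The function names (`html_text`, `html_attr`, `js_string`, `css_char`, `link`) are hypothetical, not a library API, and the `/search` URL is an example. The JavaScript helper relies on `json.dumps()` for backslash escaping and also rewrites every `<` as `\u003c`, a common precaution (not the only one) so a value can never close its `<script>` block. The CSS helper always emits six hex digits, which is equivalent to the `\00A9` form and never needs a terminating space.

```python
import html
import json
from urllib.parse import urlencode

def html_text(s: str) -> str:
    # Text between tags: only &, <, and > are required.
    return html.escape(s, quote=False)

def html_attr(s: str) -> str:
    # Quoted attribute value: encode both quote characters as well.
    return html.escape(s, quote=True)

def js_string(s: str) -> str:
    # JS string literal inside <script>: backslash escaping via JSON,
    # plus \u003c so the value cannot contain a literal "</script>".
    return json.dumps(s).replace("<", "\\u003c")

def css_char(ch: str) -> str:
    # CSS escape: backslash followed by six hex digits.
    return "\\" + format(ord(ch), "06X")

def link(path: str, params: dict[str, str], label: str) -> str:
    # Percent-encode the query first, then HTML-encode the whole URL
    # for the href attribute (so each & becomes &amp;).
    url = path + "?" + urlencode(params)
    return f'<a href="{html_attr(url)}">{html_text(label)}</a>'

print(html_text('Say "hi" & <wave>'))  # Say "hi" &amp; &lt;wave&gt;
print(html_attr('Say "hi"'))           # Say &quot;hi&quot;
print(js_string("</script>"))          # "\u003c/script>"
print(css_char("©"))                   # \0000A9
print(link("/search", {"q": "tom & jerry", "page": "2"}, "Next"))
# <a href="/search?q=tom+%26+jerry&amp;page=2">Next</a>
```

In `link()`, the order matters: percent-encoding turns the & inside the search term into `%26`, and HTML encoding then turns only the & that separates parameters into `&amp;`.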
After building your HTML, the HTML Minifier strips unnecessary whitespace and comments while preserving all entity encoding, reducing file size without breaking special characters.
Character Encoding vs HTML Entities
Character encoding (UTF-8, ISO-8859-1) and HTML entity encoding are related but different concepts.
Character encoding tells the browser how to interpret the bytes in your file. UTF-8 can represent every Unicode character. ISO-8859-1 can only represent 256 characters. Always use UTF-8. Add this to every HTML document:
```html
<meta charset="UTF-8">
```
If this declaration is missing or wrong, special characters can appear as garbled text (mojibake). For example, é appears as Ã© when a UTF-8 document is interpreted as ISO-8859-1.
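The garbling is easy to reproduce in Python: encode é as UTF-8, then decode the bytes with the wrong encoding. A minimal sketch:

```python
# é is two bytes in UTF-8: 0xC3 0xA9.
raw = "é".encode("utf-8")
print(raw)  # b'\xc3\xa9'

# ISO-8859-1 maps each byte to its own character: 0xC3 -> Ã, 0xA9 -> ©.
print(raw.decode("iso-8859-1"))  # Ã©

# Decoding with the correct encoding recovers the original text.
print(raw.decode("utf-8"))  # é
```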
HTML entity encoding is a separate layer that replaces characters with entity references. You can use HTML entities regardless of the document's character encoding.
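Python's `xmlcharrefreplace` error handler illustrates this: when text is saved as ISO-8859-1, which has no euro sign, the character can still be written as a numeric entity. The sample string is only an illustration.

```python
import html

text = "Price: 20 €"

# ISO-8859-1 cannot store €, so xmlcharrefreplace writes it as a numeric entity.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(latin1_bytes)  # b'Price: 20 &#8364;'

# Decoding as ISO-8859-1 and resolving the entity, as a browser would,
# restores the euro sign.
restored = html.unescape(latin1_bytes.decode("iso-8859-1"))
print(restored)  # Price: 20 €
```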
With UTF-8, you rarely need HTML entities for display purposes. You can type accented characters, Greek letters, mathematical symbols, and even emoji directly in your HTML source. Entities are still needed for the five characters with special HTML meaning and for characters you cannot easily type.
Best practice for 2026:
1. Set `<meta charset="UTF-8">` on every page
2. Save files in UTF-8 encoding
3. Use direct characters for everything you can type
4. Use HTML entities only for <, >, &, ", ' and characters you cannot input directly
5. Let your framework handle entity encoding for dynamic content automatically
FAQ
Do I need to encode emoji in HTML?
No, as long as your document uses UTF-8 encoding (which it should). Emoji are Unicode characters and can be included directly in HTML source: `Hello 👋` is valid in a UTF-8 HTML document. You can also use numeric entities (`&#128075;` for the waving hand), but there is no practical reason to do so in UTF-8 documents.
Why does my page show weird characters instead of the ones I typed?
This is almost always a character encoding mismatch. Your file is saved in one encoding (often UTF-8) but the browser is interpreting it as another (often ISO-8859-1 or Windows-1252). Fix it by adding `<meta charset="UTF-8">` as the first element inside `<head>`, and make sure your text editor saves the file as UTF-8.
Should I use named entities or numeric entities?
Named entities (`&copy;`) are more readable in source code. Numeric entities (`&#169;`) work for every Unicode character, including those without names. For the five essential characters and common symbols, use named entities for readability. For obscure characters, use numeric entities because they are universal.
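The "named if available, numeric otherwise" rule can be sketched with Python's `html.entities.codepoint2name` table. Two caveats: `to_entity` is a hypothetical helper, and that table only covers the named entities of HTML 4.01, so characters whose names were added in HTML5 (such as `&check;` for ✓) fall back to a numeric entity.

```python
from html.entities import codepoint2name

def to_entity(ch: str) -> str:
    """Hypothetical helper: named entity if the HTML 4.01 table has one, else decimal."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else f"&#{ord(ch)};"

for ch in ["©", "…", "✓", "👋"]:
    print(ch, to_entity(ch))
# © &copy;
# … &hellip;
# ✓ &#10003;
# 👋 &#128075;
```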
Does HTML entity encoding affect SEO?
No. Search engines decode HTML entities before processing content, so `&amp;` and & are treated identically. The only SEO consideration is making sure your special characters render correctly for users, because garbled text makes a page look broken and can drive visitors away.
