UTF
Stands for "Unicode Transformation Format." UTF refers to several types of Unicode character encodings, including UTF-7, UTF-8, UTF-16, and UTF-32.
- UTF-7 - encodes Unicode text using only 7-bit ASCII characters, so it can pass through email systems that handle only 7-bit data. Most non-ASCII characters are written as a sequence of Base64-style ASCII characters.
- UTF-8 - the most popular Unicode encoding. It uses one byte for standard English letters and symbols, two bytes for additional Latin characters and scripts such as Greek, Cyrillic, Hebrew, and Arabic, and three bytes for most Asian characters, including Chinese, Japanese, and Korean. Supplementary characters, such as emoji, use four bytes. UTF-8 is backward compatible with ASCII, since the first 128 characters are encoded as the same single-byte values.
- UTF-16 - an extension of the "UCS-2" Unicode encoding, which uses two bytes to represent up to 65,536 characters. UTF-16 keeps that two-byte form for those characters and uses four bytes (a "surrogate pair") for roughly one million additional characters beyond that range.
- UTF-32 - a fixed-width encoding that represents every character with exactly four bytes.
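The byte counts described above can be checked with a short Python sketch that encodes a few sample characters using Python's built-in codecs. The sample characters and the `byte_lengths` helper are illustrative choices, not part of any standard. The big-endian codec names (`utf-16-be`, `utf-32-be`) are used so that Python does not prepend a byte order mark to the output.

```python
# Minimal sketch: how many bytes one character takes in each UTF encoding.
# Uses only Python's built-in codecs; the sample characters are illustrative.

SAMPLES = {
    "A": "ASCII letter",
    "\u00e9": "Latin letter e with acute (U+00E9)",
    "\u05e9": "Hebrew letter shin (U+05E9)",
    "\u4e2d": "CJK ideograph (U+4E2D)",
    "\U0001F600": "emoji (U+1F600)",
}

# Big-endian variants avoid the byte order mark that "utf-16"/"utf-32" prepend.
ENCODINGS = ["utf-7", "utf-8", "utf-16-be", "utf-32-be"]


def byte_lengths(ch: str) -> dict:
    """Return {encoding name: encoded length in bytes} for a single character."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


if __name__ == "__main__":
    for ch, label in SAMPLES.items():
        print(f"{label:<36} {byte_lengths(ch)}")
```

Running it should show UTF-8 growing from one to four bytes across the samples, UTF-16 jumping from two to four bytes only for the emoji, UTF-32 staying at four bytes throughout, and UTF-7 producing nothing but ASCII bytes.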
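To illustrate how UTF-16 reaches characters beyond the two-byte range, the sketch below applies the standard surrogate-pair arithmetic: subtract 0x10000 from the code point, then split the remaining 20 bits into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits). The function name `to_surrogate_pair` is hypothetical; the result is cross-checked against Python's built-in `utf-16-be` codec.

```python
# Minimal sketch of UTF-16 surrogate-pair arithmetic (hypothetical helper name).

# Code points U+10000..U+10FFFF do not fit in one 16-bit unit.
SUPPLEMENTARY_COUNT = 0x10FFFF - 0x10000 + 1  # 1,048,576: the "about one million"


def to_surrogate_pair(code_point: int) -> tuple:
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000      # 20-bit value in 0..0xFFFFF
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1F600)  # U+1F600, an emoji
    print(f"U+1F600 -> {high:04X} {low:04X}")  # prints "U+1F600 -> D83D DE00"
    # Cross-check against Python's own UTF-16 (big-endian) encoder.
    expected = "\U0001F600".encode("utf-16-be")
    assert expected == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```

Each surrogate occupies two bytes, which is why these characters take four bytes in UTF-16.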
Most text in documents and webpages is encoded using one of the UTF encodings above. Many word processing programs do not show the character encoding of an open document, though some display it at the bottom of the document window or in the file properties. To see the character encoding of a webpage, view the page's HTML source. The encoding, if declared, appears in the <head> section near the top of the HTML. A page that uses UTF-8 may include one of the following snippets, depending on the HTML version:
XHTML: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
HTML 5: <meta charset="UTF-8">
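The lookup described above can also be automated. The following is a minimal sketch using Python's standard `html.parser` module that scans `<meta>` tags for either form of declaration. The names `CharsetFinder` and `find_declared_charset` are hypothetical, and the sketch only reads the markup: browsers follow a more involved detection procedure defined in the HTML standard, which also considers the HTTP `Content-Type` header and any byte order mark.

```python
from html.parser import HTMLParser
from typing import Optional


class CharsetFinder(HTMLParser):
    """Remember the first character encoding declared in a <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta" or self.charset is not None:
            return
        values = {name: (value or "") for name, value in attrs}
        if "charset" in values:
            # HTML5 form: <meta charset="UTF-8">
            self.charset = values["charset"].strip()
        elif values.get("http-equiv", "").lower() == "content-type":
            # XHTML form: content="text/html; charset=utf-8"
            for part in values.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.strip().lower() == "charset":
                    self.charset = value.strip()
                    break
    # Self-closing tags such as <meta ... /> reach handle_starttag through
    # HTMLParser's default handle_startendtag.


def find_declared_charset(html: str) -> Optional[str]:
    """Return the charset named in the page's markup, or None if absent."""
    finder = CharsetFinder()
    finder.feed(html)
    finder.close()
    return finder.charset


if __name__ == "__main__":
    print(find_declared_charset('<html><head><meta charset="UTF-8"></head></html>'))
```

The returned name is not normalized, so a caller checking for UTF-8 should compare case-insensitively (for example, `charset.lower() == "utf-8"`).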