UTF

Stands for "Unicode Transformation Format." UTF refers to several types of Unicode character encodings, including UTF-7, UTF-8, UTF-16, and UTF-32.

UTF-7 - encodes Unicode text using only 7-bit ASCII characters. It was designed so that Unicode text could be sent through email systems that only support 7-bit ASCII data.
UTF-8 - the most popular type of Unicode encoding. It uses one byte for standard English letters and symbols, two bytes for additional Latin and Middle Eastern characters, and three bytes for most Asian characters, such as Chinese, Japanese, and Korean. Additional characters, such as emoji, are represented using four bytes. UTF-8 is backwards compatible with ASCII, since the first 128 characters are mapped to the same single-byte values.
UTF-16 - an extension of the "UCS-2" Unicode encoding, which uses two bytes to represent 65,536 characters. UTF-16 also uses four bytes (a pair of two-byte units called a "surrogate pair") to represent about one million additional characters.
UTF-32 - a fixed-width encoding that represents every character with exactly four bytes.
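
The byte counts described above can be checked with a short Python sketch. It encodes a few sample characters with Python's built-in codecs and counts the bytes. The sample characters and the helper names (encoded_sizes, utf16_surrogate_pair, is_seven_bit) are illustrative choices for this example, not part of any standard. The little-endian codecs are used for UTF-16 and UTF-32 so that Python does not add a byte order mark to the output.

```python
# Byte counts for a few sample characters in each UTF encoding.
# The samples and helper names are hypothetical, chosen only for illustration.
SAMPLES = {
    "A": "ASCII letter A",
    "\u00e9": "Latin small letter e with acute",
    "\u05e9": "Hebrew letter shin",
    "\u4e2d": "Chinese character for 'middle'",
    "\U0001F600": "grinning face emoji (beyond the first 65,536 code points)",
}


def encoded_sizes(ch: str) -> dict[str, int]:
    """Return how many bytes `ch` occupies in UTF-8, UTF-16, and UTF-32.

    The little-endian ("-le") codecs are used for UTF-16 and UTF-32 so that
    Python does not prepend a byte order mark (BOM) to the output.
    """
    return {
        "UTF-8": len(ch.encode("utf-8")),
        "UTF-16": len(ch.encode("utf-16-le")),
        "UTF-32": len(ch.encode("utf-32-le")),
    }


def utf16_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into the two 16-bit units UTF-16 uses."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000 to U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte is a 7-bit (ASCII) value."""
    return all(byte < 0x80 for byte in data)


if __name__ == "__main__":
    for ch, label in SAMPLES.items():
        print(f"U+{ord(ch):04X} ({label}): {encoded_sizes(ch)}")

    # ASCII text produces identical bytes in UTF-8 and ASCII.
    print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True

    # UTF-7 output stays within 7-bit ASCII even for non-ASCII text.
    print(is_seven_bit("\u4e2d\u6587 \U0001F600".encode("utf-7")))  # True

    high, low = utf16_surrogate_pair(0x1F600)
    print(f"{high:04X} {low:04X}")  # D83D DE00
```

Running it shows, for example, that the Chinese character takes three bytes in UTF-8 but only two in UTF-16, while the emoji takes four bytes in all three encodings.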

Most text in documents and webpages is encoded using one of the UTF encodings above. Many word processing programs do not show the character encoding of open documents, though some display it at the bottom of the document window or within the file properties. To see the character encoding used by a webpage, you can select View → View Source in your browser to view the HTML of the page. The character encoding, if defined, appears in the <head> section near the top of the HTML. A page that uses UTF-8 encoding may include one of the following snippets, depending on the version of HTML.

XHTML: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
HTML 5: <meta charset="UTF-8">
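
Checking the source by hand works for a single page. For many pages, the same check could be automated. The Python sketch below uses the standard library's html.parser module to look for either of the two declarations above. The CharsetFinder class and declared_charset function are hypothetical names for this example. The sketch only reads <meta> tags; a web server can also declare the encoding in its HTTP Content-Type header, which this code does not check.

```python
from html.parser import HTMLParser
from typing import Optional


class CharsetFinder(HTMLParser):
    """Record the first character encoding declared in a <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    # Self-closing XHTML tags (<meta ... />) also reach this method, because
    # HTMLParser's default handle_startendtag calls handle_starttag.
    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if "charset" in attributes:
            # HTML5 form: <meta charset="UTF-8">
            self.charset = attributes["charset"].strip()
        elif attributes.get("http-equiv", "").lower() == "content-type":
            # Older form: content="text/html; charset=utf-8"
            for part in attributes.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.strip().lower() == "charset":
                    self.charset = value.strip().strip("\"'")
                    break


def declared_charset(html: str) -> Optional[str]:
    """Return the charset declared in the page's <meta> tags, or None."""
    finder = CharsetFinder()
    finder.feed(html)
    finder.close()
    return finder.charset


if __name__ == "__main__":
    print(declared_charset('<meta charset="UTF-8">'))  # UTF-8
    print(declared_charset(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
    ))  # utf-8
```

The function returns the value exactly as written in the page, so callers may want to compare it case-insensitively ("UTF-8" and "utf-8" name the same encoding).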

Updated April 20, 2012 by Per C.

The Tech Terms Computer Dictionary