Unicode is a universal character encoding standard. It defines the way individual characters are represented in text files, web pages, and other types of documents.
Unlike ASCII, which was designed to represent only basic English characters, Unicode was designed to support characters from all languages around the world. The standard ASCII character set supports only 128 characters, while Unicode has room for 1,114,112 code points (roughly 1.1 million). ASCII is a 7-bit code that is normally stored as one byte per character, whereas Unicode encodings use up to four bytes for each character.
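The size difference between the two character sets can be checked directly. The following is a minimal Python sketch, not part of the Unicode standard itself; the names ASCII_LIMIT, UNICODE_CODE_POINTS, and is_ascii are made up for this illustration.

```python
import sys

# ASCII defines code points 0 through 127.
ASCII_LIMIT = 128

# Python strings cover the full Unicode range, 0 through 0x10FFFF,
# so sys.maxunicode + 1 is the total number of code points.
UNICODE_CODE_POINTS = sys.maxunicode + 1


def is_ascii(ch: str) -> bool:
    """Return True if the single character ch falls inside the ASCII range."""
    return ord(ch) < ASCII_LIMIT


if __name__ == "__main__":
    print(f"ASCII characters:    {ASCII_LIMIT}")
    print(f"Unicode code points: {UNICODE_CODE_POINTS:,}")
    # "A" is ASCII; "\u00e9" (e with acute accent) and "\u4e2d" (a Chinese character) are not.
    for ch in ["A", "\u00e9", "\u4e2d"]:
        print(f"U+{ord(ch):04X} ascii={is_ascii(ch)}")
```

Note that the code point count is an upper bound on capacity. Only part of that range is assigned to actual characters.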
There are several different Unicode encodings, though UTF-8 and UTF-16 are the most common. UTF-8 has become the standard character encoding used on the Web and is also the default encoding used by many software programs. While UTF-8 supports up to four bytes per character, it would be inefficient to use four bytes to represent frequently used characters. Therefore, UTF-8 uses only one byte to represent the 128 ASCII characters, which include the common English letters, digits, and punctuation. Accented Latin letters used in European languages, as well as Hebrew and Arabic characters, are represented with two bytes, while three bytes are used for most Chinese, Japanese, Korean, and other Asian characters. The remaining Unicode characters, such as emoji, are represented with four bytes.
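The variable byte lengths described above can be observed by encoding a few sample characters. This is a small Python sketch using the built-in str.encode method; the helper name utf8_length and the choice of sample characters are assumptions made for illustration.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to encode the string ch."""
    return len(ch.encode("utf-8"))


# Sample characters, written as escapes so the exact code points are unambiguous.
SAMPLES = [
    ("Latin capital A", "\u0041"),
    ("Latin e with acute", "\u00e9"),
    ("Hebrew alef", "\u05d0"),
    ("Arabic ain", "\u0639"),
    ("Chinese character", "\u4e2d"),
    ("Emoji (grinning face)", "\U0001F600"),
]

if __name__ == "__main__":
    for name, ch in SAMPLES:
        encoded = ch.encode("utf-8")
        # Show the code point, the byte count, and the raw bytes in hex.
        print(f"{name:22} U+{ord(ch):04X}  {utf8_length(ch)} byte(s)  {encoded.hex(' ')}")
```

With these samples, the ASCII letter takes one byte, the accented Latin, Hebrew, and Arabic characters take two bytes each, the Chinese character takes three, and the emoji takes four.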