Character Encodings

This topic describes character encodings in various file formats.

CP1252: Also known as Windows-1252 or simply the Windows character set. It is a superset of ISO-8859-1 that uses the code range 128-159 for additional printable characters, such as the euro sign and curly quotation marks, that are not included in the ISO-8859-1 character set.
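UTF-8: A variable-width encoding (one to four bytes per character) that supports all Unicode characters and is backwards-compatible with ASCII. For more information about UTF encodings, see unicode.org/faq/utf_bom.html.
UTF-16: An encoding based on 16-bit code units (two or four bytes per character) that supports all Unicode characters but is not backwards-compatible with ASCII. For more information about UTF encodings, see unicode.org/faq/utf_bom.html.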
US-ASCII: A 7-bit character encoding with 128 code points, based on the English alphabet.
UTF-16BE: UTF-16 encoding with big endian byte serialization (most significant byte first).
UTF-16LE: UTF-16 encoding with little endian byte serialization (least significant byte first).
ISO-8859-1: An 8-bit, ASCII-based character encoding typically used for Western European languages. Also known as Latin-1.
ISO-8859-3: An 8-bit, ASCII-based character encoding typically used for Southern European languages. Also known as Latin-3.
ISO-8859-9: An 8-bit, ASCII-based character encoding typically used for the Turkish language. Also known as Latin-5.
CP850: An ASCII-based DOS code page used to write Western European languages.
CP500: An EBCDIC code page used to write Western European languages.
Shift_JIS: A character encoding for the Japanese language.
MS932: Microsoft's extension of Shift_JIS that includes NEC special characters, the NEC selection of IBM extensions, and IBM extensions.
CP1047: An EBCDIC code page with the full Latin-1 character set.
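
The differences between these encodings are easiest to see by encoding the same character in several of them. The following is a minimal sketch that uses Python's built-in codecs, not anything specific to this product. The helper names (encode_hex, can_encode), the sample characters, and the mapping from the names in the list to Python codec names are assumptions made for illustration.

```python
# Illustrative sketch only: helper names and sample characters are hypothetical.
# It relies solely on the codecs that ship with Python's standard library.

def encode_hex(text, encoding):
    """Return the bytes of `text` in `encoding` as a lowercase hex string."""
    return text.encode(encoding).hex()


def can_encode(text, encoding):
    """Return True if every character of `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# (name used in the list, Python codec name, sample character)
SAMPLES = [
    ("US-ASCII", "ascii", "A"),
    ("CP500", "cp500", "A"),                  # EBCDIC places letters elsewhere
    ("ISO-8859-1", "latin-1", "\u00e9"),      # e with acute accent
    ("CP850", "cp850", "\u00e9"),
    ("UTF-8", "utf-8", "\u00e9"),             # two bytes outside the ASCII range
    ("UTF-16BE", "utf-16-be", "A"),           # most significant byte first
    ("UTF-16LE", "utf-16-le", "A"),           # least significant byte first
    ("CP1252", "cp1252", "\u20ac"),           # euro sign, in the 128-159 range
    ("ISO-8859-9", "iso-8859-9", "\u011f"),   # Turkish g with breve
    ("Shift_JIS", "shift_jis", "\u3042"),     # hiragana letter A
    ("MS932", "cp932", "\u2460"),             # circled digit one (NEC special)
]

for label, codec, sample in SAMPLES:
    print(f"{label:<11} {ascii(sample):<10} -> {encode_hex(sample, codec)}")

# The euro sign exists in CP1252 but not in ISO-8859-1.
print("euro sign in ISO-8859-1:", can_encode("\u20ac", "latin-1"))
# The circled digit exists in MS932 but not in Python's plain Shift_JIS codec.
print("circled one in Shift_JIS:", can_encode("\u2460", "shift_jis"))
```

In this sketch, the letter A is the single byte 41 in US-ASCII but c1 in the EBCDIC code page CP500, and the two UTF-16 variants produce the same two bytes in opposite order (0041 and 4100). The last two checks both report False, which shows why text written in CP1252 or MS932 cannot always be saved as ISO-8859-1 or Shift_JIS without losing characters.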