Character Encodings

This topic describes character encodings in various file formats.

CP1252
This encoding is also known as Windows-1252 or simply the Windows character set. It is a superset of ISO-8859-1 that uses the code range 128-159 for additional printable characters not included in the ISO-8859-1 character set.
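The difference in the 128-159 range can be seen by decoding the same bytes with both encodings. The following is a minimal sketch using Python's built-in codecs; the sample bytes are chosen only for illustration.

```python
# Bytes 0x80, 0x93 and 0x94 fall in the 128-159 range; 0xE9 lies outside it.
raw = bytes([0x80, 0x93, 0x94, 0xE9])

# CP1252 assigns printable characters to the 128-159 range.
as_cp1252 = raw.decode("cp1252")

# Python's ISO-8859-1 codec maps the same bytes to C1 control characters.
as_latin1 = raw.decode("iso-8859-1")

print(as_cp1252)        # euro sign, curly quotes, then e-acute
print(repr(as_latin1))  # control characters, then e-acute
```

Only the first three bytes decode differently; the last byte, 0xE9, yields the same character in both encodings.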
UTF-8
Supports all Unicode characters and is backwards-compatible with ASCII. For more information about UTF, see unicode.org/faq/utf_bom.html.
UTF-16
Supports all Unicode characters but is not backwards-compatible with ASCII. For more information about UTF, see unicode.org/faq/utf_bom.html.
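As an illustration of what ASCII compatibility means in practice, the sketch below encodes the same ASCII-only text three ways with Python's built-in codecs. The byte order for UTF-16 is fixed explicitly so that no byte order mark is added.

```python
text = "abc"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
# An explicit byte order is used here so that no byte order mark is prepended.
utf16_bytes = text.encode("utf-16-le")

# ASCII-only text produces identical bytes in ASCII and UTF-8.
print(utf8_bytes == ascii_bytes)   # True

# UTF-16 uses two bytes per character here, so the bytes differ from ASCII.
print(utf16_bytes)                 # b'a\x00b\x00c\x00'
```

A reader that expects ASCII can therefore process ASCII-only UTF-8 data unchanged, but would see interleaved zero bytes in UTF-16 data.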
US-ASCII
A 7-bit character encoding based on the English alphabet. It defines 128 characters (codes 0-127).
UTF-16BE
UTF-16 encoding with big-endian byte serialization (most significant byte first).
UTF-16LE
UTF-16 encoding with little-endian byte serialization (least significant byte first).
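The two byte orders can be compared by encoding a single character both ways. This is a small sketch using Python's built-in codecs; the euro sign (U+20AC) is an arbitrary choice for illustration.

```python
import codecs

ch = "\u20ac"  # EURO SIGN, code point U+20AC

be = ch.encode("utf-16-be")  # most significant byte first
le = ch.encode("utf-16-le")  # least significant byte first

print(be.hex())  # 20ac
print(le.hex())  # ac20

# With plain "utf-16", the decoder reads a leading byte order mark
# to decide which serialization follows.
from_be = (codecs.BOM_UTF16_BE + be).decode("utf-16")
from_le = (codecs.BOM_UTF16_LE + le).decode("utf-16")
print(from_be == from_le == ch)  # True
```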
ISO-8859-1
An ASCII-based 8-bit character encoding typically used for Western European languages. Also known as Latin-1.
ISO-8859-3
An ASCII-based 8-bit character encoding typically used for Southern European languages. Also known as Latin-3.
ISO-8859-9
An ASCII-based 8-bit character encoding typically used for the Turkish language. Also known as Latin-5.
CP850
An ASCII-based code page used to write Western European languages.
CP500
An EBCDIC code page used to write Western European languages.
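ASCII-based and EBCDIC code pages assign different byte values even to basic letters and digits. The sketch below encodes the same text with CP850 and CP500 using Python's built-in codecs; the sample text is arbitrary.

```python
text = "A1"

ascii_based = text.encode("cp850")  # ASCII-based code page
ebcdic = text.encode("cp500")       # EBCDIC code page

print(ascii_based.hex())  # 4131
print(ebcdic.hex())       # c1f1

# Decoding with the matching code page restores the original text.
print(ebcdic.decode("cp500") == text)  # True
```

Data written in an EBCDIC code page must therefore be decoded with that code page; reading it as an ASCII-based encoding produces unrelated characters.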
Shift_JIS
A character encoding for the Japanese language.
MS932
Microsoft's extension of Shift_JIS that includes NEC special characters, the NEC selection of IBM extensions, and IBM extensions.
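The effect of the extension can be sketched with Python's built-in codecs, where MS932 is available as "cp932" (with "ms932" as an alias). The circled digit one (U+2460) is used here on the assumption that it is one of the NEC special characters; it is an illustrative choice, not taken from this topic.

```python
ch = "\u2460"  # CIRCLED DIGIT ONE, assumed here to be an NEC special character

# MS932 (Python codec name "cp932") can encode the character.
ms932_bytes = ch.encode("cp932")
round_trip = ms932_bytes.decode("cp932")

# The plain Shift_JIS codec is expected to reject it.
try:
    ch.encode("shift_jis")
    plain_sjis_ok = True
except UnicodeEncodeError:
    plain_sjis_ok = False

print(round_trip == ch)  # True
print(plain_sjis_ok)
```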
CP1047
An EBCDIC code page with the full Latin-1 character set.