Character Encodings
This topic describes character encodings in various file formats.
- CP1252
- This encoding is also known as Windows-1252 or simply the Windows character set. It is a superset of ISO-8859-1 that uses the 128-159 code range, where ISO-8859-1 has no printable characters, for additional characters such as the euro sign and curly quotation marks.
- UTF-8
- Supports all Unicode characters and is backwards-compatible with ASCII. For more information about UTF, see unicode.org/faq/utf_bom.html.
- UTF-16
- Supports all Unicode characters but is not backwards-compatible with ASCII. For more information about UTF, see unicode.org/faq/utf_bom.html.
- US-ASCII
- A 7-bit character encoding of 128 code points that covers the unaccented English alphabet, digits, punctuation, and control characters.
- UTF-16BE
- UTF-16 encoding with big-endian byte serialization (most significant byte first).
- UTF-16LE
- UTF-16 encoding with little-endian byte serialization (least significant byte first).
- ISO-8859-1
- An 8-bit extension of ASCII typically used for Western European languages. Also known as Latin-1.
- ISO-8859-3
- An 8-bit extension of ASCII typically used for Southern European languages. Also known as Latin-3.
- ISO-8859-9
- An 8-bit extension of ASCII typically used for the Turkish language. Also known as Latin-5.
- CP850
- An ASCII-based DOS code page used to write Western European languages.
- CP500
- An EBCDIC code page used to write Western European languages.
- Shift_JIS
- A character encoding for the Japanese language.
- MS932
- Microsoft's extension of Shift_JIS that adds NEC special characters, the NEC selection of IBM extensions, and IBM extensions.
- CP1047
- An EBCDIC code page with the full Latin-1 character set.
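The differences between the encodings listed above are easiest to see by encoding the same characters with each of them. The following is a minimal sketch in Python, assuming the standard library codec names (cp1252, latin-1, utf-16-be, cp500, and so on), which may not match the names your product or platform uses. The helper encode_hex is hypothetical and exists only for this illustration. CP1047 is omitted because it is not assumed to be available as a built-in Python codec.

```python
def encode_hex(text, encoding):
    """Hypothetical helper: return the encoded bytes as a hex string,
    or None if the text cannot be represented in the encoding."""
    try:
        return text.encode(encoding).hex()
    except UnicodeEncodeError:
        return None


# CP1252 places printable characters in the 128-159 range; Latin-1 does not.
print(encode_hex("€", "cp1252"))     # 80
print(encode_hex("€", "latin-1"))    # None

# UTF-8 is backwards-compatible with ASCII; UTF-16 is not.
print(encode_hex("A", "ascii"))      # 41
print(encode_hex("A", "utf-8"))      # 41
print(encode_hex("A", "utf-16-be"))  # 0041 (most significant byte first)
print(encode_hex("A", "utf-16-le"))  # 4100 (least significant byte first)

# The same character lands on different bytes in different code pages.
print(encode_hex("é", "latin-1"))    # e9
print(encode_hex("é", "cp850"))      # 82
print(encode_hex("A", "cp500"))      # c1 (EBCDIC)

# Latin-3 and Latin-5 cover letters that Latin-1 lacks.
print(encode_hex("ħ", "iso8859_3"))  # b1
print(encode_hex("ğ", "iso8859_9"))  # f0
print(encode_hex("ğ", "latin-1"))    # None

# MS932 (cp932 in Python) adds NEC special characters to Shift_JIS.
print(encode_hex("あ", "shift_jis"))  # 82a0
print(encode_hex("①", "shift_jis"))  # None
print(encode_hex("①", "cp932"))      # 8740
```

A None result means the character has no representation in that encoding, which is the usual cause of errors or substituted characters when a file is written with the wrong encoding selected.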