In computing, a code page is a character encoding and, as such, a specific association of a set of printable characters and control characters with unique numbers. Typically each number represents the binary value in a single byte. (In some contexts these terms are used more precisely.)
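To make the definition concrete, the sketch below decodes a single byte value, 0x9F, under several of the code pages discussed in this article, using Python's built-in codecs (`cp037`, `cp1140`, `cp1252` and `latin-1` are Python codec names). The helper `decode_in_each` is hypothetical and exists only for illustration. The same number stands for a different character in each code page, which is what it means for a code page to associate numbers with characters.

```python
import unicodedata


def decode_in_each(byte_value: int, codecs: list[str]) -> dict[str, str]:
    """Hypothetical helper: map each codec name to the character that
    byte_value stands for in that code page."""
    raw = bytes([byte_value])
    return {codec: raw.decode(codec) for codec in codecs}


table = decode_in_each(0x9F, ["cp037", "cp1140", "cp1252", "latin-1"])
for codec, ch in table.items():
    # Control characters have no Unicode name, so supply a placeholder.
    print(f"{codec:8} 0x9F -> U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}")
```

Run as-is, this should report CURRENCY SIGN (U+00A4) for EBCDIC 037, EURO SIGN (U+20AC) for its euro update 1140, LATIN CAPITAL LETTER Y WITH DIAERESIS (U+0178) for Windows-1252, and an unnamed C1 control (U+009F) for ISO 8859-1.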
The term "code page" originated from IBM's EBCDIC-based mainframe systems. IBM code pages include:
- 1 – USA WP, Original
- 2 – USA
- 3 – USA Accounting, Version A
- 4 – USA
- 5 – USA
- 6 – Latin America
- 7 – Germany F.R. / Austria
- 8 – Germany F.R.
- 9 – France, Belgium
- 10 – Canada (English)
- 11 – Canada (French)
- 12 – Italy
- 13 – Netherlands
- 14 – Spain
- 15 – Switzerland (French)
- 16 – Switzerland (French / German)
- 17 – Switzerland (German)
- 18 – Sweden / Finland
- 19 – Sweden / Finland WP, version 2
- 20 – Denmark/Norway
- 21 – Brazil
- 22 – Portugal
- 23 – United Kingdom
- 24 – United Kingdom
- 25 – Japan (Latin)
- 26 – Japan (Latin)
- 27 – Greece (Latin)
- 29 – Iceland
- 30 – Turkey
- 31 – South Africa
- 32 – Czechoslovakia (Czech / Slovak)
- 33 – Czechoslovakia
- 34 – Czechoslovakia
- 35 – Romania
- 36 – Romania
- 37 – USA/Canada - CECP (same with euro: 1140)
- 37-2 – The real 3279 APL code page, as used by C/370. This is very close to 1047, except that the caret and not-sign are inverted. It is not officially recognized by IBM, even though SHARE has pointed out its existence.
- 1097 – Farsi Bilingual
- 1110 – Latin 2 (Revision of 870)
- 1112 – Baltic Multilingual (same with euro: 1156)
- 1113 – Latin 6
- 1122 – Estonia (same with euro: 1157)
- 1123 – Cyrillic, Ukraine (same with euro: 1158)
- 1130 – Vietnamese (same with euro: 1164)
- 1132 – Lao EBCDIC
- 1136 – Hitachi Katakana
- 1137 – Devanagari EBCDIC
- 1140 – USA, Canada, etc. ECECP (same without euro: 37) (Traditional Chinese version: 1159)
- 1141 – Austria, Germany ECECP (same without euro: 273)
- 1142 – Denmark, Norway ECECP (same without euro: 277)
- 1143 – Finland, Sweden ECECP (same without euro: 278)
- 1144 – Italy ECECP (same without euro: 280)
- 1145 – Spain, Latin America (Spanish) ECECP (same without euro: 284)
- 1146 – UK ECECP (same without euro: 285)
- 1147 – France ECECP with euro (same without euro: 297)
- 1148 – International ECECP with euro (same without euro: 500)
- 1149 – Icelandic ECECP with euro (same without euro: 871)
- 1150 – Korean Extended with box characters
- 1151 – Simplified Chinese Extended with box characters
- 1152 – Traditional Chinese Extended with box characters
- 1153 – Latin 2 Multilingual with euro (same without euro: 870)
- 1154 – Cyrillic, Multilingual with euro (same without euro: 1025; an older version is 1166)
- 1155 – Turkey with euro (same without euro: 1026) (same with lira: 1175)
- 1156 – Baltic Multi with euro (same without euro: 1112)
- 1157 – Estonia with euro (same without euro: 1122)
- 1158 – Cyrillic, Ukraine with euro (same without euro: 1123)
- 1159 – T-Chinese EBCDIC (Traditional Chinese euro update of 1140)
- 1160 – Thai with Low Marks & Accented Characters with euro (same without euro: 838)
- 1164 – Vietnamese with euro (same without euro: 1130)
- 1165 – Latin 2/Open Systems
- 1166 – Cyrillic Kazakh
- 1175 – Turkey with euro and lira (same without lira: 1155)
- 1278 – EBCDIC Adobe (PostScript) Standard Encoding
- 1279 – Hitachi Japanese Katakana Host
- 1201 – UTF-16BE Unicode (big-endian)
Microsoft's Windows code pages are listed below. Windows-1252 is based on ISO 8859-1; some of the others are based in part on other parts of ISO 8859 but are often rearranged to make them closer to 1252.
- 42 – Windows Symbol
- 874 – Windows Thai
- 1250 – Windows Central Europe
- 1251 – Windows Cyrillic
- 1252 – Windows Western
- 1253 – Windows Greek
- 1254 – Windows Turkish
- 1255 – Windows Hebrew
- 1256 – Windows Arabic
- 1257 – Windows Baltic
- 1258 – Windows Vietnamese
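The rearrangement of ISO 8859 layouts toward 1252 can be spot-checked with Python's codecs. The sketch below uses a hypothetical helper, `byte_for`, to look up where ISO 8859-2, Windows-1250 and Windows-1252 place two characters; `Š` and `©` are chosen purely as an illustration, not as a complete comparison of the code pages.

```python
from typing import Optional


def byte_for(ch: str, codec: str) -> Optional[str]:
    """Hypothetical helper: return the byte (as hex) that encodes ch in codec,
    or None when the code page has no slot for that character."""
    try:
        return f"0x{ch.encode(codec)[0]:02X}"
    except UnicodeEncodeError:
        return None


layout = {
    codec: {ch: byte_for(ch, codec) for ch in "Š©"}
    for codec in ("iso8859_2", "cp1250", "cp1252")
}
for codec, slots in layout.items():
    print(codec, slots)
```

ISO 8859-2 puts Š at 0xA9 and has no ©. Windows-1250 moves Š to 0x8A, where 1252 also has it, which frees 0xA9 for © exactly as in 1252.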
Microsoft recommends that new applications use UTF-8 or UCS-2/UTF-16 instead of these code pages.
Windows code pages such as 1252 assign printable characters (for example curly quotation marks, the en dash and the euro sign) to bytes 0x80–0x9F, which ISO 8859-based encodings reserve for C1 control characters. Although browsers were typically programmed to deal with this behaviour, this was not always true of other software. Consequently, when receiving a file transfer from a Windows system, non-Windows platforms would either ignore these characters or treat them as standard control characters and attempt to take the specified control action accordingly.
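The mismatch described above can be reproduced in a few lines of Python. This is a minimal sketch, assuming the receiving side decodes the bytes as ISO 8859-1 (Python's `latin-1` codec); the sample string is arbitrary.

```python
# Text as a Windows-1252 application would write it: the curly quotes,
# the en dash and the euro sign all land in the 0x80-0x9F byte range.
original = "\u201cPrice\u201d \u2013 \u20ac5"
data = original.encode("cp1252")

# A receiver that assumes ISO 8859-1 maps those same bytes to
# C1 control characters (U+0080..U+009F) instead of printable text.
misread = data.decode("latin-1")
c1_controls = [f"U+{ord(ch):04X}" for ch in misread if 0x80 <= ord(ch) <= 0x9F]
print(c1_controls)

# Decoding with the code page the sender actually used recovers the text.
print(data.decode("cp1252") == original)
```

This should list U+0093, U+0094, U+0096 and U+0080, the C1 controls that software on the receiving side might ignore or try to act on, while decoding with cp1252 round-trips the original string.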
Because of Unicode's extensive documentation, vast character repertoire and character stability policy, the problems described above are rarely a concern for Unicode. UTF-8, which can encode over one million code points, has overtaken the code-page method in popularity on the Internet.
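To illustrate how UTF-8 covers more than a million code points while keeping ASCII as single bytes, here is a minimal sketch of its variable-length bit layout (as specified in RFC 3629), checked against Python's built-in `utf-8` codec. The function name `utf8_encode` is hypothetical; real programs should simply call `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Hypothetical helper: encode one Unicode scalar value as UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:  # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# All code points U+0000..U+10FFFF except the 2,048 surrogates.
SCALAR_VALUES = 0x110000 - (0xDFFF - 0xD800 + 1)

for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    assert utf8_encode(cp) == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {utf8_encode(cp).hex(' ')}")
print(SCALAR_VALUES)
```

Each branch spreads the code point's bits across a lead byte and continuation bytes; the lead byte's prefix tells a decoder how many bytes follow, so one encoding covers every script without switching code pages.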
External links
- IBM CDRA glossary
- IBM/ICU Charset Information
- Microsoft Code Page Identifiers (Microsoft's list contains only code pages actively used by normal apps on Windows. See also Torsten Mohrin's list for the full list of supported code pages)
- Character Sets And Code Pages At The Push Of A Button
- Microsoft Chcp command: Display and set the console active code page
