Code page - WikiHQ

In computing, a code page is a character encoding: a specific association of a set of printable characters and control characters with unique numbers. Typically each number represents the binary value in a single byte. (In some contexts these terms are used more precisely.)
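As a minimal sketch of that association, the same byte value can be decoded under several code pages. The codec names below (`cp037`, `cp437`, `cp1251`, `cp1252`) are Python's names for the corresponding code pages, and the helper `decode_under` is a hypothetical name chosen for illustration only.

```python
# One byte value, interpreted under different code pages.
RAW = bytes([0xC1])

# Python codec names for an EBCDIC page, the original IBM PC page,
# and two Windows pages.
CODE_PAGES = ["cp037", "cp437", "cp1251", "cp1252"]


def decode_under(raw, code_pages):
    """Map each code page name to the text that `raw` decodes to under it."""
    return {name: raw.decode(name) for name in code_pages}


for name, text in decode_under(RAW, CODE_PAGES).items():
    print(f"{name}: 0x{RAW[0]:02X} -> {text!r}")
```

The number stays the same in every case; only the code page chosen for interpretation changes which character it stands for.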

The term "code page" originated from IBM's EBCDIC-based mainframe systems.

1 – USA WP, Original
2 – USA
3 – USA Accounting, Version A
4 – USA
5 – USA
6 – Latin America
7 – Germany F.R. / Austria
8 – Germany F.R.
9 – France, Belgium
10 – Canada (English)
11 – Canada (French)
12 – Italy
13 – Netherlands
14 – Spain
15 – Switzerland (French)
16 – Switzerland (French / German)
17 – Switzerland (German)
18 – Sweden / Finland
19 – Sweden / Finland WP, version 2
20 – Denmark/Norway
21 – Brazil
22 – Portugal
23 – United Kingdom
24 – United Kingdom
25 – Japan (Latin)
26 – Japan (Latin)
27 – Greece (Latin)
29 – Iceland
30 – Turkey
31 – South Africa
32 – Czechoslovakia (Czech / Slovak)
33 – Czechoslovakia
34 – Czechoslovakia
35 – Romania
36 – Romania
37 – USA/Canada - CECP (same with euro: 1140)
37-2 – The real 3279 APL codepage, as used by C/370. This is very close to 1047, except for caret and not-sign inverted. It is not officially recognized by IBM, even though SHARE has pointed out its existence.
1097 – Farsi Bilingual
1110 – Latin 2 (Revision of 870)
1112 – Baltic Multilingual (same with euro: 1156)
1113 – Latin 6
1122 – Estonia (same with euro: 1157)
1123 – Cyrillic, Ukraine (same with euro: 1158)
1130 – Vietnamese (same with euro: 1164)
1132 – Lao EBCDIC
1136 – Hitachi Katakana
1137 – Devanagari EBCDIC
1140 – USA, Canada, etc. ECECP (same without euro: 37) (Traditional Chinese version: 1159)
1141 – Austria, Germany ECECP (same without euro: 273)
1142 – Denmark, Norway ECECP (same without euro: 277)
1143 – Finland, Sweden ECECP (same without euro: 278)
1144 – Italy ECECP (same without euro: 280)
1145 – Spain, Latin America (Spanish) ECECP (same without euro: 284)
1146 – UK ECECP (same without euro: 285)
1147 – France ECECP with euro (same without euro: 297)
1148 – International ECECP with euro (same without euro: 500)
1149 – Icelandic ECECP with euro (same without euro: 871)
1150 – Korean Extended with box characters
1151 – Simplified Chinese Extended with box characters
1152 – Traditional Chinese Extended with box characters
1153 – Latin 2 Multilingual with euro (same without euro: 870)
1154 – Cyrillic, Multilingual with euro (same without euro: 1025; an older version is 1166)
1155 – Turkey with euro (same without euro: 1026) (same with lira: 1175)
1156 – Baltic Multi with euro (same without euro: 1112)
1157 – Estonia with euro (same without euro: 1122)
1158 – Cyrillic, Ukraine with euro (same without euro: 1123)
1159 – T-Chinese EBCDIC (Traditional Chinese euro update of 1140)
1160 – Thai with Low Marks & Accented Characters with euro (same without euro: 838)
1164 – Vietnamese with euro (same without euro: 1130)
1165 – Latin 2/Open Systems
1166 – Cyrillic Kazakh
1175 – Turkey with euro and lira (same without lira: 1155)
1278 – EBCDIC Adobe (PostScript) Standard Encoding
1279 – Hitachi Japanese Katakana Host
1201 – UTF-16BE Unicode (big-endian)

Some of the others are based in part on other parts of ISO 8859 but often rearranged to make them closer to 1252.
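Returning to the EBCDIC list, the "same with euro" pairs can be checked directly. The sketch below compares code pages 37 and 1140 byte by byte, assuming Python's `cp037` and `cp1140` codecs faithfully represent them. The function name `differing_bytes` is hypothetical.

```python
def differing_bytes(page_a, page_b):
    """Return the byte values that decode differently in the two code pages."""
    return [
        value
        for value in range(256)
        if bytes([value]).decode(page_a) != bytes([value]).decode(page_b)
    ]


diff = differing_bytes("cp037", "cp1140")
for value in diff:
    raw = bytes([value])
    print(
        f"0x{value:02X}: cp037={raw.decode('cp037')!r} "
        f"cp1140={raw.decode('cp1140')!r}"
    )
```

In Python's tables the euro sign sits at 0x9F in `cp1140`, where `cp037` has the generic currency sign. Python does not ship codecs for every pair in the list, so the same check may not be possible for the others without extra mapping tables.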

42 – Windows Symbol
874 – Windows Thai
1250 – Windows Central Europe
1251 – Windows Cyrillic
1252 – Windows Western
1253 – Windows Greek
1254 – Windows Turkish
1255 – Windows Hebrew
1256 – Windows Arabic
1257 – Windows Baltic
1258 – Windows Vietnamese
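Because each of these Windows code pages covers a different repertoire, a given text can be stored in some of them but not others. The following sketch tries each page in turn, assuming Python's `cp874` and `cp1250` to `cp1258` codecs match the pages listed above. Code page 42 (Windows Symbol) is omitted because Python has no codec for it. The name `pages_that_can_encode` is hypothetical.

```python
# Windows code page numbers from the list above (42 omitted: no Python codec).
WINDOWS_CODE_PAGES = [874, 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258]


def pages_that_can_encode(text):
    """Return the Windows code page numbers able to represent every character of `text`."""
    usable = []
    for number in WINDOWS_CODE_PAGES:
        try:
            text.encode(f"cp{number}")
        except UnicodeEncodeError:
            continue  # at least one character is missing from this page
        usable.append(number)
    return usable


for sample in ["plain ASCII", "Привет", "שלום"]:
    print(f"{sample!r}: {pages_that_can_encode(sample)}")
```

Plain ASCII text fits every page, while Cyrillic or Hebrew text fits only the page designed for that script. This is why a document's code page had to be known or guessed before it could be read correctly.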

Microsoft recommends new applications use UTF-8 or UCS-2/UTF-16 instead of these code pages.

Although browsers were typically programmed to deal with this behaviour, this was not always true of other software. Consequently, when receiving a file transfer from a Windows system, non-Windows platforms would either ignore these characters or treat them as standard control characters and attempt to take the specified control action accordingly.
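The text does not spell out which behaviour is meant. The sketch below assumes it refers to Windows code pages such as 1252 placing printable characters in the byte range 0x80 to 0x9F, which ISO 8859-1 reserves for control codes. Under that assumption, the same bytes read as curly quotes on Windows and as control characters elsewhere. The helper `control_chars` is a hypothetical name.

```python
import unicodedata

# Bytes as a Windows application might write them: curly quotes around text.
raw = b"\x93code page\x94"

as_cp1252 = raw.decode("cp1252")   # interpreted as Windows-1252
as_latin1 = raw.decode("latin-1")  # interpreted as ISO 8859-1


def control_chars(text):
    """List the code points in `text` whose Unicode category is Cc (control)."""
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Cc"]


print("cp1252 :", repr(as_cp1252), control_chars(as_cp1252))
print("latin-1:", repr(as_latin1), control_chars(as_latin1))
```

Decoded as Windows-1252 the string contains no control characters, while decoded as ISO 8859-1 the two quote bytes become C1 control codes.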

Due to Unicode's extensive documentation, vast repertoire of characters, and stability policy for characters, the problems listed above are rarely a concern for Unicode. UTF-8 (which can encode over one million code points) has replaced the code-page method in terms of popularity on the Internet.
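The "over one million" figure can be illustrated with a short sketch. It encodes a few sample characters to show UTF-8's variable length of one to four bytes, and counts the encodable code points as the full range up to U+10FFFF minus the surrogate block. The sample characters are arbitrary choices made for illustration.

```python
# Sample characters from progressively higher code point ranges.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600", "\U0010FFFF"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):06X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

# Number of bytes UTF-8 needs for each sample.
lengths = [len(ch.encode("utf-8")) for ch in samples]

# All code points U+0000..U+10FFFF, minus the surrogates U+D800..U+DFFF,
# which UTF-8 does not encode.
encodable = 0x110000 - (0xDFFF - 0xD800 + 1)
print("encodable code points:", encodable)
```

A single-byte code page, by contrast, can distinguish at most 256 values, which is why a separate page was needed for each script or region.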
