C0 and C1 control codes

The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor, an instruction to start a new line, or a message that the text has been received.

C0 codes are the range 0x00–0x1F and the default C0 set was originally defined in ISO 646 (ASCII). C1 codes are the range 0x80–0x9F and the default C1 set was originally defined in ECMA-48 (harmonized later with ISO 6429). The ISO/IEC 2022 system of specifying control and graphic characters allows other C0 and C1 sets to be available for specialized applications, but they are rarely used.
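The two ranges can be checked mechanically. The following is a minimal Python sketch; the function name `classify_control` is hypothetical and not part of any standard or library.

```python
def classify_control(code: int) -> str:
    """Classify a code value by the ranges given above (illustrative helper)."""
    if 0x00 <= code <= 0x1F:
        return "C0"      # ASCII control range
    if code == 0x7F:
        return "DEL"     # DEL sits outside the C0 range proper
    if 0x80 <= code <= 0x9F:
        return "C1"      # 8-bit control range
    return "other"       # graphic characters and everything else


for sample in (0x0A, 0x1B, 0x41, 0x7F, 0x85, 0xA0):
    print(f"0x{sample:02X}: {classify_control(sample)}")
```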

C0 controls

ASCII defines 32 control characters, plus the DEL character. This large number of codes was desirable at the time, as multi-byte controls would require implementation of a state machine in the terminal, which was very difficult with contemporary electronics and mechanical terminals.

Only a few codes have maintained their use: BEL, ESC, and the format effector (FEn) characters BS, HT, LF, VT, FF, and CR. Others are unused or have acquired different meanings, such as NUL being the C string terminator. Some data transfer protocols such as ANPA-1312, Kermit, and XMODEM do make extensive use of SOH, STX, ETX, EOT, ACK, NAK and SYN for purposes approximating their original definitions. The "Information Separators" (ISn) are also still used in places, for example by the Unix info format and by Python's <code>splitlines</code> string method.
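As an illustration of the information separators in practice, the Python sketch below uses US and RS as field and record delimiters, and shows that `str.splitlines` treats FS, GS and RS (but not US) as line boundaries. The sample data and variable names are invented for the example.

```python
# The four C0 information separators.
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

# Hypothetical payload: records separated by RS, fields within a record by US.
payload = "alice" + US + "30" + RS + "bob" + US + "25"
rows = [record.split(US) for record in payload.split(RS)]
print(rows)   # [['alice', '30'], ['bob', '25']]

# str.splitlines() counts FS, GS and RS as line boundaries; US is left alone.
lines = ("a" + FS + "b" + GS + "c" + RS + "d" + US + "e").splitlines()
print(lines)  # ['a', 'b', 'c', 'd\x1fe']
```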

The names of some codes were changed in ISO 6429:1992 (or ECMA-48:1991) to be neutral with respect to writing direction. The abbreviations used were not changed, as the standard had already specified that those would remain unchanged when the standard is translated to other languages. In this table both new and old names are shown for the renamed controls (the old name is the one matching the abbreviation).

Unicode provides Control Pictures that can replace C0 control characters to make them visible on screen. However, caret notation is used more often.
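Both ways of making control characters visible follow simple arithmetic: the Control Pictures for C0 codes sit at U+2400 plus the code value (with U+2421 for DEL), and caret notation flips bit 6 of the code. A minimal Python sketch follows; the helper names are hypothetical.

```python
def control_picture(ch: str) -> str:
    """Map a C0 control or DEL to its Unicode Control Picture."""
    code = ord(ch)
    if code <= 0x1F:
        return chr(0x2400 + code)   # U+2400..U+241F mirror 0x00..0x1F
    if code == 0x7F:
        return "\u2421"             # SYMBOL FOR DELETE
    return ch


def caret_notation(ch: str) -> str:
    """Map a C0 control or DEL to caret notation, e.g. 0x09 -> ^I."""
    code = ord(ch)
    if code <= 0x1F or code == 0x7F:
        return "^" + chr(code ^ 0x40)   # 0x00 -> '@', 0x7F -> '?'
    return ch


def visible(text: str, mapper) -> str:
    """Apply one of the mappers above to every character of text."""
    return "".join(mapper(c) for c in text)


print(visible("a\tb\r\n", control_picture))  # a␉b␍␊
print(visible("a\tb\r\n", caret_notation))   # a^Ib^M^J
```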

{| class="wikitable" id="ASCII"

|+ ASCII control codes, originally defined in ANSI X3.4.

! Caret notation

! Decimal

! Hexadecimal

! Abbreviations

! Symbol

! Name

! C escape

! Description

|- id="NUL"

|||0||00||NUL||␀||Null ||

| Does nothing. The code of blank paper tape, and also used for padding to slow transmission.

|- id="SOH"

|||1||01||TC1, SOH||␁||Start of Heading ||

| First character of the heading of a message.

|- id="STX"

|||2||02||TC2, STX||␂||Start of Text ||

| Terminates the header and starts the message text.

|- id="ETX"

|||3||03||TC3, ETX||␃||End of Text ||

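| Ends the message text, starts a footer (up to the next TC character).

|- id="EOT"

|||4||04||TC4, EOT||␄||End of Transmission||

| Ends the transmission of one or more messages. May place terminals on standby.

|- id="ENQ"

|||5||05||TC5, ENQ||␅||Enquiry||

| Triggers a response at the receiving end, to see if it is still present.

|- id="ACK"

|||6||06||TC6, ACK||␆||Acknowledge||

| Indicates successful receipt of a message.

|- id="BEL"

|||7||07||BEL||␇||Bell, Alert||

|Call for attention from an operator. Unicode uses the name BELL for U+1F514, so some software, e.g. Perl before version 5.18, used BELL for this control instead; Perl subsequently switched to using BELL for U+1F514 in version 5.18. The corresponding control picture code point is called SYMBOL FOR BELL.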

|- id="BS"

|||8||08||FE0, BS||␈||Backspace||

|Move one position leftwards. Next character may overprint or replace the character that was there.

|- id="HT"

|||9||09||FE1, HT, TAB||␉||Character Tabulation, Horizontal Tabulation||

|Move right to the next tab stop.

|- id="LF"

|||10||0A||FE2, LF||␊||Line Feed||

|Move down to the same position on the next line (some devices also moved to the left column).

|- id="VT"

|||11||0B||FE3, VT||␋||Line Tabulation, Vertical Tabulation||

|Move down to the next vertical tab stop.

|- id="FF"

|||12||0C||FE4, FF, NP||␌||Form Feed||

|Move down to the top of the next page.

|- id="CR"

|||13||0D||FE5, CR||␍||Carriage Return||

|Move to column zero while staying on the same line.

|- id="SO"

|||14||0E||SO, LS1||␎||Shift Out||

|Switch to an alternative character set.

|- id="SI"

|||15||0F||SI, LS0||␏||Shift In||

|Return to the regular character set after SO.

|- id="DLE"

|||16||10||TC7, DLE||␐||Data Link Escape||

| Cause a limited number of contiguously following characters to be interpreted in some different way.

|- id="DC1"

|||17||11||DC1, XON||␑||Device Control One||

| rowspan="4" |Turn on (DC1 and DC2) or off (DC3 and DC4) devices.

Teletype used these for the paper tape reader and the paper tape punch. The first use became the de facto standard for software flow control.

|- id="DC2"

|||18||12||DC2, TAPE||␒||Device Control Two||

|- id="DC3"

|||19||13||DC3, XOFF||␓||Device Control Three||

|- id="DC4"

|||20||14||DC4, <s>TAPE</s>||␔||Device Control Four||

|- id="NAK"

|||21||15||TC8, NAK||␕||Negative Acknowledge ||

| Negative response to a sender, such as a detected error.

|- id="SYN"

|||22||16||TC9, SYN||␖||Synchronous Idle ||

| Sent in synchronous transmission systems when no other character is being transmitted.

|- id="ETB"

|||23||17||TC10, ETB||␗||End of Transmission Block||

| End of a transmission block of data when data are divided into such blocks for transmission purposes.

|- id="CAN"

|||24||18||CAN||␘||Cancel ||

| Indicates that the data preceding it are in error or are to be disregarded.

|- id="EM"

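|||25||19||EM||␙||End of Medium||

| Indicates on paper or magnetic tapes that the end of the usable portion of the tape had been reached.

|- id="SUB"

|||26||1A||SUB||␚||Substitute||

| Replaces a character that was found to be invalid or in error.

|- id="ESC"

|||27||1B||ESC||␛||Escape||

| Alters the meaning of a limited number of following bytes. Nowadays this is almost always used to introduce an ANSI escape sequence.

|- id="FS"

|||28||1C||IS4, FS||␜||File Separator||

| rowspan="4" |Can be used as delimiters to mark fields of data structures. US is the lowest level, while RS, GS and FS are of increasing level, each dividing groups made up of items of the level beneath it.

|- id="GS"

|||29||1D||IS3, GS||␝||Group Separator||

|- id="RS"

|||30||1E||IS2, RS||␞||Record Separator||

|- id="US"

|||31||1F||IS1, US||␟||Unit Separator||

|- id="DEL"

|||127||7F||DEL||␡||Delete||

| Not technically part of the C0 control character range. Originally used to mark deleted characters on paper tape, since any character could be changed to all ones by punching holes everywhere.

|}

C1 controls

ECMA-35 and ISO 2022 attempted to define a method so an 8-bit "extended ASCII" code could be converted to a corresponding 7-bit code, and vice versa. In a 7-bit environment, Shift Out would change the meaning of the 96 bytes 0x20 through 0x7F (i.e. all but the C0 control codes) to be the characters that an 8-bit environment would print if it used the same code with the high bit set. This meant that the range 0x80 through 0x9F could not be printed in a 7-bit environment, thus it was decided that no alternative character set could use them, and that these codes should be additional control codes, which became known as the C1 control codes. To allow a 7-bit environment to use these new controls, the sequences <code>ESC @</code> through <code>ESC _</code> were to be considered equivalent.

The C1 set listed here is the one defined by ISO/IEC 6429 (ECMA-48) and JIS X 0211 (formerly JIS C 6323). Symbolic names defined by RFC 1345 and early drafts of ISO 10646, but not in ISO/IEC 6429 (PAD, HOP and SGC), are also used.

{| class="wikitable"

|+ ISO/IEC 6429 and RFC 1345 C1 control codes

! ESC+

! Decimal

! Hex

! Abbreviation

! Name

! Description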

|- id="PAD"

|@||128||80||PAD||Padding Character

|Proposed as a "padding" or "high byte" for single-byte characters to make them two bytes long for easier interoperability with multiple byte characters. Extended Unix Code (EUC) occasionally uses this.

|- id="HOP"

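|A||129||81||HOP||High Octet Preset||

|- id="BPH"

|B||130||82||BPH||Break Permitted Here||Marks a point where a line break may occur when text is formatted.

|- id="NBH"

|C||131||83||NBH||No Break Here||Marks a point where a line break must not occur when text is formatted.

|- id="IND"

|D||132||84||IND||Index||Move down one line without moving horizontally, to eliminate ambiguity about the meaning of LF. Withdrawn from ISO/IEC 6429 in 1992 (1991 for ECMA-48).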

|- id="NEL"

|E||133||85||NEL||Next Line||Equivalent to CR+LF, to match the EBCDIC control character.

|- id="SSA"

|F||134||86||SSA||Start of Selected Area||rowspan=2|Used by block-oriented terminals. In xterm, <code>ESC F</code> moves to the lower-left corner of the screen, since certain software assumes this behaviour.

|- id="ESA"

|G||135||87||ESA||End of Selected Area

|- id="HTS"

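|H||136||88||HTS||Character Tabulation Set, Horizontal Tabulation Set||Set a tab stop at the current position.

|- id="HTJ"

|I||137||89||HTJ||Character Tabulation With Justification, Horizontal Tabulation With Justification

|Right-justify the text since the last tab against the next tab stop.

|- id="VTS"

|J||138||8A||VTS||Line Tabulation Set, Vertical Tabulation Set||Set a vertical tab stop.

|- id="PLD"

|K||139||8B||PLD||Partial Line Forward, Partial Line Down||rowspan="2"|Used to produce subscripts and superscripts in ISO/IEC 6429. Subscripts use <code>PLD text PLU</code> while superscripts use <code>PLU text PLD</code>.

|- id="PLU"

|L||140||8C||PLU||Partial Line Backward, Partial Line Up

|- id="RI"

|M||141||8D||RI||Reverse Line Feed, Reverse Index||Move up one line.

|- id="SS2"

|N||142||8E||SS2||Single-Shift 2||rowspan="2"|Next character is from the G2 or G3 sets, respectively.

|- id="SS3"

|O||143||8F||SS3||Single-Shift 3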

|- id="DCS"

|P||144||90||DCS||Device Control String||Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C). Xterm defined a number of these.

|- id="PU1"

|Q||145||91||PU1||Private Use 1||rowspan=2|Reserved for private function agreed on between the sender and the recipient of the data.

|- id="PU2"

|R||146||92||PU2||Private Use 2

|- id="STS"

|S||147||93||STS||Set Transmit State||

|- id="CCH"

|T||148||94||CCH||Cancel Character||Destructive backspace, to eliminate ambiguity about the meaning of BS.

|- id="MW"

|U||149||95||MW||Message Waiting||

|- id="SPA"

|V||150||96||SPA||Start of Protected Area||rowspan=2|Used by block-oriented terminals.

|- id="EPA"

|W||151||97||EPA||End of Protected Area

|- id="SOS"

|X||152||98||SOS||Start of String||Followed by a control string terminated by ST (0x9C) which (unlike DCS, OSC, PM or APC) may contain any character except SOS or ST.

|- id="SGC"

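|Y||153||99||SGC, SGCI||Single Graphic Character Introducer

|Intended to allow an arbitrary Unicode character to be printed; it would be followed by 4 bytes to define a 32-bit code point, most likely big-endian.

|- id="SCI"

|Z||154||9A||SCI||Single Character Introducer||To be followed by a single printable character (0x20 through 0x7E) or format effector (0x08 through 0x0D).

|- id="CSI"

|[||155||9B||CSI||Control Sequence Introducer||Introduces control sequences that take parameters. Used for ANSI escape sequences.

|- id="ST"

|\||156||9C||ST||String Terminator||Terminates a string started by DCS, SOS, OSC, PM or APC.

|- id="OSC"

|]||157||9D||OSC||Operating System Command||rowspan="3"|Followed by a string of printable characters (0x20 through 0x7E) and format effectors (0x08 through 0x0D), terminated by ST (0x9C). Kermit used APC to transmit commands. Kitty uses APC to render graphics using the Kitty Graphics Protocol.

|- id="PM"

|^||158||9E||PM||Privacy Message

|- id="APC"

|_||159||9F||APC||Application Program Command

|}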
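The equivalence between the 8-bit C1 codes and the 7-bit sequences <code>ESC @</code> through <code>ESC _</code> amounts to adding or subtracting 0x40 from the byte that follows ESC. The Python sketch below illustrates this on raw single-byte data; the function names are hypothetical, and it assumes the input is not UTF-8 (where bytes 0x80–0x9F are continuation bytes rather than C1 controls).

```python
ESC = 0x1B


def c1_to_7bit(data: bytes) -> bytes:
    """Rewrite each 8-bit C1 byte (0x80-0x9F) as ESC followed by byte - 0x40."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            out += bytes([ESC, b - 0x40])
        else:
            out.append(b)
    return bytes(out)


def esc_to_c1(data: bytes) -> bytes:
    """Rewrite each ESC + (0x40-0x5F) pair as the single C1 byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC and i + 1 < len(data) and 0x40 <= data[i + 1] <= 0x5F:
            out.append(data[i + 1] + 0x40)
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)


# CSI (0x9B) corresponds to ESC [ and NEL (0x85) to ESC E.
print(c1_to_7bit(b"\x9b31m"))   # b'\x1b[31m'
print(esc_to_c1(b"\x1bE"))      # b'\x85'
```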

Other control code sets

The ISO/IEC 2022 (ECMA-35) extension mechanism allowed escape sequences to change the C0 and C1 sets. The standard C0 control character set shown above is chosen with the sequence <code>ESC ! @</code> and the above C1 set is chosen with the sequence <code>ESC " C</code>. The standard makes ESC, SP and DEL "fixed" coded characters, which are available in their ASCII locations in all encodings that conform to the standard. It also specifies that if a C0 set includes transmission control (TCn) codes, they must be encoded at their ASCII locations and cannot be put in a C1 set, and that any new transmission controls must be in a C1 set.

Some alternative C0 sets replaced EM and GS with SS2 and SS3 so that these functions could be used in a 7-bit environment without resorting to escape sequences.

Some sets replaced FS with SS2 (same as ANPA-1312).
The now-withdrawn JIS C 6225 (designated JIS X 0207 in later sources) replaced FS with CEX or "Control Extension", which introduces control sequences for vertical text behaviour, superscripts and subscripts, and for transmitting custom character graphics.
Various specialised C1 control code sets are registered for use by Videotex formats.
The VOS set includes SS1 (Single-Shift 1) through SS15 (Single-Shift 15) controls, used to invoke individual characters from pre-defined supplementary character sets, in a similar manner to the single-shift mechanism of ISO/IEC 2022. The only single-shift controls defined by ISO/IEC 2022 are SS2 and SS3; these are retained in the VOS set at their original code points and function the same way.
EBCDIC defines up to 29 additional control codes besides those present in ASCII. When translating EBCDIC to Unicode (or to ISO 8859), these codes are mapped to C1 control characters in a manner specified by IBM's Character Data Representation Architecture (CDRA). Although the EBCDIC New Line (NL) does translate to the ISO/IEC 6429 NEL (although it is often swapped with LF, following the UNIX line-ending convention), the remaining control codes do not correspond.

Unicode

Unicode reserves the 65 code points described above for compatibility with the C0 and C1 control codes, giving them the general category Cc (control). These are:

U+0000–U+001F (C0 controls) and U+007F (DEL), assigned to the C0 Controls and Basic Latin block, and
U+0080–U+009F (C1 controls), assigned to the C1 Controls and Latin-1 Supplement block.

Unicode only specifies semantics for the C0 format controls HT, LF, VT, FF, and CR (note BS is missing); the C0 information separators FS, GS, RS, US (and SP); and the C1 control NEL.

Unicode includes many additional format effector characters besides these, such as marks, embeds, isolates and pops for explicit bidirectional formatting, and the zero-width joiner and non-joiner for controlling ligature use. However, these are given the general category Cf (format) rather than Cc.
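The general categories can be inspected with Python's standard `unicodedata` module. The sketch below collects every code point in category Cc and checks the category of two format characters; the variable names are illustrative only.

```python
import sys
import unicodedata

# Collect every code point whose general category is Cc (control).
cc_points = [
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)) == "Cc"
]

print(len(cc_points))  # 65
print(cc_points == list(range(0x00, 0x20)) + [0x7F] + list(range(0x80, 0xA0)))  # True

# Format characters such as the zero-width joiner and the bidirectional
# embedding controls are category Cf, not Cc.
print(unicodedata.category("\u200d"))  # Cf (ZERO WIDTH JOINER)
print(unicodedata.category("\u202a"))  # Cf (LEFT-TO-RIGHT EMBEDDING)
```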

External links

The Unicode Standard
C0 Controls and Basic Latin
C1 Controls and Latin-1 Supplement
Control Pictures
The Unicode Standard, Version 6.1.0, Chapter 16: Special Areas and Format Characters
ATIS Telecom Glossary 2007
De litteris regentibus C1 quaestiones septem or Are C1 characters legal in XHTML 1.0?
W3C I18N FAQ: HTML, XHTML, XML and Control Codes
International register of coded character sets to be used with escape sequences
