Indian Script Code for Information Interchange

Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Odia, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

ISCII has not been widely used outside certain government institutions, although a variant without the ATR mechanism, Mac OS Devanagari, was used on classic Mac OS.

Background

The Brahmi-derived writing systems have similar structure.

Special code points

; INV character—code point D9 (217): The INV (invisible consonant) character is used as a pseudo-consonant to display combining elements in isolation. For example, क (ka) + ् (halant) + INV = क्‍ (half ka). The Unicode equivalent is U+200D ZERO WIDTH JOINER (ZWJ). However, as noted below, the ISCII halant character can be doubled or combined with the ISCII nukta to achieve the effects created by ZWNJ or ZWJ in Unicode. For this reason, Apple maps the ISCII INV character to the Unicode left-to-right mark (U+200E), so as to guarantee round-tripping.

; ATR character—code point EF (239): The ATR (attribute) character followed by a byte code is used to switch to a different font attribute (such as bold) or to a different ISCII or PASCII language (such as Bengali), up to the next ATR sequence or the end of the line. This has no direct Unicode equivalent, as font attributes are not part of Unicode, and each script has a distinct set of code points.

{| class="wikitable collapsible"
|+ Presentational attributes
|-
!ATR + byte!!Mnemonic!!Formatting option
|-
|0x30||BLD||Bold
|-
|0x31||ITA||Italics
|-
|0x32||UL||Underlining
|-
|0x33||EXP||Expanded
|-
|0x34||HLT||Highlight
|-
|0x35||OTL||Outline
|-
|0x36||SHD||Shadow
|-
|0x37||TOP||Top half of character (used with LOW to create double-height characters)
|-
|0x38||LOW||Bottom half of character (used with TOP to create double-height characters)
|-
|0x39||DBL||Entire row double-width and double-height
|}

{| class="wikitable collapsible"
|+ Shifts to ISCII scripts
|-
!ATR + byte!!Mnemonic!!ISCII script
|-
|0x40||DEF||Default script (i.e. the script which will be switched back to after a line break)
|-
|0x41||RMN||Romanised transliteration
|-
|0x42||DEV||Devanagari
|-
|0x43||BNG||Bengali script
|-
|0x44||TML||Tamil script
|-
|0x45||TLG||Telugu script
|-
|0x46||ASM||Assamese script
|-
|0x47||ORI||Odia script
|-
|0x48||KND||Kannada script
|-
|0x49||MLM||Malayalam script
|-
|0x4A||GJR||Gujarati script
|-
|0x4B||PNJ||Gurmukhī
|}

{| class="wikitable collapsible"
|+ Shifts to PASCII
|-
!ATR + byte!!Mnemonic!!PASCII locale
|-
|0x71||ARB||Arabic alphabet
|-
|0x72||PES||Persian alphabet
|-
|0x73||URD||Urdu alphabet
|-
|0x74||SND||Sindhi alphabet
|-
|0x75||KSM||Kashmiri alphabet
|-
|0x76||PST||Pashto alphabet
|}
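
The ATR mechanism can be illustrated with a small tokenizer. The following is a minimal sketch in Python, not a reference implementation: the names `ATTRIBUTES` and `split_atr` are hypothetical, the byte values and mnemonics are copied from the three tables above, and the sketch does not model the reset to the default script at the end of a line.

```python
ATR = 0xEF  # attribute introducer (code point EF)

# (kind, mnemonic) for each attribute byte, copied from the tables above.
ATTRIBUTES = {}
for kind, first, names in (
    ("presentation", 0x30, "BLD ITA UL EXP HLT OTL SHD TOP LOW DBL"),
    ("iscii", 0x40, "DEF RMN DEV BNG TML TLG ASM ORI KND MLM GJR PNJ"),
    ("pascii", 0x71, "ARB PES URD SND KSM PST"),
):
    for offset, name in enumerate(names.split()):
        ATTRIBUTES[first + offset] = (kind, name)


def split_atr(data: bytes):
    """Split bytes into ('text', bytes) and ('attr', kind, mnemonic) tokens."""
    tokens = []
    run = bytearray()
    i = 0
    while i < len(data):
        # An ATR byte is only treated as an introducer if a byte follows it.
        if data[i] == ATR and i + 1 < len(data):
            if run:
                tokens.append(("text", bytes(run)))
                run.clear()
            code = data[i + 1]
            kind, name = ATTRIBUTES.get(code, ("unknown", hex(code)))
            tokens.append(("attr", kind, name))
            i += 2
        else:
            run.append(data[i])
            i += 1
    if run:
        tokens.append(("text", bytes(run)))
    return tokens
```

A caller would typically use the `iscii` and `pascii` tokens to pick the character table for the text run that follows, and pass `presentation` tokens on to the rendering layer.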

; EXT character—code point F0 (240): The EXT (extensions for Vedic) character followed by a byte code indicates a Vedic accent. This has no direct Unicode equivalent, as Vedic accents are assigned to distinct code points.

; Halant character ्—code point E8 (232): The halant character removes the implicit vowel from a consonant and is used between consonants to represent conjunct consonants. For example, क (ka) + ् (halant) + त (ta) = क्त (kta). The sequence ् (halant) + ् (halant) displays a conjunct with an explicit halant, for example क (ka) + ् (halant) + ् (halant) + त (ta) = क्‌त. The sequence ् (halant) + ़ (nukta) displays a conjunct with half consonants, if available, for example क (ka) + ् (halant) + ़ (nukta) + त (ta) = क्‍त.

{| class="wikitable collapsible Unicode"
|+ Correspondences between ISCII and Unicode halant/virama behaviour
|-
!colspan=2| ISCII !!colspan=2| Unicode
|-
| single halant || <code>E8</code> || halant || <code>094D</code>
|-
| halant + halant || <code>E8 E8</code> || halant + ZWNJ || <code>094D 200C</code>
|-
| halant + nukta || <code>E8 E9</code> || halant + ZWJ || <code>094D 200D</code>
|}
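
The three correspondences can be sketched as a byte-level conversion. This is a minimal illustration in Python under stated assumptions: the function name `halants_to_unicode` is hypothetical, and the `CONSONANTS` table covers only a few Devanagari consonants whose ISCII code points appear in this article, so a real converter would need the full table for the script in use.

```python
HALANT, NUKTA = 0xE8, 0xE9
VIRAMA, ZWNJ, ZWJ = "\u094D", "\u200C", "\u200D"

# Illustrative subset only: ka, kha, ga, ja (ISCII byte -> Unicode character).
CONSONANTS = {0xB3: "\u0915", 0xB4: "\u0916", 0xB5: "\u0917", 0xBA: "\u091C"}


def halants_to_unicode(data: bytes) -> str:
    """Convert consonants and halant sequences to Unicode text."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        following = data[i + 1] if i + 1 < len(data) else None
        if byte == HALANT and following == HALANT:
            out.append(VIRAMA + ZWNJ)  # explicit halant
            i += 2
        elif byte == HALANT and following == NUKTA:
            out.append(VIRAMA + ZWJ)   # half-consonant form
            i += 2
        elif byte == HALANT:
            out.append(VIRAMA)         # ordinary conjunct
            i += 1
        elif byte in CONSONANTS:
            out.append(CONSONANTS[byte])
            i += 1
        else:
            raise ValueError(f"byte {byte:#04x} is outside this sketch's table")
    return "".join(out)
```

The two-byte sequences are checked before the single halant so that `E8 E8` and `E8 E9` are never split into a plain virama followed by a stray byte.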

; Nukta character ़—code point E9 (233): The nukta character after another ISCII character is used for a number of rarer characters which don't exist in the main ISCII set. For example क (ka) + ़ (nukta) = क़ (qa). These characters have precomposed forms in Unicode, as shown in the following table.

{| class="wikitable collapsible"
|+ Single Unicode characters corresponding to ISCII nukta sequences
|-
! ISCII<br>code point !! Original<br>character !! Character<br>with nukta !! Unicode<br>code point
|-
| A1 (161) || ँ || ॐ || 0950
|-
| A6 (166) || इ || ऌ || 090C
|-
| A7 (167) || ई || ॡ || 0961
|-
| AA (170) || ऋ || ॠ || 0960
|-
| B3 (179) || क || क़ || 0958
|-
| B4 (180) || ख || ख़ || 0959
|-
| B5 (181) || ग || ग़ || 095A
|-
| BA (186) || ज || ज़ || 095B
|-
| BF (191) || ड || ड़ || 095C
|-
| C0 (192) || ढ || ढ़ || 095D
|-
| C9 (201) || फ || फ़ || 095E
|-
| DB (219) || ि || ॢ || 0962
|-
| DC (220) || ी || ॣ || 0963
|-
| DF (223) || ृ || ॄ || 0944
|-
| EA (234) || । || ऽ || 093D
|}
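
The table lends itself to a simple lookup. The sketch below, in Python, scans a byte string for a listed character followed by the nukta byte E9 and reports the single Unicode character for each match. The names `NUKTA_FORMS` and `find_nukta_forms` are hypothetical, and the mapping contains only the rows of the table above.

```python
NUKTA = 0xE9

# ISCII base byte -> Unicode code point of the form written with nukta,
# one entry per row of the table above.
NUKTA_FORMS = {
    0xA1: 0x0950, 0xA6: 0x090C, 0xA7: 0x0961, 0xAA: 0x0960,
    0xB3: 0x0958, 0xB4: 0x0959, 0xB5: 0x095A, 0xBA: 0x095B,
    0xBF: 0x095C, 0xC0: 0x095D, 0xC9: 0x095E, 0xDB: 0x0962,
    0xDC: 0x0963, 0xDF: 0x0944, 0xEA: 0x093D,
}


def find_nukta_forms(data: bytes):
    """Return (offset, character) for each base + nukta pair listed in the table."""
    found = []
    i = 0
    while i < len(data):
        is_pair = i + 1 < len(data) and data[i + 1] == NUKTA
        if is_pair and data[i] in NUKTA_FORMS:
            found.append((i, chr(NUKTA_FORMS[data[i]])))
            i += 2  # consume the base byte and the nukta
        else:
            i += 1
    return found
```

Because the halant byte E8 is not a key of the mapping, a halant + nukta sequence is left alone by this lookup and can be handled separately.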

Code pages for ISCII conversion

To convert from Unicode (UTF-8) to ISCII, the following Windows code page identifiers may be used, one per script:

57002: Devanagari (Hindi, Marathi, Sanskrit, Konkani)
57003: Bengali
57004: Tamil
57005: Telugu
57006: Assamese
57007: Odia
57008: Kannada
57009: Malayalam
57010: Gujarati
57011: Punjabi (Gurmukhi)
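
Comparing this list with the ATR script-shift bytes described earlier (0x42 for Devanagari through 0x4B for Gurmukhi) shows that both run in the same order. The Python sketch below records that correspondence as a lookup. It is an observation drawn from the two lists in this article rather than a documented rule, and the names `ATR_TO_CODE_PAGE` and `code_page_for_atr` are hypothetical.

```python
# Code page identifier -> script, as listed above.
CODE_PAGES = {
    57002: "Devanagari", 57003: "Bengali", 57004: "Tamil", 57005: "Telugu",
    57006: "Assamese", 57007: "Odia", 57008: "Kannada", 57009: "Malayalam",
    57010: "Gujarati", 57011: "Punjabi (Gurmukhi)",
}

# ATR script-shift byte -> code page identifier, paired by script name.
ATR_TO_CODE_PAGE = {
    0x42: 57002, 0x43: 57003, 0x44: 57004, 0x45: 57005, 0x46: 57006,
    0x47: 57007, 0x48: 57008, 0x49: 57009, 0x4A: 57010, 0x4B: 57011,
}


def code_page_for_atr(script_byte: int):
    """Return the code page for an ATR script byte, or None if there is none."""
    return ATR_TO_CODE_PAGE.get(script_byte)
```

The default-script (0x40) and Roman transliteration (0x41) shifts have no entry in the list above, so the lookup returns `None` for them.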

Code points for all languages

{| class="wikitable collapsible collapsed Unicode" border="1" style="text-align:center; font-size:100%;"

|+ Code set for all abugidas using ISCII