thumb|500px|Early written Arabic used only [[rasm (in black). Later, i‘jām (in red) were added so that letters such as ṣād (ص) and ḍād (ض) could be distinguished. Ḥarakāt (in blue), which are used in the Qur'an but not in most written Arabic, indicate short vowels, long consonants, and some other vocalizations.]]

The Arabic script has numerous diacritics, which include consonant pointing known as i‘jām (إِعْجَام), and supplementary diacritics known as tashkīl (تَشْكِيل). The latter include the vowel marks termed ḥarakāt (حَرَكَات; singular: حَرَكَة, ḥarakah).

The Arabic script is a modified abjad, where all letters are consonants, leaving it up to the reader to fill in the vowel sounds. Short consonants and long vowels are represented by letters, but short vowels and consonant length are not generally indicated in writing. Tashkīl is optional and represents the missing vowels and consonant length. Modern Arabic is always written with the i‘jām (consonant pointing), but only religious texts, children's books and works for learners are written with the full tashkīl (vowel guides and consonant length). It is, however, not uncommon for authors to add diacritics to a word or letter when the grammatical case or the meaning would otherwise be ambiguous. In addition, classical works and historical documents presented to the general public are often printed with the full tashkīl, to compensate for the gap in understanding resulting from stylistic changes over the centuries.

Moreover, tashkīl can change the meaning of an entire word. For example, دِين (dīn) means 'religion', whereas دَين (dayn) means 'debt'. Even though the two words have the same letters, their meanings differ because of the tashkīl. In sentences without tashkīl, readers work out the meaning of the word from context.
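As a minimal illustration of the point that the two words share the same letters, the following Python sketch removes the combining marks from each word and compares what remains. The helper name `strip_tashkil` is hypothetical, and treating every combining character as tashkīl is a simplifying assumption.

```python
import unicodedata


def strip_tashkil(text: str) -> str:
    """Drop every combining mark (assumed here to be tashkil) from text."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


din = "\u062F\u0650\u064A\u0646"   # dal + kasrah + ya + nun ('religion')
dayn = "\u062F\u064E\u064A\u0646"  # dal + fathah + ya + nun ('debt')

# The vocalised spellings differ, but the bare letters are identical.
print(din != dayn)
print(strip_tashkil(din) == strip_tashkil(dayn))
```

Both comparisons hold, which is why an unvocalised text leaves the choice between the two readings to context.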

Tashkīl

The literal meaning of tashkīl is 'formation'. As normal Arabic text does not provide enough information about the correct pronunciation, the main purpose of tashkīl (and ḥarakāt) is to provide a phonetic guide or aid, i.e. to show the correct pronunciation to children who are learning to read and to foreign learners.

The bulk of Arabic script is written without ḥarakāt (or short vowels). However, they are commonly used in texts that demand strict adherence to exact pronunciation. This is true, primarily, of the Qur'an and poetry. It is also quite common to add ḥarakāt to hadiths and the Bible. Another use is in children's literature. Moreover, ḥarakāt are used in ordinary texts in individual words when an ambiguity of pronunciation cannot easily be resolved from context alone. Arabic dictionaries with vowel marks provide information about the correct pronunciation to both native and foreign Arabic speakers. In art and calligraphy, ḥarakāt might be used simply because their writing is considered aesthetically pleasing.

An example of fully vocalised (vowelised or vowelled) Arabic from the Bismillah:

Some Arabic textbooks for foreigners now use ḥarakāt as a phonetic guide to make learning to read Arabic easier. The other method used in textbooks is phonetic romanisation of unvocalised texts. Fully vocalised Arabic texts (i.e. Arabic texts with ḥarakāt/diacritics) are sought after by learners of Arabic. Some online bilingual dictionaries also provide ḥarakāt as a phonetic guide, much as English dictionaries provide transcription.

Ḥarakāt (short vowel marks)<span class="anchor" id="Ḥarakāt"></span>

The ḥarakāt, which literally means 'motions', are the short vowel marks. There is some ambiguity as to which tashkīl are also ḥarakāt; the tanwīn, for example, are markers for both vowels and consonants.

Fatḥah

The fatḥah (فَتْحَة, 'opening') is a small diagonal line placed above a letter, and represents a short /a/ (like the /a/ sound in the English word "cat"). The word fatḥah itself means 'opening' and refers to the opening of the mouth when producing an /a/. For example, with dāl (henceforth, the base consonant in the following examples): دَ /da/.

When a fatḥah is placed before a plain letter alif (ا), i.e. one having no hamza or vowel of its own, it represents a long /aː/ (close to the sound of "a" in the English word "dad", with an open front vowel [æː], not back [ɑː] as in "father"). For example: دَا /daː/. The fatḥah is not usually written in such cases. When a fatḥah is placed before the letter ⟨ي⟩ (yā’), it creates an /aj/ (as in "lie"); and when placed before the letter ⟨و⟩ (wāw), it creates an /aw/ (as in "cow").

Although a fatḥah paired with a plain letter creates an open front vowel [a], often realized as near-open [æ], the standard also allows for variations, especially under certain surrounding conditions. Usually, in order to have the back pronunciation [ɑ], the word features a nearby back consonant, such as the emphatics, as well as qāf, or rā’. Other vowels also acquire a similar "back" quality in the presence of such consonants, though less drastically than fatḥah.

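Fatḥahs are encoded U+0618 ARABIC SMALL FATHA, U+064E ARABIC FATHA, U+FE76 ARABIC FATHA ISOLATED FORM, or U+FE77 ARABIC FATHA MEDIAL FORM.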
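The Unicode character names for the fatḥah can be checked against the character database that ships with Python. The sketch below assumes the four code points U+0618, U+064E, U+FE76 and U+FE77, taken from the Unicode Standard; the helper name `describe` is hypothetical.

```python
import unicodedata

# Code points whose Unicode names contain "FATHA" (assumed list, see lead-in).
FATHA_CODE_POINTS = [0x0618, 0x064E, 0xFE76, 0xFE77]


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME' for a code point, using the Unicode database."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


for cp in FATHA_CODE_POINTS:
    print(describe(cp))
```

In ordinary text only U+064E is normally used; the other code points are a small variant and compatibility presentation forms.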

Kasrah

A similar diagonal line below a letter is called a kasrah (كَسْرَة, 'break') and designates a short /i/ (as in "me", "be") and its allophones [i, ɪ, e, e̞, ɛ] (as in "Tim", "sit"). For example: دِ /di/.

When a kasrah is placed before a plain letter yā’ (ي), it represents a long /iː/ (as in the English word "steed"). For example: دِي /diː/. The kasrah is usually not written in such cases, but if yā’ is pronounced as a diphthong /aj/, a fatḥah should be written on the preceding letter to avoid mispronunciation. The word kasrah means 'breaking'.

Tanwīn

The three vowel diacritics may be doubled at the end of a word to indicate that the vowel is followed by the consonant n. They may or may not be considered ḥarakāt and are known as tanwīn (تَنْوِين, 'nun-ification'), or nunation. The signs indicate the endings -un, -in, and -an.

These endings are used as non-pausal grammatical indefinite case endings in Literary Arabic or Classical Arabic (triptotes only). In a vocalised text, they may be written even if they are not pronounced (see pausa). See ʾiʿrāb for more details. In many spoken Arabic dialects, the endings are absent. Many Arabic textbooks introduce standard Arabic without these endings. The grammatical endings may not be written in some vocalised Arabic texts, as knowledge of ʾiʿrāb varies from country to country, and there is a trend towards simplifying Arabic grammar.

The sign for -an is most commonly written in combination with alif, tā’ marbūṭah, alif hamzah, or stand-alone hamzah. Alif should always be written (except for words ending in tā’ marbūṭah or hamzah, or for diptotes) even if -an is not. Grammatical cases and tanwīn endings in indefinite triptote forms:

  • -un: nominative case;
  • -an: accusative case, also serves as an adverbial marker;
  • -in: genitive case.
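The correspondence between tanwīn signs and grammatical cases can be sketched as a small lookup table. This is only an illustration: the function name `tanwin_case` is hypothetical, and the code assumes the standard Unicode code points U+064B (fatḥatan), U+064C (ḍammatan) and U+064D (kasratan).

```python
# Tanwin sign -> (ending, case) for indefinite triptote nouns.
TANWIN = {
    "\u064C": ("-un", "nominative"),  # ARABIC DAMMATAN
    "\u064B": ("-an", "accusative"),  # ARABIC FATHATAN
    "\u064D": ("-in", "genitive"),    # ARABIC KASRATAN
}


def tanwin_case(word: str):
    """Return (ending, case) for the last tanwin sign in word, or None."""
    # Scan from the end because -an may be followed by a silent alif.
    for ch in reversed(word):
        if ch in TANWIN:
            return TANWIN[ch]
    return None


kitabun = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"  # kaf-i ta-a alif ba-un
print(tanwin_case(kitabun))
```

A word written without tanwīn, as in most unvocalised text, simply yields `None`, so the case must then be inferred from context.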

Ḍammatan also has another shape, which better resembles how the symbol is handwritten and how it is printed in school books and in the Quran.

Shaddah

The shadda or shaddah (شَدَّة, 'emphasis'), or tashdid (تَشْدِيد), is a diacritic shaped like a small written Latin "w".

It is used to indicate gemination (consonant doubling or extra length), which is phonemic in Arabic. It is written above the consonant which is to be doubled. It is the only ḥarakah that is commonly used in ordinary spelling to avoid ambiguity. For example: دّ /dd/; madrasah (مَدْرَسَة, 'school') vs. mudarrisah (مُدَرِّسَة, 'teacher', female). Note that when the doubled letter bears a vowel, it is the shaddah that the vowel is attached to, not the letter itself: دَّ /dda/, دِّ /ddi/.
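The following sketch shows one way to find which consonants in a vocalised word are geminated, by looking for the shaddah (assumed here to be U+0651 ARABIC SHADDA) and reporting the base letter it sits on. The function name `geminated_letters` is hypothetical. Other marks on the same letter are skipped because, in stored text, a vowel sign may come before or after the shaddah.

```python
import unicodedata

SHADDA = "\u0651"  # ARABIC SHADDA


def geminated_letters(word: str) -> list:
    """Return the base letters in word that carry a shaddah."""
    doubled = []
    base = None
    for ch in word:
        if ch == SHADDA:
            if base is not None:
                doubled.append(base)
        elif not unicodedata.combining(ch):
            base = ch  # remember the most recent non-combining letter
    return doubled


madrasah = "\u0645\u064E\u062F\u0652\u0631\u064E\u0633\u064E\u0629"          # 'school'
mudarrisah = "\u0645\u064F\u062F\u064E\u0631\u0651\u0650\u0633\u064E\u0629"  # 'teacher' (f.)

print(geminated_letters(madrasah))
print(geminated_letters(mudarrisah))
```

For the two words above, only mudarrisah reports a geminated letter (rā’), which is the contrast that the shaddah preserves in ordinary spelling.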

Shaddahs are encoded U+0651 ARABIC SHADDA, U+FE7C ARABIC SHADDA ISOLATED FORM, or U+FE7D ARABIC SHADDA MEDIAL FORM.

I‘jām

thumb|right|7th-century [[kufic script without any ḥarakāt or i‘jām.]]

The i‘jām (إِعْجَام; sometimes also called nuqaṭ) are the diacritic points that distinguish various letters that have the same backbone form (rasm). In modern Arabic scripts, typically one, two or three dots above or below the letter distinguish it as a different letter. For example, ص is the letter ṣād, whereas ض, with a dot above, is the letter ḍād. Typically i‘jām are not considered diacritics but part of the letter.
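To make the idea of a shared backbone concrete, the sketch below maps a few dotted letters to an undotted letter with the same skeleton and compares two words after the mapping. The table `RASM_BASE` is a deliberately partial, illustrative assumption (it omits groups such as bā’, tā’ and thā’, which have no undotted member in the modern alphabet), and the name `to_rasm` is hypothetical.

```python
# Dotted letter -> undotted letter with the same skeleton (partial table).
RASM_BASE = {
    "\u0636": "\u0635",  # dad   -> sad
    "\u0634": "\u0633",  # shin  -> sin
    "\u0630": "\u062F",  # dhal  -> dal
    "\u0632": "\u0631",  # zay   -> ra
    "\u0638": "\u0637",  # za    -> ta (emphatic)
    "\u063A": "\u0639",  # ghayn -> ayn
    "\u062C": "\u062D",  # jim   -> ha
    "\u062E": "\u062D",  # kha   -> ha
}


def to_rasm(text: str) -> str:
    """Replace each dotted letter in the table by its undotted counterpart."""
    return "".join(RASM_BASE.get(ch, ch) for ch in text)


arab = "\u0639\u0631\u0628"   # 'Arabs'
gharb = "\u063A\u0631\u0628"  # 'west'

# The two words differ only by the dot that turns ayn into ghayn.
print(to_rasm(arab) == to_rasm(gharb))
```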

Early manuscripts of the Quran did not use diacritics either for vowels or to distinguish the different values of the rasm. Vowel pointing was introduced first, as a red dot placed above, below, or beside the rasm, and later consonant pointing was introduced, as thin, short black single or multiple dashes placed above or below the rasm. These i‘jām became black dots about the same time as the ḥarakāt became small black letters or strokes.

Typically, Egyptians do not use dots under final yā’ (ي), which then looks exactly like alif maqṣūrah (ى) in handwriting and in print. This practice is also used in copies of the muṣḥaf (Qurʾān) scribed by ʿUthmān Ṭāhā. The same unification of yā’ and alif maqṣūrah has happened in Persian, resulting in what the Unicode Standard calls "ARABIC LETTER FARSI YEH", which looks exactly the same as yā’ in initial and medial forms, but exactly the same as alif maqṣūrah in final and isolated forms.
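Because these letters can look identical while being stored as different code points, software that compares text often folds them together. The following is a minimal sketch of that idea, not a standard algorithm: the function name `fold_final_yeh` and the choice of alif maqṣūrah as the folded form are assumptions made for illustration.

```python
import unicodedata

YEH = "\u064A"           # ARABIC LETTER YEH
ALEF_MAKSURA = "\u0649"  # ARABIC LETTER ALEF MAKSURA
FARSI_YEH = "\u06CC"     # ARABIC LETTER FARSI YEH

for ch in (YEH, ALEF_MAKSURA, FARSI_YEH):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")


def fold_final_yeh(word: str) -> str:
    """Write a word-final yeh or Farsi yeh as the undotted alef maksura."""
    if word and word[-1] in (YEH, FARSI_YEH):
        return word[:-1] + ALEF_MAKSURA
    return word


fi_dotted = "\u0641\u064A"    # 'in', with dotted final ya
fi_undotted = "\u0641\u0649"  # same word in the undotted (Egyptian) spelling
print(fold_final_yeh(fi_dotted) == fi_undotted)
```

Folding in this direction loses the distinction between a true final yā’ and alif maqṣūrah, so it suits matching and search rather than display.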

thumb|Isolated kāf with ‘alāmātu-l-ihmāl and without top stroke next to initial kāf with top stroke.

At the time when the i‘jām was optional, unpointed letters were ambiguous. To clarify that a letter would lack i‘jām in pointed text, the letter could be marked with a small v- or seagull-shaped diacritic above, a superscript semicircle (crescent), a subscript dot (except in the case of ḥā’; three dots were used with sīn), or a subscript miniature of the letter itself. A superscript stroke known as jarrah, resembling a long fatḥah, was used for a contracted (assimilated) sīn. Thus several different marks could all indicate that the letter in question was truly sīn and not shīn. These signs, collectively known as ‘alāmātu-l-ihmāl, are still occasionally used in modern Arabic calligraphy, either for their original purpose (i.e. marking letters without i‘jām) or, often, as purely decorative space-fillers. The small kāf above the kāf in its final and isolated forms was originally an ‘alāmatu-l-ihmāl that became a permanent part of the letter. Previously this sign could also appear above the medial form of kāf, when that letter was written without the stroke on its ascender. When kāf was written without that stroke, it could be mistaken for lām; kāf was therefore distinguished with a superscript kāf or a small superscript hamza (nabrah), and lām with a superscript l-a-m (lām-alif-mīm).

Hamza

Although not always considered a letter of the alphabet, the hamza (هَمْزَة, hamzah, glottal stop) often stands as a separate letter in writing, is written in unpointed texts, and is not considered a tashkīl. It may appear as a letter by itself or as a diacritic over or under an alif, wāw, or yā’.

Which letter is used to support the hamzah depends on the quality of the adjacent vowels and on its location in the word:

  • If the glottal stop occurs at the beginning of the word:
      • It is indicated by hamza on an alif: above if the following vowel is /a/ or /u/, and below if it is /i/.
      • To clarify an initial /a/ or /u/, a fatḥah or ḍammah, respectively, can be added.
  • If the glottal stop occurs in the middle of the word, the following order of priority applies:
      • First: if hamza is preceded or followed by /i/, hamza sits on a tooth (a dotless yā’), <ئ>; e.g. <عَائِلَة>
      • Second: if hamza is preceded or followed by /u/, hamza sits on wāw, <ؤ>
      • Third: otherwise hamza sits on alif, <أ>
  • If the glottal stop occurs at the end of the word (ignoring any grammatical suffixes):
      • First: if hamza follows a short vowel, it is written above alif, wāw, or yā’, the same as for a medial case;
      • Second: if it follows a long vowel, diphthong or consonant, hamza is written on the line, <ء>
  • Exception: two alifs in succession are never allowed: /ʔaː/ is written with alif maddah <آ> and /aːʔ/ is written with a free hamzah on the line <اء>.
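The seat rules listed above can be sketched as three small functions. This is a simplified illustration under stated assumptions, not a complete orthographic algorithm: vowels are passed as the quality strings "a", "i" or "u" (length is ignored), the function names are hypothetical, and the alif maddah exception and other special cases are not handled.

```python
ALIF_HAMZA_ABOVE = "\u0623"  # hamza above alif
ALIF_HAMZA_BELOW = "\u0625"  # hamza below alif
WAW_HAMZA = "\u0624"         # hamza on waw
YEH_HAMZA = "\u0626"         # hamza on a tooth (dotless ya)
HAMZA_ON_LINE = "\u0621"     # free-standing hamza


def initial_seat(next_vowel: str) -> str:
    """Word-initial hamza: below alif before /i/, above alif otherwise."""
    return ALIF_HAMZA_BELOW if next_vowel == "i" else ALIF_HAMZA_ABOVE


def medial_seat(prev_vowel, next_vowel) -> str:
    """Word-medial hamza: /i/ outranks /u/, which outranks /a/."""
    around = {prev_vowel, next_vowel}
    if "i" in around:
        return YEH_HAMZA
    if "u" in around:
        return WAW_HAMZA
    return ALIF_HAMZA_ABOVE


def final_seat(prev_short_vowel) -> str:
    """Word-final hamza: seated after a short vowel, else on the line.

    Pass None when hamza follows a long vowel, diphthong or consonant.
    """
    if prev_short_vowel is None:
        return HAMZA_ON_LINE
    return medial_seat(prev_short_vowel, None)


# The hamza in the example word for the first medial rule is preceded
# by an a-vowel and followed by /i/, so it sits on a tooth.
print(medial_seat("a", "i") == YEH_HAMZA)
```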

Consider the following words: أَخ ("brother"), إِسْمَاعِيل ("Ismael"), أُمّ ("mother"). All three of the above words "begin" with a vowel opening the syllable, and in each case, alif is used to designate the initial glottal stop (the actual beginning). But if we consider middle syllables "beginning" with a vowel: نَشْأَة ("origin"), أَفْئِدَة ("hearts"; notice the /ʔi/ syllable; singular فُؤَاد), رُؤُوس ("heads", singular رَأْس), the situation is different, as noted above. See the comprehensive article on hamzah for more details.

Diacritics not used in Modern Standard Arabic

Diacritics that are not used in Modern Standard Arabic but appear in other languages that use the Arabic script, and sometimes in written Arabic dialects, include the following (the list is not exhaustive):

{|class=wikitable

|-

! Description

! Unicode

! Example

! Language(s)

! Notes

|-

! colspan=5 style="text-align:left;" | Bars and lines

|-

| diagonal bar above

|

| style="font-size:36px;line-height:1.05;" | گ

| Arabic (Iraq), Balti, Burushaski,<br/>Kashmiri, Kazakh,<br/>Khowar, Kurdish,<br/>Kyrgyz, Persian,<br/>Sindhi, Urdu,<br/>Uyghur

|

  • Diagonal bar above kaf to create gaf: گ
  • In Arabic, often used in Iraq to represent the /g/ sound, including in foreign words written in Arabic script, while in Morocco the variant ݣ is seen.

|-

| horizontal bar above

|

| style="font-size:36px;line-height:1.05;" |

| Pashto

|

  • zwarakay, equivalent to Latin ə, IPA /ə/

|-

| vertical line above

|

| style="font-size:36px;line-height:1.05;" |ئۈ

| Uyghur

|

  • the letter ئۈ (IPA /y/) contains a vertical line above the vav

|-

! colspan=5 style="text-align:left;" | Dots

|-

| 2 dots (vertical)

|

| style="font-size:36px;line-height:1.05;" |

|Uyghur

|

|-

| 4 dots

|

| style="font-size:24px;line-height:1.05;" | ٿ ڐ ڙ

| Sindhi, Shina, Khariboli

|

|-

| dot below

|

| style="font-size:36px;line-height:1.4;" | ٜ&nbsp;&nbsp;&nbsp;بٜ

| African languages

|-

! colspan=5 style="text-align:left;" | Variants of standard Arabic diacritics

|-

| wavy hamza

|

|style="font-size:36px;line-height:1.4;" | ٲ اٟ

|Kashmiri

|

  • The Kashmiri language written in Arabic script includes the diacritic known as the "wavy hamza".
  • In Kashmiri the diacritic is called amālü mad when used above alif (ٲ) to form a vowel.
  • It is called sāyi mad when used below alif (اٟ) to form a different vowel.

|-

| curly dammah above

|

| style="font-size:36px;line-height:1.05;" |

| Rohingya

|

  • Latin "ou"

|-

|

|

|

| Rohingya

|

  • Latin "oñ"

|-

| double dammah above

|

| style="font-size:36px;line-height:1.05;" |

| Rohingya

|

  • Latin "uñ"

|-

| inverted and regular curly dammahs above

|

| style="font-size:36px;line-height:1.05;" |

| Rohingya

|

  • Latin "ouñ"

|-

! colspan=5 style="text-align:left;" | Tildes

|-

| diagonal tilde shape above

|

| style="font-size:36px;line-height:1.05;" |

| Rohingya

|

  • Latin "o"

|-

| diagonal tilde shape below

|

|style="font-size:36px;line-height:1.05;" |

| Rohingya

|

  • Latin "e"

|-

! colspan=5 style="text-align:left;" | Arabic letters

|-

| miniature Arabic letter hah (initial form) ﺣ above

|

| style="font-size:36px;line-height:1.05;" |

| Rohingya

|

  • Sukun (zero-vowel)

|-

| miniature Arabic letter tah ط above

|

| style="font-size:36px;line-height:1.05;" |

| Urdu

|

|-

! colspan=5 style="text-align:left;" | Eastern Arabic numerals

|-

| Eastern Arabic numeral 2: ٢ above

|

| style="font-size:36px;line-height:1.05;" | &nbsp;&nbsp;

| Burushaski

|

  • Present in the Burushaski letters and

|-

| Eastern Arabic numeral 3: ٣ above

|

| style="font-size:36px;line-height:1.05;" | &nbsp;&nbsp;

| Burushaski

|

  • Present in the Burushaski letters , and

|-

| Urdu number 4: ۴ above or below

|

| style="font-size:36px;line-height:1.05;" | &nbsp;&nbsp;

| Burushaski

|

  • Present in the Burushaski letters and

|-

! colspan=5 style="text-align:left;" | Other shapes

|-

| Nūn ġuṇnā, "u" shape above

|

| style="font-size:36px;line-height:1.05;" |ن٘

| Urdu

|

  • Vowel nasalization is represented by nun ghunna, which in medial form is written as nun with the diacritic (also called ulta jazm, Unicode U+0658) above: ن٘.

|-

| "v" shape above

|

| style="font-size:36px;line-height:1.05;" |ۆ&nbsp;ێ&nbsp;ئۆ

| Azerbaijani, Turkmen, Kurdish, Kazakh, Uyghur, Bosnian (Arebica)

|

  • used on top of waw: ۆ to represent "o" in Kurdish, and "ü" in Azerbaijani and Turkmen
  • used on top of ye: ێ represents "ê" in Kurdish.
  • used on top of waw: ۆ to represent "v" in Kazakh.
  • In Uyghur it is used as part of the digraph ئۆ to represent "ö".

|-

| inverted "v" shape above

|

| style="font-size:36px;line-height:1.05;" |یٛ

|Azerbaijani, Turkmen, Bosnian (Arebica)

|

  • in Azerbaijani, used only on top of ye: یٛ (rarely used) is equivalent to Latin ı, Cyrillic ы, IPA /ɯ/
  • in Turkmen, used only on top of ye: یٛ is equivalent to Latin y, Cyrillic ы, IPA /ɯ/

|-

| dotted fatha

|

| <span style="font-size:36px;line-height:1.15;"></span>

| Wolof

| Latin à

|-

| circle with fatha

|

| <span style="font-size:36px;line-height:1.15;"></span>

| Wolof

| Latin ë

|-

| less than sign - below

|

| <span style="font-size:36px;line-height:1.15;"></span>

| Wolof

| Latin e

|-

| greater than sign - below

|

| <span style="font-size:36px;line-height:1.15;"></span>

| Wolof

| Latin é

|-

| less than sign - above

|

| <span style="font-size:36px;line-height:1.15;"></span>

| Wolof

| Latin o

|-

| greater than sign - above

|

| <span style="font-size:36px;line-height:1.15;"></span>

| Wolof

| Latin ó

|-

| ring

|

| style="font-size:36px;line-height:1.15;" | ګ

| Pashto

|

  • kaf with ring (ګ) is used for IPA /ɡ/

|-

! colspan=5 style="text-align:left;" | Other shapes

|-

| "fish" shape above

|

| style="font-size:36px;line-height:1.4;" |دࣤ࣬&nbsp;&nbsp;دࣥ࣬&nbsp;&nbsp;دࣦ࣯

| Rohingya

| Ṭāna, written above or below other diacritics to mark a long rising tone.

|}

Rohingya tone markers

Historically, the Arabic script has been adopted and used by many tonal languages; examples include Xiao'erjing for Mandarin Chinese and the Ajami script adopted for writing various languages of Western Africa. However, the Arabic script never had an inherent way of representing tones until it was adapted for the Rohingya language. The Rohingya Fonna are three tone markers that are part of the standardized and accepted orthographic convention of Rohingya. They remain the only known instance of tone markers within the Arabic script.

Tone markers act as "modifiers" of vowel diacritics. In simpler words, they are "diacritics for the diacritics". They are written "outside" of the word, meaning that they are written above the vowel diacritic if the diacritic is written above the word, and below the diacritic if the diacritic is written below the word. They are only ever written where there are vowel diacritics. This is important to note because, without the vowel diacritic present, there is no way to distinguish tone markers from i‘jām, i.e. the dots used to distinguish consonants phonetically.

Hārbāy

The Hārbāy, as it is called in Rohingya, is a single dot that is placed on top of Fatḥah and Ḍammah, or curly Fatḥah and curly Ḍammah (vowel diacritics unique to Rohingya), or their respective Fatḥatan and Ḍammatan versions, and underneath Kasrah or curly Kasrah, or their respective Kasratan versions. This tone marker indicates a short high tone.

Automatic diacritization

The process of automatically restoring diacritical marks is called diacritization or diacritic restoration. It is useful for avoiding ambiguity in applications such as Arabic machine translation, text-to-speech, and information retrieval. Automatic diacritization algorithms have been developed. For Modern Standard Arabic, the state-of-the-art algorithm has a word error rate (WER) of 4.79%. The most common errors involve proper nouns and case endings. Similar algorithms exist for other varieties of Arabic.
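A word error rate for diacritization can be sketched as the fraction of words whose diacritized form differs from a reference. The function below is a minimal illustration under that assumption; published evaluations may define WER differently (for example, by including or excluding case endings), and the name `diacritization_wer` is hypothetical.

```python
def diacritization_wer(reference: str, hypothesis: str) -> float:
    """Fraction of words whose diacritized spelling differs from the reference.

    Both texts are assumed to contain the same words in the same order,
    differing only in their diacritics.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must not be empty")
    if len(ref_words) != len(hyp_words):
        raise ValueError("texts must contain the same number of words")
    errors = sum(r != h for r, h in zip(ref_words, hyp_words))
    return errors / len(ref_words)


din = "\u062F\u0650\u064A\u0646"   # 'religion'
dayn = "\u062F\u064E\u064A\u0646"  # 'debt'

# The second word is restored with the wrong vowel: one error in two words.
print(diacritization_wer(f"{din} {dayn}", f"{din} {din}"))
```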

See also

  • Arabic alphabet:
      • ʾIʿrāb, the case system of Arabic
      • Rasm, the basic system of Arabic consonants
      • Tajwid, the phonetic rules of recitation of Qur'an in Arabic
  • Hebrew:
      • Hebrew diacritics, the Hebrew equivalent
      • Niqqud, the Hebrew equivalent of ḥarakāt
      • Dagesh, the Hebrew diacritic similar to Arabic i‘jām and shaddah

References


  • Alexis Neme and Sébastien Paumier (2019), "Restoring Arabic vowels through omission-tolerant dictionary lookup", Lang Resources & Evaluation, Vol. 53, pp. 1-65