Mojikyō

Mojikyō is a character encoding scheme created to provide a complete index of characters used in the Chinese, Japanese, Korean, Vietnamese Chữ Nôm and other historical Chinese logographic writing systems. The Mojikyō Institute, which published the character set, also published computer software and TrueType computer fonts to accompany it. The institute originally distributed its character set and related software and data on CD-ROMs sold in Kinokuniya stores.

Conceptualized in 1996, Mojikyō was first released on CD-ROM in July 1997. For a time, the Mojikyō Institute also offered a web subscription, termed "Mojikyō WEB", which had more up-to-date characters. Of the 174,975 characters the set came to encode, 150,366 (≈86%) belonged to the extended Chinese–Japanese–Korean–Vietnamese (CJKV) family. Many of Mojikyō's characters are considered obsolete or obscure, and are not encoded by any other character set, including the most widely used international text encoding standard, Unicode.

Mojikyō was originally a paid proprietary software product. In 2015, the Mojikyō Institute began to upload its latest releases to the Internet Archive as freeware, as a memorial to one of its developers, who died that year. However, Mojikyō has much looser standards than Unicode for encoding, which has led it to include many glyphs of dubious, or even unintentionally fictional, origin. As such, while many non-Unicode Mojikyō characters are suitable for addition to Unicode, not all can become Unicode characters, due to the differing standards of evidence required by each.

Composition

The Mojikyō fonts are TrueType fonts that come in a ZIP file and are each around 25 megabytes; the different fonts contain different numbers of characters. Also included is a Windows executable that implements a graphical character map, the "Mojikyō Character Map". It allows users to browse through the fonts, and to copy and paste characters in lieu of typing them on the keyboard. Unlike the regular Windows character map or KCharSelect, both of which support TrueType fonts, the Mojikyō Character Map displays the Mojikyō encoding number of the requested character. For it to work, all Mojikyō fonts must be installed.

Encoding

When referring to a character encoded in Mojikyō, the format MXXXXXX is often used, similar to the U+XXXX format used for Unicode. A difference, however, is that Mojikyō numbers displayed this way are decimal, while Unicode's U+ notation is hexadecimal.
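The difference between the two notations can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the zero-padding to six digits is an assumption based on the MXXXXXX pattern described above.

```python
import re

# Assumption: an "M" followed by up to six decimal digits.
_M_PATTERN = re.compile(r"M(\d{1,6})")
# Unicode notation: "U+" followed by four to six hexadecimal digits.
_U_PATTERN = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_mojikyo(ref: str) -> int:
    """Return the Mojikyō number in a reference such as 'M012345' (decimal)."""
    match = _M_PATTERN.fullmatch(ref)
    if match is None:
        raise ValueError(f"not an M-number: {ref!r}")
    return int(match.group(1), 10)


def parse_unicode(ref: str) -> int:
    """Return the code point in a reference such as 'U+4E00' (hexadecimal)."""
    match = _U_PATTERN.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a U+ reference: {ref!r}")
    return int(match.group(1), 16)


def format_mojikyo(number: int) -> str:
    """Format a number in the M-notation, zero-padded to six digits (assumed)."""
    return f"M{number:06d}"
```

Note that the same digit string means different things in the two notations: the digits "012345" read as decimal give 12345, while "U+12345" denotes the code point 74565.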

From the earliest days of Unicode, Mojikyō has both influenced the standard and been influenced by it. Glyphs originating from Mojikyō first appear in a proposal submitted on 18 April 2002 to the Ideographic Rapporteur Group (IRG), which is responsible for maintaining all CJK blocks in Unicode. In May 2007, Mojikyō played a minor role in an eventually successful series of proposals to encode the Tangut script in Unicode; Mojikyō already had 6,000 Tangut characters within its encoding by October 2002. In Unicode's source references, Mojikyō is abbreviated "JK"; for example, one ideograph has a J-Source equal to JK-66038. All Unicode characters with a JK-prefixed J-Source originate from Mojikyō. According to Ken Lunde, a subject matter expert in character encodings and East Asian languages, as of Unicode 13.0, 782 ideographs in Unicode originate from Mojikyō, split somewhat evenly between two blocks: CJK Unified Ideographs Extension C, with 367, and CJK Unified Ideographs Extension E, with 415. Not all Unicode characters with Mojikyō origins (JK-prefixed J-Sources) have the same representative glyph in the code chart as in the Mojikyō font; some characters had their shapes changed before final encoding, as investigation showed the shapes assigned by the Mojikyō Institute were wrong.
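As a rough sketch of how JK-prefixed J-Sources could be picked out of source data, the Python below filters tab-separated records of the form "code point, field, value". The record layout and the field name kIRG_JSource are assumptions modelled on the Unihan database's IRG sources file; the function name and the sample records are invented purely for illustration and are not real data.

```python
from typing import Dict, Iterable


def mojikyo_sourced(lines: Iterable[str]) -> Dict[str, str]:
    """Map code points to their J-Source value when it carries the JK prefix.

    Assumes each record is 'U+XXXX<TAB>field<TAB>value'; blank lines,
    comment lines starting with '#', and malformed records are skipped.
    """
    found: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        code_point, field, value = parts
        if field == "kIRG_JSource" and value.startswith("JK-"):
            found[code_point] = value
    return found


# Invented sample records (placeholders, not taken from any real file).
SAMPLE = [
    "# comment line",
    "U+2A700\tkIRG_JSource\tJK-00001",
    "U+2A701\tkIRG_JSource\tJ4-0000",
    "U+2A702\tkIRG_GSource\tGX-0000",
    "U+2B820\tkIRG_JSource\tJK-00002",
]

if __name__ == "__main__":
    for cp, source in mojikyo_sourced(SAMPLE).items():
        print(cp, source)
```

Run against a complete source file, a count of the returned entries would be expected to match the figure of 782 quoted above for Unicode 13.0 (367 + 415), though that has not been checked here.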

Blocks

Mojikyō encoded 174,975 characters.

No unification

Unlike Unicode, Mojikyō purposely avoids Han unification; no attempt at compactness of the encoding is made, nor is there an attempt to keep all common characters below U+FFFF as there is in Unicode.
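The U+FFFF boundary mentioned above marks the end of Unicode's Basic Multilingual Plane (plane 0); each plane spans 65,536 code points. A minimal sketch of the check, with hypothetical function names:

```python
def plane(code_point: int) -> int:
    """Return the Unicode plane number (0 to 16) of a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"outside the Unicode range: {code_point:#x}")
    # Each plane holds 0x10000 code points, so the plane is the value
    # of the bits above the low 16.
    return code_point >> 16


def in_bmp(code_point: int) -> bool:
    """True if the code point lies in the Basic Multilingual Plane."""
    return plane(code_point) == 0
```

For instance, U+4E00 in the main CJK Unified Ideographs block is in plane 0, whereas U+2A700, the first code point of CJK Unified Ideographs Extension C, is in plane 2.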

Unicode, on the other hand, sorts its CJK characters into blocks based on how common they are: the most common are generally put into the Basic Multilingual Plane.

Mere data, sometimes including the shapes of letters, are considered in many jurisdictions to be common property, as they do not meet the threshold of originality.

Due to this legacy, however, disallowed data as of 2020.

Collected writing systems

Living

Chinese — Hanzi
Japanese — Kanji, Kana (including Hentaigana)
Korean — Hanja
Latin alphabet with diacritics
Cyrillic script with diacritics

Dead or obsolete

Ancient Chinese
Oracle bone script
Seal script
Taiwanese kana
Vietnamese — Chữ Nôm
Sanskrit — Siddhaṃ
Tangut script
Sui script
