Unicode and OpenType misinterpret the Singhala writing system

The terms defined in the document Creating and Supporting OpenType Fonts for Sinhala Script, part of the original OpenType specification (now known as the OpenFont standard), show how the script is misunderstood by Unicode and by the OpenType specification. This appears to be a carry-over from the way Indic scripts were analyzed, which also seems to be in error.

The code page was constructed without an understanding of the Singhala letters, which represent the Singhala phoneme inventory. The writing system has been defined on a set of false premises. Evidently, the engineers at the Unicode Consortium did not have good advice on the language. As a result, the Sinhala Unicode code block is flawed. It is incompatible with the Sanskrit and Pali writing systems and corrupts the overall Singhala writing system.

Singhala, which suffered constraints due to the shortcomings of the printing industry, has a chance to recover with digital technology. However, this is threatened by the unscrupulous implementation of the Unicode Sinhala code page specification, which closes the door to objective criticism. The nearly decade-long intransigence seems to reflect a willingness among the technocracy in the country to value personal well-being above obtaining a successful solution for digitizing Singhala.

Below are definitions of some of the terms given in the page cited above, with our comments in red clarifying them and correcting misunderstood concepts. It would not be appropriate for us to comment on how Unicode assigns characters, since our analysis arrives at a totally different solution for digitizing Singhala.

The words in blue may be looked up in Cologne University's Monier-Williams Sanskrit dictionary. Those who are technically inclined might find the following page more instructive on the script:
A BNF grammar for the Singhala script


The following terms are useful for understanding the layout features and script rules discussed in this document.

Akhand ligature – A required consonant ligature that may appear anywhere in the syllable and may or may not involve the base glyph. Akhand ligatures have the highest priority and are formed first; some languages include them in their alphabets

Akhanda means non-breakable. Ligatures in the Singhala writing system are features of the Sanskrit and Pali orthographies. They are not ranked in any hierarchy of integrity. Some ligatures are falling out of use because they are seldom seen and because they cannot be constructed with the writing technology available, whether mechanical printing or digitization according to Unicode.

The so-called alphabet in Singhala is its phoneme chart, known as the 'hodiya' (Sanskrit: 'so' - sound, 'dii' - fly). The hodiya contains no ligatures whatsoever. Here is the modern hodiya:
Singhala Phoneme table.
It is an extension of the Sanskrit hodiya, which can be viewed in the Harvard-Kyoto Sanskrit scheme given here:
Sanskrit and Tamil Dictionaries.

Al-lakuna (halant/virama) – The character used after a consonant to suppress its inherent vowel

The correct term is 'hal lakuna' ('al' is rather baby talk). 'Hal' means consonant in Sanskrit, and 'halanta' (hal + anta) means the ending consonant of a word. Similarly, 'virama' means termination.

There is no vowel to suppress in a consonant. The sign was used to signal an end-of-word consonant. In the Singhala orthography, it is used to mark a free-standing consonant within a word. It is an artifact introduced to overcome difficulties posed by mechanical typesetting.

Consonant – Sinhala consonants have an inherent vowel (the short vowel /a/ called ayanna). For example, “Ka” and “Ta”, rather than just “K” or “T”

Singhala consonants are as the hodiya shows. It follows from the explanation under Al-lakuna above that there is no 'inherent vowel' in a consonant.

The confusion arose from not understanding how Indic was written before letterpress. The letters were strung together and there was no word spacing, yet the text was understood as basic hodi-akuru (phonemes), ligatures and words. When a word ended with a consonant, it was flagged with a sign signaling a terminating consonant. When a pair of consonants that do not make a ligature was followed by a vowel, the two were written touching, and the diacritic representing the vowel was applied to the pair. The behavior is just like that of a ligature, except that the full forms of the letters were preserved.
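For reference, the encoding model we are criticizing can be made concrete. The following is a minimal Python sketch of how the Unicode Sinhala block represents the distinction quoted above: the bare letter U+0D9A is read as 'ka', and the sign U+0DCA is appended to obtain 'k'. The helper name `unicode_reading` and the romanization are our own illustrative assumptions, not part of any standard API. The sketch shows Unicode's model, not the analysis argued here.

```python
import unicodedata

KAYANNA = "\u0D9A"    # SINHALA LETTER ALPAPRAANA KAYANNA
AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA

def unicode_reading(text):
    """Romanize text the way the Unicode model reads it (hypothetical helper).

    Only kayanna and al-lakuna are handled; anything else raises ValueError.
    """
    out = []
    for ch in text:
        if ch == KAYANNA:
            # Under the Unicode model the bare letter carries an 'a'.
            out.append("ka")
        elif ch == AL_LAKUNA and out and out[-1].endswith("a"):
            # The sign removes the 'a' that the model attached to the letter.
            out[-1] = out[-1][:-1]
        else:
            raise ValueError(f"unsupported character U+{ord(ch):04X}")
    return "".join(out)

if __name__ == "__main__":
    for sample in (KAYANNA, KAYANNA + AL_LAKUNA):
        names = [unicodedata.name(ch) for ch in sample]
        print(unicode_reading(sample), names)
```

Under this model the phoneme 'k' costs two code points while the syllable 'ka' costs one, which is the inversion that the comments above object to.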

Consonant conjunct (aka ‘conjunct’) – A ligature of two or more consonants

There is no reason to give a Singhala ligature a different name. In the modern Singhala script, only the Sanskrit and Pali texts have ligatures. All ligatures are standard. A ligature is written when a given set of adjacent consonants is followed by a vowel. A ligature is never a halanta.

Modern Singhala text is a mixture of Pure Singhala and Sanskrit. The Singhala orthography prohibits ligatures, and therefore the hal-lakuna occurs inside Singhala words. Although Singhala and Sanskrit are written mixed, a consonant combination that would make a Sanskrit ligature rarely occurs inside the Singhala portions. The Sanskrit written inside mixed text is lax about the requirement that adjoining pairs of consonants should touch, and it accepts the consonant flag. Pali usually occurs by itself, and its orthography is strict.

Halant – See Al-lakuna.

Ligature – A combination of glyphs that join to form a single glyph. For example the touching-letter sequence of alpapraana kayanna and vayanna (U+0D9A, U+0DCA, U+200D, U+0DC0)

Please read under 'Consonant conjunct'.
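The code point sequence quoted in the Ligature definition above can be inspected directly. The following is a minimal Python sketch using the standard `unicodedata` module; the function name `describe` is our own choice for illustration. It only lists what Unicode calls each code point and says nothing about how a given font renders the sequence.

```python
import unicodedata

# The sequence quoted in the definition: U+0D9A, U+0DCA, U+200D, U+0DC0.
SEQUENCE = "\u0D9A\u0DCA\u200D\u0DC0"

def describe(text):
    """Return (code point label, Unicode character name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

if __name__ == "__main__":
    for label, name in describe(SEQUENCE):
        print(label, name)
```

Four code points, one of them an invisible joiner, are needed to request what the writer thinks of as two consonants written together.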

Matra (dependent vowel) – Used to represent a vowel sound that is not inherent to the consonant. Dependent vowels are referred to as “matras” in Sanskrit. They are always depicted in combination with a single consonant, or with a consonant cluster. The greatest variation among different Indian scripts is found in the rules for attaching dependent vowels to base characters

In Sanskrit, matra means a unit of measure. In grammar, it is the length for which a single vowel is stretched in speech. The equivalent in Latin is 'mora' (matra, mora, measure - Indo-European cognates). There are vowels of single and double matra length in Sanskrit and Singhala. The concept is similar to the way vowels are spoken in Dutch and in Old English. Obviously, the author is confusing morae with vowel signs, the diacritics.

Repaya (reph) – The above-base form of the letter “Ra” that is used if “Ra” is the first consonant in the syllable and is not the base consonant

It is the short form of 'r', not 'ra'.

Split matra – A matra that is decomposed into pieces for rendering. Usually the different pieces appear in different positions relative to the base. For instance, part of the matra may be placed at the beginning of the cluster and another part at the end of the cluster

Actually, it is a diacritic that occurs with components on either side of the shape of a morpheme.
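To see what the quoted definition refers to, it helps to look at how Unicode itself treats such a sign. The following is a minimal Python sketch, assuming the sign U+0DDC (SINHALA VOWEL SIGN KOMBUVA HAA AELA-PILLA) as the example; the function name `split_parts` is hypothetical. It applies canonical decomposition (NFD) from the standard `unicodedata` module and lists the resulting pieces. It illustrates the Unicode data only, not the analysis given in our comment.

```python
import unicodedata

# Assumed example: a two-part vowel sign in the Sinhala block.
TWO_PART_SIGN = "\u0DDC"

def split_parts(sign):
    """Return the code point labels of the canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", sign)
    return [f"U+{ord(ch):04X}" for ch in decomposed]

if __name__ == "__main__":
    print(unicodedata.name(TWO_PART_SIGN))
    for label in split_parts(TWO_PART_SIGN):
        # Look the character up again from its label to print its name.
        print(label, unicodedata.name(chr(int(label[2:], 16))))
```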

Touching letters – A pure consonant written touching a following letter instead of using al-lakuna. Used in classical and Buddhist texts.

Please see the comment under 'Consonant', which describes the classical orthography.

Yansaya – The post-base form of “Ya” which goes with a preceding consonant(s)

It is also used to carry the rephaya when 'r' precedes 'ya' in Sanskrit orthography.