Advantages of Standardizing pages at metta.lk


[This page is incomplete and still in a rough draft stage.]

In my humble opinion, metta.lk was made in the great tradition of the thera brotherhood. The site does not editorialize the ancient stories. Instead, it collects every existing copy of the paali (books) and presents them side by side for comparison.

Each time I see the words of the most venerable Ananda, "Thus have I heard," I am amazed by his humility. It is as if he says, "This is only a second-hand account." That makes me ask, "But wasn't it what the Great Sage, your cousin and roommate, reported to you nightly as the work done each day?" It is almost as if the mahaa zraavaka's example is repeated so often that it seeps into every reader's mind that true greatness is true humility.

I have spoken to Ven. Mettavihari, who manages the site, and he explained to me what was done. He reminds me of another great thera, Gnanaponika, who also sacrificed his personal development to see the German translation of the Tripitaka through to completion. He showed me the set of galley proofs. Going by my memory, it was about 400 to 700 sheets of ledger-size airmail paper. I think the growing JuBu community in America is a result of this work. They feel the incongruence between religion and the current state of culture so strongly that they are digging everywhere for information on the great non-religion. (I read the Singhala translation of the Devadaha story.) I think metta.lk has a big role to play in fulfilling that need.

metta.lk is steadily adding English, Singhala and Polish translations. However, reading the files requires the old Times_CSX+ and Tipitaka_Sinhala fonts to be installed on the reader's computer. This no longer has to be so: we can deliver the fonts with the pages, since all current browsers support web fonts. We have demonstrated it here.

Web fonts
Web fonts are compressed font files that are delivered with a page just as its graphics are. They are small and take up little bandwidth. The requirement is that the fonts must come from the same domain as the web page, or from a server configured to share them openly (that is, one that sends an Access-Control-Allow-Origin header permitting the page's origin). The smartfonts.net font server is such a server. We took the original fonts, compressed them and added metadata to them. (We had to remove the '+' from 'Times_CSX+' to make the WOFF font.) You can see the font metadata in Firefox if you install the fontinfo add-on. After that, right-click anywhere on a page and select View Page Info to see the font metadata.
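The cross-domain rule can be sketched in Python. This is a simplified model, assuming a plain (non-credentialed) font request: the browser accepts a font from the page's own origin, or from another origin whose response carries Access-Control-Allow-Origin set to `*` or to the page's origin. The helper names and the URLs in the usage lines are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return the scheme://host[:port] origin of an absolute URL.

    Simplification: default ports (80, 443) are not normalized away.
    """
    parts = urlsplit(url)
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{parts.hostname}{port}"


def font_allowed(page_url: str, font_url: str, font_headers: dict[str, str]) -> bool:
    """Simplified CORS check for a non-credentialed cross-origin font fetch."""
    page_origin = origin_of(page_url)
    if origin_of(font_url) == page_origin:
        return True  # same origin: no CORS header is needed
    # HTTP header names are case-insensitive, so compare them in lower case.
    headers = {name.lower(): value.strip() for name, value in font_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    return allowed in ("*", page_origin)


# Hypothetical URLs, for illustration only.
page = "https://www.metta.lk/sample.html"
font = "https://smartfonts.net/fonts/Times_CSX.woff"
print(font_allowed(page, font, {}))                                   # False
print(font_allowed(page, font, {"Access-Control-Allow-Origin": "*"}))  # True
```

A real browser applies more rules (credentials, redirects, default ports), but the two checks above are the ones that decide whether a shared font server like the one described here works.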

UTF-8 is not a character set
UTF-8 (RFC 2279, since superseded by RFC 3629) is not a character set. It is an encoding scheme for transporting characters safely across the network by disguising them as sequences of single bytes. At present, the UTF-8 charset declaration is not implemented exactly as the RFC prescribes. It is applied only to US-ASCII (the Basic Latin block, which includes ASCII punctuation and the C0 controls) plus characters that need two or more bytes. The code points in the Latin-1 Supplement block (which includes Latin-1 Punctuation and Symbols) are quietly exempted by UTF-8 implementations in browsers. Pages using them used to be declared ISO-8859-1; now validators steer them to Windows-1252 (HTML5 treats the ISO-8859-1 label as Windows-1252 anyway). This makes sense, however, because such files are safe to send as they are, without doubling the size of those characters to conform to the UTF-8 rule (UTF-8 encodes each of them as a two-byte sequence).
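To make the size argument concrete, here is a short Python check comparing how many bytes the same text takes in UTF-8 and in Windows-1252 (Python's codec name for it is `cp1252`). The helper `encoded_sizes` is hypothetical. ASCII text costs the same in both; Latin-1 Supplement characters take one byte in Windows-1252 and two in UTF-8, where each is written as a lead byte followed by a continuation byte.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, Windows-1252 byte count) for the same text."""
    return len(text.encode("utf-8")), len(text.encode("cp1252"))


# Basic Latin (ASCII): one byte per character in both encodings.
print(encoded_sizes("Thus have I heard"))  # (17, 17)

# Latin-1 Supplement: one byte each in Windows-1252, two bytes each in UTF-8.
print(encoded_sizes("¤ñ"))  # (4, 2)

# The two UTF-8 bytes are a lead byte and a continuation byte, not a zero byte.
print("ñ".encode("utf-8").hex())  # c3b1

# Characters outside Windows-1252, such as PTS Pali ā, cannot be stored in it at all.
try:
    "ā".encode("cp1252")
except UnicodeEncodeError:
    print("ā is not in Windows-1252")
```

The last case is why accented Pali letters have to travel as numeric character references in a Windows-1252 page.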

Unwittingly, this has split the languages of the world into two classes: languages written with single-byte characters enjoy the best conditions, and the rest limp along. Just count how many applications support single-byte characters, and compare how double-byte and other characters fare in common applications. M. Ohta foretold this path in 1995 in RFC 1815.

Please test the documents on this site at validator.w3.org to see which HTML pages pass validation. The ones that pass are those with the 'rs' and 'alt' suffixes, created following my suggestions. They are also smaller, except for the Singhala pages. Very significantly, their charset declaration is Windows-1252.

Making standard web pages
PTS Pali uses only two diacritics: the bar and the dot. Keys for these two can be added to a regular keyboard layout on all three platforms: Windows, Linux and Mac. On Windows, use the Microsoft Keyboard Layout Creator. On Linux, you can follow my Rapid Singhala keyboard for romanized Singhala to, say, modify the Polish keyboard by adding keys for the two accents. I added them as the AltGr positions of number keys 1 and 2.
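For reference, here are the characters those two extra keys ultimately have to produce. This Python sketch assumes the bar is Unicode's combining macron (U+0304) and the dot is the combining dot below (U+0323), or the dot above (U+0307) for ṅ; it composes each base letter with its mark under NFC normalization and prints the resulting precomposed code point. It covers lowercase letters only (`.upper()` gives the capitals), and it does not show how a layout emits them, whether as dead keys or as direct characters.

```python
import unicodedata

# Assumed mapping of the two PTS diacritics to Unicode combining marks.
MACRON = "\u0304"     # the bar: ā ī ū
DOT_BELOW = "\u0323"  # the dot under ṭ ḍ ṇ ḷ ṃ
DOT_ABOVE = "\u0307"  # the dot over ṅ


def pts_letters() -> dict[str, str]:
    """Map each accented lowercase PTS letter to its precomposed code point."""
    sequences = [base + MACRON for base in "aiu"]
    sequences += [base + DOT_BELOW for base in "tdnlm"]
    sequences.append("n" + DOT_ABOVE)
    letters = {}
    for seq in sequences:
        ch = unicodedata.normalize("NFC", seq)  # base letter + mark -> one code point
        letters[ch] = f"U+{ord(ch):04X}"
    return letters


for ch, code_point in pts_letters().items():
    print(ch, code_point, unicodedata.name(ch))
```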

One convenient procedure, I think, is to first type the document in a word processor with the accents and then convert the accented letters to their numeric character references (NCRs) at Pinyin.info. The resulting text consists only of Windows-1252 characters. Now you can paste it into an HTML5 skeleton and finish the page. The charset declaration in the meta element is windows-1252.
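The conversion step can also be done locally. This is a minimal Python sketch, not the Pinyin.info tool: it HTML-escapes the text, turns every non-ASCII character into a decimal NCR (for example, ā becomes `&#257;`), and drops the result into a bare HTML5 skeleton declared as windows-1252. The helper names, the skeleton and its `lang="pi"` attribute are illustrative assumptions. Because it converts every non-ASCII character, not only the accented letters, the output is pure ASCII, which is valid Windows-1252.

```python
import html

# Illustrative HTML5 skeleton; lang="pi" (Pali) is an assumption, adjust per page.
SKELETON = """<!DOCTYPE html>
<html lang="pi">
<head>
<meta charset="windows-1252">
<title>{title}</title>
</head>
<body>
<p>{body}</p>
</body>
</html>
"""


def to_ncr(text: str) -> str:
    """Replace every non-ASCII character with a decimal NCR, e.g. ā -> &#257;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def make_page(title: str, body_text: str) -> bytes:
    """Escape, convert to NCRs, fill the skeleton, and encode as Windows-1252."""
    # Escape <, > and & first so the '&' of each NCR is not escaped again.
    safe_title = to_ncr(html.escape(title, quote=False))
    safe_body = to_ncr(html.escape(body_text, quote=False))
    page = SKELETON.format(title=safe_title, body=safe_body)
    return page.encode("cp1252")  # pure ASCII, so this cannot fail


page = make_page("Dhammapada 1", "Manopubbaṅgamā dhammā")
print(page.decode("cp1252"))
```

The order of the two steps matters: converting to NCRs before escaping would turn the `&` of each NCR into `&amp;` and break every accented letter.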

Alternatively, you can make the HTML document entirely with Unicode characters, for example by typing it into Notepad++ and saving it as UTF-8. The charset declaration in this case would be UTF-8. One very important caveat: do not include any character from the Latin-1 Supplement block. Such characters will be shown as the not-defined glyph even if the font has glyphs for them. For instance, if you declare the charset as UTF-8, you see the not-defined glyph for the '¤' character, which Times_CSX uses for enye (ñ).
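This caveat is easy to check mechanically before a page is published. A minimal Python sketch, assuming the page source has already been read in as text (the function name is hypothetical), that lists every character from the Latin-1 Supplement block, U+0080 to U+00FF, with its line and column:

```python
def latin1_supplement_chars(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for each character in U+0080..U+00FF."""
    hits = []
    # Split on "\n" only: str.splitlines() would also treat U+0085 (NEL),
    # itself in this block, as a line break and so never report it.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if 0x80 <= ord(ch) <= 0xFF:  # the Latin-1 Supplement block
                hits.append((lineno, col, ch))
    return hits


sample = "Evaṃ me sutaṃ\nTimes_CSX uses ¤ for ñ"
for lineno, col, ch in latin1_supplement_chars(sample):
    print(f"line {lineno}, column {col}: U+{ord(ch):04X} {ch!r}")
```

To check a saved page, read it with `open(path, encoding="utf-8")` and pass the contents in; an empty result means the file is free of the characters the caveat warns about.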

Please read the work description files for each language to learn how we made their respective pages.