Sayura Sinhala Input Scheme

By Anuradha Ratnaweera

The most up-to-date version of this document can always be found here.

The latest version of the SCIM module is 0.3.3. Download it here.

If you are looking for a reference while typing with Sayura, download this brochure.

Introduction

Sayura is a quasi-transliteration scheme for the Sinhala script. Unlike true transliteration schemes, Sayura uses individual Latin characters to signify unmodified consonants, and not their "hal" form. For example, මම is entered as "mm", and not "mama".

History

I wrote the first implementation of the Sayura transliteration scheme in mid 2004 to include in our first package (it was called version 0.1) to enable Sinhala in GNU/Linux. It was only for GTK, and was not properly named. For the GTK part of the implementation, I used some code from another im-module written by Chamath Keppitiyagama.

Sayura defines context dependent behaviour of keys. For example, "i" at the beginning of a word produces "ඉ", after another "i" it converts "ඉ" to "ඊ", after a consonant it adds an ispilla, and after a consonant with an ispilla, it converts the ispilla to a diga ispilla. Using the same key in different ways depending on the context allows us to use fewer keys in very intuitive ways.
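The behaviour of the "i" key described above can be sketched in a few lines of Python. This is only an illustration of the four rules, not the code of the actual im-module; the function name press_i and the fallback (append "ඉ" in any other context) are assumptions.

```python
I_VOWEL = "\u0d89"       # ඉ
II_VOWEL = "\u0d8a"      # ඊ
ISPILLA = "\u0dd2"       # ispilla (short "i" sign)
DIGA_ISPILLA = "\u0dd3"  # diga ispilla (long "i" sign)


def is_consonant(ch):
    """True for a single character in the Sinhala consonant range."""
    return len(ch) == 1 and "\u0d9a" <= ch <= "\u0dc6"


def press_i(text):
    """Return the text as it looks after pressing "i" once."""
    last = text[-1:]  # empty string at the beginning of the text
    if last == I_VOWEL:
        return text[:-1] + II_VOWEL       # ඉ becomes ඊ
    if last == ISPILLA:
        return text[:-1] + DIGA_ISPILLA   # ispilla becomes diga ispilla
    if is_consonant(last):
        return text + ISPILLA             # consonant gets an ispilla
    return text + I_VOWEL                 # assumed fallback: start of a word
```

Calling press_i twice on an empty string walks through the first two rules, and calling it twice on a string ending in a consonant walks through the last two.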

The Sayura algorithm internally uses bytes, not UTF-8. The Sinhala code page is mapped to byte values 128-255, and ZWJ and ZWNJ are assigned 0x0d and 0x0c respectively.
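One way to picture this internal encoding is the following Python sketch. It assumes that each byte is simply the low byte of the Unicode code point (the Sinhala block is U+0D80 to U+0DFF, ZWJ is U+200D and ZWNJ is U+200C); the exact offset used by the real implementation is not stated here, and the function names are hypothetical.

```python
ZWNJ = "\u200c"
ZWJ = "\u200d"
SINHALA_FIRST = 0x0D80
SINHALA_LAST = 0x0DFF


def to_internal(text):
    """Encode Sinhala text as one byte per character."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if SINHALA_FIRST <= cp <= SINHALA_LAST:
            out.append(cp - 0x0D00)  # assumed mapping: 0x80-0xFF
        elif ch == ZWJ:
            out.append(0x0D)
        elif ch == ZWNJ:
            out.append(0x0C)
        else:
            raise ValueError(f"not representable: U+{cp:04X}")
    return bytes(out)


def from_internal(data):
    """Decode the one-byte form back to a Unicode string."""
    chars = []
    for b in data:
        if b >= 0x80:
            chars.append(chr(0x0D00 + b))
        elif b == 0x0D:
            chars.append(ZWJ)
        elif b == 0x0C:
            chars.append(ZWNJ)
        else:
            raise ValueError(f"unknown byte: 0x{b:02x}")
    return "".join(chars)
```

With this mapping every character the algorithm handles fits in a single byte, so context checks reduce to comparing the last byte of a buffer.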

In September 2004, I announced an attempt to port the GTK im-module to QT. After first implementing surrounding text support, the QT port of the im-module was announced in late September 2004, and included in Sinhala GNU/Linux 0.2.

However, my patch to QT broke binary compatibility, so applying it also required a complete recompile of dependent apps. Therefore, it was unlikely to go upstream. Although Kazuki Otta's port to QT4 made it upstream, QT3 apps were going to be around for a long time to come.

Typing in OpenOffice was yet to be solved.

In October 2005, there was a Codefest in Colombo as a part of the Asia OSS conference. Kazuki Otta ported the GTK module to SCIM. As SCIM sits below GTK, QT, OpenOffice and all other X apps, it provided a unified input mechanism.

The algorithm's dependency on surrounding text support was also removed. The package was called "scim-sinhala-trans".

For the next couple of years, scim-sinhala-trans was the primary input scheme on GNU/Linux systems, and was shipped with some distros including Fedora/Red Hat. Debs were always available here.

In early 2008, S Pravin sent some patches to add preedit support and other improvements. I added several tweaks to the Sayura algorithm itself to preserve the old semantics with the new preedit code, and also to improve it in certain places. A key improvement was bringing back surrounding text support, which is now used only when it is available.

A development series, 0.3.x, was announced in May 2008 to continue and test the new set of developments, and the scheme was named "Sayura" to distinguish it from other schemes such as Samanala and Sumihiri.

Key Allocation

Vowels

First, we assigned "a", "e", "i", "o" and "u" to their most obvious counterparts in Sinhala: "අ", "එ", "ඉ", "ඔ" and "උ". The remaining basic vowel "ඇ" was given the key "A" in Sayura version 0.2, but we decided to also allocate "q" to input "ඇ" in 0.3, because pressing shift for the common "ඇ" turned out to be counter-productive.

Long vowels are typed by pressing the same key twice. For example, "aa" produces "ආ".
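The double-press rule for standalone vowels can be sketched as below. This is a simplified illustration, not the actual Sayura code: it only handles the basic vowel keys, the function name type_vowels is hypothetical, and it relies on the fact that in Unicode each of these long vowels directly follows its short form.

```python
# Short independent vowels ("q" and "A" both give ඇ).
SHORT_VOWELS = {
    "a": "\u0d85",  # අ
    "q": "\u0d87",  # ඇ
    "A": "\u0d87",  # ඇ
    "i": "\u0d89",  # ඉ
    "u": "\u0d8b",  # උ
    "e": "\u0d91",  # එ
    "o": "\u0d94",  # ඔ
}


def type_vowels(keys):
    """Convert a run of vowel keys, merging a repeated key into a long vowel."""
    out = []
    last_key = None
    for key in keys:
        short = SHORT_VOWELS[key]
        if key == last_key and out[-1] == short:
            # Same key pressed again: replace the short vowel with the long
            # one, which is the next code point for every vowel listed above.
            out[-1] = chr(ord(short) + 1)
        else:
            out.append(short)
        last_key = key
    return "".join(out)
```

For instance, type_vowels("aa") yields the single letter "ආ" (U+0D86), matching the example above.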

"ඓ" and "ඖ" were allocated "I" and "O", and "U" produces the long and short forms of "ඍ". "අං" and "අඃ" are given "x" and "hH", but from 0.3.1, we will also allocate "Q" to enter "අඃ".

Modifiers

Modifiers are allocated the same keys as vowels, but take effect when typed after a consonant. The only difference is that "a" corresponds to "ආ" instead of "අ".

For example, "kii" produces "කී", while "kU" gives us "කෘ".
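A minimal sketch of how a modifier key might be applied to text that ends in a consonant is given below. The tables and the function name add_modifier are assumptions made for illustration; the sketch covers only a consonant followed by one or two presses of the same key and does not try to handle other sequences.

```python
# Vowel signs produced by the vowel keys after a consonant.
SIGNS = {
    "a": "\u0dcf",  # sign of ආ
    "q": "\u0dd0",  # sign of ඇ
    "i": "\u0dd2",  # sign of ඉ (ispilla)
    "u": "\u0dd4",  # sign of උ
    "e": "\u0dd9",  # sign of එ
    "o": "\u0ddc",  # sign of ඔ
    "U": "\u0dd8",  # sign of ඍ
}

# Long form reached by pressing the same key a second time.
LONG_SIGNS = {
    "\u0dd0": "\u0dd1",
    "\u0dd2": "\u0dd3",  # ispilla -> diga ispilla
    "\u0dd4": "\u0dd6",
    "\u0dd9": "\u0dda",
    "\u0ddc": "\u0ddd",
}


def add_modifier(text, key):
    """Apply a modifier key to text ending in a consonant or vowel sign."""
    sign = SIGNS[key]
    if text[-1:] == sign and sign in LONG_SIGNS:
        return text[:-1] + LONG_SIGNS[sign]  # second press: lengthen
    return text + sign                       # first press: add the sign


KA = "\u0d9a"  # ක, what the "k" key produces
kii = add_modifier(add_modifier(KA, "i"), "i")  # keys "kii"
kU = add_modifier(KA, "U")                      # keys "kU"
```

Here kii holds "ක" followed by a diga ispilla and kU holds "ක" followed by the sign of "ඍ", the two results quoted above.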

Consonants

First we allocated lower case keys to consonants in the most obvious form: "r", "t", "y", "p", "s", "d", "f", "g", "h", "j", "k", "l", "c", "v", "b", "n" and "m" to "ර", "ත", "ය", "ප", "ස", "ද", "ෆ", "ග", "හ", "ජ", "ක", "ල", "ච", "ව", "බ", "න" and "ම" respectively.
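The lower case allocation above is a plain lookup table. The following sketch builds it from the two lists and converts a run of consonant keys; the names CONSONANTS and type_consonants are hypothetical, and vowel keys, modifiers and upper case keys are deliberately left out.

```python
KEYS = "rtypsdfghjklcvbnm"
LETTERS = "රතයපසදෆගහජකලචවබනම"

# One Sinhala consonant per key, in the order listed above.
CONSONANTS = dict(zip(KEYS, LETTERS))


def type_consonants(keys):
    """Convert a run of lower case consonant keys to unmodified consonants."""
    return "".join(CONSONANTS[key] for key in keys)
```

Because each key yields the unmodified consonant rather than its "hal" form, type_consonants("mm") gives "මම" directly.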

The only arguable allocations are "t" and "d". We think "ත" and "ද" are more common than "ට" and "ඩ", so we decided to risk a few initial surprises for the sake of long term efficiency.

Upper case letters are used for other consonant forms (e.g.: ඵ, ළ, ණ) whenever possible. We also introduced some shortcuts such as "M" for "ඹ". As we use "x" for "අං", upper case "X" was assigned "ඞ".

A consonant is converted to its mahaprana or sagngnaka form by typing upper case "H" or "G" respectively. But there are also shortcuts such as "P" for "ඵ", "G" for "ඟ" and "M" for "ඹ".
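The "H" key can be pictured as a lookup on the preceding consonant. The sketch below is an assumption-laden illustration: the pair table is taken from the Unicode Sinhala block rather than from the Sayura source, the function name press_H is hypothetical, and leaving the text unchanged when no mahaprana form exists is a guess. The "G" key would work the same way with a different table.

```python
# Alpaprana consonant -> mahaprana consonant (assumed table).
MAHAPRANA = {
    "\u0d9a": "\u0d9b",  # ක -> ඛ
    "\u0d9c": "\u0d9d",  # ග -> ඝ
    "\u0da0": "\u0da1",  # ච -> ඡ
    "\u0da2": "\u0da3",  # ජ -> ඣ
    "\u0da7": "\u0da8",  # ට -> ඨ
    "\u0da9": "\u0daa",  # ඩ -> ඪ
    "\u0dad": "\u0dae",  # ත -> ථ
    "\u0daf": "\u0db0",  # ද -> ධ
    "\u0db4": "\u0db5",  # ප -> ඵ
    "\u0db6": "\u0db7",  # බ -> භ
}


def press_H(text):
    """Replace a trailing consonant with its mahaprana form, if it has one."""
    last = text[-1:]
    if last in MAHAPRANA:
        return text[:-1] + MAHAPRANA[last]
    return text  # assumed: no change when there is nothing to convert
```

Under this table, typing "p" and then "H" gives the same letter as the "P" shortcut mentioned above.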

Al Akuru

We allocated "w", the only unallocated and easy-to-type character, for al-kireema, and upper case "W" to add a ZWJ as well, which joins the two letters. So, typing "kwsH" creates "ක්ෂ", while "kWsH" produces the joined "ක්‍ෂ".
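The two examples above can be reproduced with a small sketch that knows only the keys involved. The function name type_keys is hypothetical, the tables are cut down to "k" and "s", and the rule that "H" after "ස" gives "ෂ" is inferred from the "kwsH" example rather than taken from the Sayura source.

```python
AL = "\u0dca"   # al-lakuna, added by al-kireema
ZWJ = "\u200d"  # zero width joiner

CONSONANTS = {"k": "\u0d9a", "s": "\u0dc3"}  # ක, ස
H_FORMS = {"\u0dc3": "\u0dc2"}               # ස + "H" -> ෂ (from the example)


def type_keys(keys):
    """Convert a key sequence made of "k", "s", "w", "W" and "H"."""
    out = ""
    for key in keys:
        if key in CONSONANTS:
            out += CONSONANTS[key]
        elif key == "w":
            out += AL            # plain al-kireema
        elif key == "W":
            out += AL + ZWJ      # al-kireema plus joiner
        elif key == "H" and out[-1:] in H_FORMS:
            out = out[:-1] + H_FORMS[out[-1]]
        else:
            raise ValueError(f"key not handled in this sketch: {key!r}")
    return out
```

The only difference between type_keys("kwsH") and type_keys("kWsH") is the ZWJ after the al-lakuna, which is what asks the renderer for the joined form.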

We also have shortcuts "R" and "Y" to produce "rakaransaya" and "yansaya". E.g.: "SRii" produces "ශ්‍රී".
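In Unicode terms, rakaransaya and yansaya are an al-lakuna and a ZWJ followed by "ර" or "ය", so the two shortcuts can be sketched as fixed sequences appended to the text. The function name press_shortcut is hypothetical and the sketch is not taken from the Sayura source.

```python
AL = "\u0dca"   # al-lakuna
ZWJ = "\u200d"  # zero width joiner

SHORTCUTS = {
    "R": AL + ZWJ + "\u0dbb",  # rakaransaya: al-lakuna + ZWJ + ර
    "Y": AL + ZWJ + "\u0dba",  # yansaya: al-lakuna + ZWJ + ය
}


def press_shortcut(text, key):
    """Append rakaransaya ("R") or yansaya ("Y") to text ending in a consonant."""
    return text + SHORTCUTS[key]


SHA = "\u0dc1"           # ශ, what the "S" key produces in the example
DIGA_ISPILLA = "\u0dd3"  # result of "ii" after a consonant
sri = press_shortcut(SHA, "R") + DIGA_ISPILLA  # keys "SRii"
```

The value of sri is the five code point sequence behind "ශ්‍රී".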

Installing

The latest version of the SCIM module is 0.3.3. Download it here.

If you want to try out the present stable version of the Sayura SCIM module (which is called scim-sinhala-trans), follow the instructions at sinhala.sourceforge.net. However, we encourage you to test the 0.3 series, which brings "preedit" support and improvements to the scheme itself.

DEB based systems

On Debian based systems, make a DEB package:

% tar -xzvf scim-sayura-0.3.3.tar.gz
% cd scim-sayura-0.3.3
% dpkg-buildpackage -b

If you get a "command not found" error, install the "dpkg-dev" package, and preferably the "build-essential" package as well.

If dpkg-buildpackage complains about missing packages, install them and try again.

RPM based systems

Locate your RPM directory, usually in /usr/src/. The following examples assume /usr/src/rpm. You also need the 'rpmbuild' tool.

First unpack the tarball to a temporary location, and copy the rpm/scim-sayura.spec file to the SPECS subdirectory of the RPM directory. Copy the tarball into the SOURCES subdirectory.

# cd /tmp
# tar -xzvf /wherever/is/scim-sayura-0.3.3.tar.gz
# cp scim-sayura-0.3.3/rpm/scim-sayura.spec /usr/src/rpm/SPECS/
# cp /wherever/is/scim-sayura-0.3.3.tar.gz /usr/src/rpm/SOURCES/

Now run the rpmbuild command with the -bb option to build a binary RPM inside the RPMS subdirectory.

# cd /usr/src/rpm/SPECS/
# rpmbuild -bb scim-sayura.spec