The History of Digital Tibetan

Correcting a Digital Tibetan Tower of Babel
Digital Tibetan publishing requires software that recognizes and works
with the Tibetan language at a basic level. For many years, however, computers could not accept any non-English text. The personal computer as we know it was originally designed to
use, display and work only with ASCII (the American Standard Code for Information Interchange), a standard promulgated by the American computer industry to ensure information exchange.
Under this standard, the character set was designed as a basic set of characters from the English alphabet, in lower and upper case,
plus digits and a few basic symbols like periods and commas. Computers were for many years manufactured to recognize only this small character set, and to display text on a screen limited to
glyphs in the Roman character set. Space is an important parameter of this standard: ASCII defines only 128 code positions, fewer than
100 of them printable characters, so each character fits in a single byte. For this reason, languages that can be written in ASCII are referred to as "single byte" languages. The restriction of many digital systems to a character set parallel to
the ASCII standard, adopted by the International Organization for Standardization ("ISO") some decades ago, has thus been the most fundamental barrier to digitizing Tibetan.
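The single-byte limitation described above can be seen in a minimal sketch. It assumes nothing beyond Python's built-in string encoding; the helper name `fits_in_ascii` and the sample word are illustrative choices, not part of any Tibetan software discussed here.

```python
# ASCII defines code positions 0-127, so each ASCII character fits in one byte.
ascii_text = "Tibet"
ascii_bytes = ascii_text.encode("ascii")

# The Tibetan word "bod" (Tibet), written with Unicode code points:
# TIBETAN LETTER BA, TIBETAN VOWEL SIGN O, TIBETAN LETTER DA.
tibetan_text = "\u0f56\u0f7c\u0f51"


def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text has an ASCII code position."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(len(ascii_bytes))             # one byte per character
print(fits_in_ascii(ascii_text))
print(fits_in_ascii(tibetan_text))  # Tibetan has no place in the ASCII table
```

A system that can only store and display ASCII has no position at all for the Tibetan letters, which is why early developers had to borrow Roman positions for Tibetan glyphs.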
Processing and display of non-Roman characters by personal computers on a widespread basis has become possible only within the last five years. In part, these advances are due
to the widespread adoption of the Microsoft Windows operating system, the capability of that operating system to display glyphs from any character set, and the internationalization of the personal
computer industry. Despite these advances, programs remain anchored to the ASCII world, and the ability of software to work with non-English languages is developing only slowly. Often,
ASCII assumptions built into a program's internal processing prevent computers from fully handling Tibetan information. Most solutions that allow digital processing and display of non-English characters are ad hoc.
Nevertheless, intrepid developers of Tibetan-language software, taking advantage of these nascent processing and display capabilities,
created a number of proprietary font sets and keyboard routines, and jury-rigged computers to think they were displaying Roman characters when they were in fact outputting Tibetan-language
glyphs. This workaround allowed a sudden leap to digital processing of Tibetan documents, but restrictions remained from the ASCII-based encoding in the digital underbelly.
Moreover, virtually all of these activities were carried on in software development basements, so they grew without any common standards for the creation of Tibetan documents.
Because one developer's jury-rigged solution was different from another's, documents from one Tibetan-language program could not be read by another program. Even today, one
Tibetan-language word processing system cannot communicate with another, and the fonts used to create Tibetan electronic documents in one system typically cannot be used in another. The
result was a "Tower of Babel" effect within the digital Tibetan world, where one developer's software could not understand another developer's software.
Specific characteristics of the Tibetan language exacerbated
these incompatibilities. Because the thirty consonants in the Tibetan alphabet can be combined or "stacked" in a variety of ways, typical Tibetan fonts developed with
pre-combined stacks have required a minimum of 200 glyphs, and sometimes hundreds more. However, Roman character font sets are limited to 256 characters at most. To create a Tibetan
glyph set, font developers have filled every available space in these single-byte font sets with glyphs. There is no standard manner of laying out such a single-byte glyph set, and accordingly each
Tibetan-language font designer has created a nonstandard font encoding. A font used in one program is not interchangeable with a font used in another program.
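The incompatibility described above can be sketched in a few lines. The two glyph tables below, `FONT_A` and `FONT_B`, are invented for illustration and do not reproduce the layout of any real Tibetan font; the point is only that the same stored bytes mean different things under different nonstandard encodings.

```python
# Hypothetical single-byte layouts for two Tibetan fonts (invented for this sketch).
# Each maps a byte value, which a Roman font would show as A, B or C,
# to the Tibetan letter the font designer placed in that position.
FONT_A = {0x41: "\u0f40", 0x42: "\u0f41", 0x43: "\u0f42"}  # KA, KHA, GA
FONT_B = {0x41: "\u0f42", 0x42: "\u0f40", 0x43: "\u0f41"}  # GA, KA, KHA


def render(data: bytes, font: dict) -> str:
    """Show what a font displays for each stored byte; U+FFFD marks an empty position."""
    return "".join(font.get(byte, "\ufffd") for byte in data)


# A document typed in a program that uses FONT_A stores only byte values.
document = bytes([0x41, 0x42, 0x43])

as_font_a = render(document, FONT_A)  # the text the author intended
as_font_b = render(document, FONT_B)  # the same bytes opened with another font
as_roman = document.decode("ascii")   # what the computer "thinks" it holds

print(as_font_a, as_font_b, as_roman)
```

Because the file itself records nothing but Roman byte values, a program has no way to tell which table the author had in mind, and converting between systems requires a separate mapping for every pair of fonts.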
An outgrowth of these overlarge font encodings is a major problem with Tibetan
typography: Tibetan font developers have used character positions that other software treats as reserved. Programs using
Roman glyph fonts, which need fewer than 100 characters to display most words, typically reserve certain positions in the font encoding for special programming purposes. These reserved characters
are then used to implement features of word processing and desktop publishing programs. When Tibetan fonts violate these reserved-character "understandings"
and use the reserved positions out of the necessity of obtaining a sufficiently large glyph set, the programs that rely on those characters for programming calls either break or
display certain characters incorrectly. Such display conflicts and software breakages are common among Tibetan word processing programs.

Nitartha Solutions
When Nitartha international began its mission to create and preserve Tibetan texts and prepare for the modern digital publication environment, it faced a significant challenge. Tibetan-language word
processing systems then in use were limited by conflicting standards, incompatibilities with different hardware and operating systems, and general user-unfriendliness. These
problems were primarily caused by the situation outlined above, and reflected the general lack of support in commercial operating systems for foreign-language input and output. Since that time, due to factors such as the confluence of international markets,
operating system vendors, such as the Microsoft Corporation with its Windows 95, 98 and NT operating systems, have begun to enable their operating systems for foreign-language publication,
including publication in Asian languages. Unicode is closer than ever to final acceptance. Nitartha international is developing Tibetan-language word processing software to take advantage of these new digital
processing capacities. In our short-term development program, we are working to increase the compatibility of the
Nitartha-Sambhota program with other systems and non-Tibetan software. In our long-range development program, we are working to implement the Unicode standard. And on this site, we
show you how to use Tibetan-language programs to produce WebTibetan documents, which have the potential to be universally accessible.

Developing Future Standards
International institutions are beginning to create remedies for these problems. To take advantage of these developing standards, Tibetan-language software developers should be
encouraged to take the following steps. First, Tibetan font designers can develop a standard for font encoding so that fonts in one system may be interchanged with fonts in another system.
Potential standards organizations involved in Tibetan studies, such as Otani University or the University of Virginia, may be able to encourage such a development, and Nitartha international is actively working to
promote such standards. Second, as fonts evolve from single-byte to multibyte systems, the Tibetan language will be part of future standards. Already, the Unicode Consortium has adopted a standard for Tibetan.
(A more technical discussion of the relationship between the ASCII encoding and Unicode is available at
czyborra.com.) It has also become possible to take advantage immediately of
HTML standards to encourage common encodings for fonts so that Tibetan-language documents can be displayed on the World Wide Web. We call this "WebTibetan."
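As a rough illustration of the multibyte approach mentioned above, the sketch below shows how a Tibetan stack can be written in Unicode as a base consonant followed by subjoined consonants, rather than as one pre-combined glyph squeezed into a font position. It relies only on Python's standard `unicodedata` module; the sample stack and the helper name `describe` are illustrative choices and are not taken from any Nitartha software or from the WebTibetan method itself.

```python
import unicodedata

# One stack, "sgra", spelled as three Unicode code points:
# a base consonant followed by two subjoined consonants.
STACK = "\u0f66\u0f92\u0fb2"


def describe(text: str) -> list:
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(STACK):
    print(line)

# Every code point lies in the Unicode Tibetan block (U+0F00 to U+0FFF).
in_tibetan_block = all(0x0F00 <= ord(ch) <= 0x0FFF for ch in STACK)

# Stored as UTF-8, each of these code points occupies three bytes.
utf8_length = len(STACK.encode("utf-8"))
print(in_tibetan_block, utf8_length)
```

Because every program that follows the standard agrees on these code points, the text no longer depends on which font's private layout was used to type it; drawing the stack correctly becomes the job of the font and rendering software.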