
3 Character encodings

When representing notation on a computer, we have to rely on some encoding of symbols. The most common one at present is the 7-bit ASCII character set, together with a series of 8-bit extensions known as ISO-8859. The most familiar of these is Latin-1 (ISO-8859-1). It contains most of the letters used in Western Europe, such as the German umlauts (ä, Ä, ...), the French accented letters (é, à, â, ç, ...), the Scandinavian letters (æ, Æ, ...), and the Icelandic soft d and soft t (ð, þ). Other Latin characters, such as the palatal consonants of Latvian, letters used in Lithuanian, Czech, Croatian, Lapp, ..., Polish and Estonian, the Romanian accented letters, etc., are placed in other ISO Latin character sets. Other parts of ISO-8859 extend the 7-bit ASCII character set with Greek, Arabic and Hebrew characters.

There are also more encompassing character encodings than the ISO Latin series, e.g., the ISO-10646 Universal Multiple-Octet Coded Character Set (UCS), or the Unicode standard adopted by Java (ISO-10646 and Unicode are identical for the parts that have been defined).
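
As an illustration of how these character sets divide up the letters, here is a minimal Python sketch that checks which encodings can represent one sample letter from several of the languages mentioned. It assumes Python's built-in codec names as stand-ins for the standards (`latin_1` for ISO-8859-1, `iso8859_2` for ISO Latin-2, `iso8859_7` for the Greek part of ISO-8859) and uses UTF-8 to stand in for the full UCS/Unicode repertoire; the sample letters and the helper names `encodable`, `coverage` and `latin1_matches_ucs` are hypothetical choices for illustration. The last helper checks that Latin-1 byte values coincide with the first 256 UCS/Unicode code points.

```python
from __future__ import annotations

# Illustrative sketch: which character sets can represent which letters?
# Assumptions: Python's built-in codec names stand in for the standards
# (latin_1 = ISO-8859-1, iso8859_2 = ISO Latin-2, iso8859_7 = ISO-8859 Greek),
# and UTF-8 stands in for the full UCS/Unicode repertoire.
# The sample letters are hypothetical choices, one per language.

SAMPLES = {
    "German": "ä",     # umlaut, in Latin-1
    "Icelandic": "ð",  # soft d (eth), in Latin-1
    "Czech": "ř",      # in Latin-2, not in Latin-1
    "Latvian": "ņ",    # palatal n, in neither Latin-1 nor Latin-2
    "Greek": "λ",      # in the Greek part of ISO-8859
}

ENCODINGS = ["ascii", "latin_1", "iso8859_2", "iso8859_7", "utf_8"]


def encodable(ch: str, encoding: str) -> bool:
    """Return True if `ch` can be encoded in `encoding` without loss."""
    try:
        ch.encode(encoding)  # strict error handling is the default
        return True
    except UnicodeEncodeError:
        return False


def coverage() -> dict[str, list[str]]:
    """Map each sample language to the encodings that can represent its letter."""
    return {
        lang: [enc for enc in ENCODINGS if encodable(ch, enc)]
        for lang, ch in SAMPLES.items()
    }


def latin1_matches_ucs(text: str) -> bool:
    """Check that each Latin-1 byte equals the character's UCS/Unicode code point.

    Raises UnicodeEncodeError if `text` contains characters outside Latin-1.
    """
    return list(text.encode("latin_1")) == [ord(c) for c in text]


if __name__ == "__main__":
    for lang, encs in coverage().items():
        print(f"{lang:10} {SAMPLES[lang]}: {', '.join(encs)}")
    print(latin1_matches_ucs("äðÆ"))
```

Run as a script, it prints one line per language. None of the sample letters fits in 7-bit ASCII, and the Latvian ņ is reported as representable only in UTF-8 among the encodings tried, since it lives in other ISO Latin sets (such as Latin-4 or ISO-8859-13) that the sketch does not list.
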
CoFI Note: T-1 ---- 7 April 1997.
Comments to Magne.Haveraaen@ii.uib.no
