CHAPTER 10 : Programming Usage
Localization (also called globalization or internationalization) is about writing a program that can easily be ported to another language: not to Pascal, but to French or German. The problem is not limited to translating the text that is shown; languages and countries also differ in date and number formats, and in other subtle ways. If available, a native language support library will help in coping with these differences.
10.9.1 Word differences
Messages are often translated by a native of the target country. The translator's task will be easier and safer if there are no printable messages embedded in the source code. The alternative is to put them in a separate data file, preferably one that is compiled independently of the main source code. This file may also hold all other language-specific information, which simplifies the localization process.
Another problem with words arises where a screen layout is designed to fit only the English words: it is unlikely that the words of every other language will fit into the same space.
Hyphenation also varies by country. There is usually a complex algorithm, supported by an exception list for the words which do not follow the rules.
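As a minimal sketch of keeping messages out of the source code, the following Python fragment looks messages up by key in a per-language catalog. The catalog contents, the key names and the helper functions `load_catalog` and `message` are all hypothetical; the JSON strings stand in for the separate data files a translator would maintain.

```python
import json

# Hypothetical catalogs. In practice each language would live in its own
# data file, edited by the translator without touching the program source.
CATALOG_FILES = {
    "en": '{"greeting": "Hello, {name}!", "file_not_found": "File not found: {path}"}',
    "fr": '{"greeting": "Bonjour, {name} !", "file_not_found": "Fichier introuvable : {path}"}',
}

def load_catalog(language):
    """Parse the catalog for one language, falling back to English."""
    raw = CATALOG_FILES.get(language, CATALOG_FILES["en"])
    return json.loads(raw)

def message(catalog, key, **values):
    """Look up a message by key and fill in its named placeholders."""
    return catalog[key].format(**values)

catalog = load_catalog("fr")
print(message(catalog, "greeting", name="Marie"))
```

Named placeholders are used rather than positional ones so that a translation can reorder the inserted values to suit the grammar of the target language.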
10.9.2 Different character sets
Most languages have their own character sets, which brings a whole set of problems. European languages mostly use Latin characters, but with various accents, and with each letter-accent pair effectively a separate character. There are considerably fewer than 256 characters in each set, so each character can easily be represented within 8 bits. If, however, you consider Asian character sets, then at least 16-bit characters are required (Chinese consists of at least 25,000 characters!).
Another problem is that the order in which letters are sorted is not necessarily numerical code order (as it is for English in ASCII). Tables are thus required to define the collation sequence.
Capitalization also does not always follow simple rules. A lower case 'e' with an acute accent capitalizes to an upper case 'E', which may or may not retain the accent, depending on the language.
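The difference in size can be illustrated with a short Python sketch. The choice of encodings is an assumption made for the example, not something the text prescribes: Latin-1 stands in for an 8-bit European set and UTF-16 for a 16-bit representation.

```python
# An accented Latin letter fits in an 8-bit set; a Chinese character does not.
accented = "\u00e9"   # e with an acute accent
chinese = "\u4e2d"    # a common Chinese character

latin1_bytes = accented.encode("latin-1")      # a single byte

try:
    chinese.encode("latin-1")
    fits_in_8_bits = True
except UnicodeEncodeError:
    # The 8-bit set simply has no code for this character.
    fits_in_8_bits = False

utf16_bytes = chinese.encode("utf-16-le")      # two bytes, one 16-bit unit

print(len(latin1_bytes), fits_in_8_bits, len(utf16_bytes))
```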
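A collation table can be sketched in Python as a mapping applied to each character before comparison. The table below is a hypothetical, much simplified approximation of German dictionary order, in which an umlauted vowel sorts as its base letter; a real table would cover far more cases.

```python
# Hypothetical, simplified collation table: accented letters sort as their
# base letters instead of by their numeric character codes.
GERMAN_COLLATION = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}

def collation_key(word, table):
    """Build a sort key by mapping each lower-cased character through the table."""
    return "".join(table.get(ch, ch) for ch in word.lower())

words = ["Zebra", "Apfel", "Ärger", "Banane"]

by_code = sorted(words)   # plain code order puts 'Ä' after 'Z'
by_table = sorted(words, key=lambda w: collation_key(w, GERMAN_COLLATION))

print(by_code)    # ['Apfel', 'Banane', 'Zebra', 'Ärger']
print(by_table)   # ['Apfel', 'Ärger', 'Banane', 'Zebra']
```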
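One way to handle this is to make accent retention a per-language setting. The sketch below assumes a hypothetical `keep_accents` flag supplied by the language-specific data; which languages set it either way is deliberately left open.

```python
import unicodedata

def capitalize_for(text, keep_accents):
    """Upper-case text, optionally dropping accents from the result."""
    upper = text.upper()
    if keep_accents:
        return upper
    # Split each accented letter into base letter plus combining mark,
    # then discard the combining marks.
    decomposed = unicodedata.normalize("NFD", upper)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)

word = "\u00e9t\u00e9"   # French for "summer"
print(capitalize_for(word, keep_accents=True))
print(capitalize_for(word, keep_accents=False))
```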
Different countries format different items in different ways. Some of these items are small enough to enable them to be catered for with a simple switch within the program, whilst others will require larger amounts of code. Thus:
Date and time: Almost every country has some variation on date layout. 4/5/95 may be the 4th of May or the 5th of April.
Numbers: Some countries switch the period and the comma in number formats. Thus 12,340.45 may be written 12.340,45.
Money: Each country has its own way of denoting its money symbol. Often it is a special character (e.g. $) and it may be on the left or right of the number.
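The "simple switch" for these three items might look like the following Python sketch. The `FORMATS` table, its two country entries and the helper names are hypothetical and purely illustrative; a real program would normally take such conventions from a native language support library.

```python
from datetime import date

# Hypothetical per-country conventions, for illustration only.
FORMATS = {
    "US": {"date": "{m}/{d}/{y:02d}", "thousands": ",", "decimal": ".",
           "money": "${amount}"},
    "DE": {"date": "{d}.{m}.{y:02d}", "thousands": ".", "decimal": ",",
           "money": "{amount} \u20ac"},
}

def format_number(value, conv):
    """Format with two decimals using the country's separators."""
    text = f"{value:,.2f}"                       # e.g. '12,340.45'
    # Swap via a placeholder so the two replacements do not collide.
    text = text.replace(",", "\0").replace(".", conv["decimal"])
    return text.replace("\0", conv["thousands"])

def format_date(day, conv):
    """Lay out day, month and two-digit year in the country's order."""
    return conv["date"].format(d=day.day, m=day.month, y=day.year % 100)

def format_money(value, conv):
    """Place the money symbol on the side the country expects."""
    return conv["money"].format(amount=format_number(value, conv))

for country, conv in FORMATS.items():
    print(country,
          format_date(date(1995, 5, 4), conv),
          format_number(12340.45, conv),
          format_money(12340.45, conv))
```

Keeping the conventions in a data table rather than in scattered `if` statements means a new country can be added without changing the formatting code.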
There are other differences which will vary by application and language. For example, problems encountered when writing a word processor will include:
And the big