- CHAPTER 1
Solaris Internationalization Overview
- Solaris 2.6 includes full Unicode 2.0 support, as defined in ISO-10646, in selected locales. Solaris 2.6 is a major release for Sun's international markets. It includes a number of new features for Asian customers and significantly expands language support for Eastern Europe and the Baltic States.
New Internationalization Features in Solaris 2.6
- Unicode 2.0 support
· Unicode 2.0 supported through UTF-8 in English and Korean locales
· UTF-8 locales support multi-script input and output for all European locales and Korean
- Codeset Independence
- Expanded language coverage
· Ten new locales added for Eastern Europe, Russia, Greece, Turkey, and the Baltic States
· Additional input methods provided for the Japanese locale (Wnn6 and ATOK8)
· Easy-to-use font administration tool for adding and managing fonts
- Improved PC data interoperability
· Popular Asian PC file encoding (PC-Kanji and Big5)
· TrueType font support in all versions and TrueType fonts included in Asian versions
· Utilities provided for easy two-way conversion of PC files to UNIX encoding
Internationalization and Localization
- Internationalization is the process of making software portable between languages or regions, while localization is the process of adapting software for specific languages or regions. International software can be developed using interfaces that modify program behavior at run time in accordance with specific cultural requirements. Localization involves establishing on-line information to support a language or region, called a locale.
- Unlike software that must be completely rewritten before it can work with different native languages and customs, internationalized software does not require rewriting. It can be ported from one locale to another without change. The Solaris system is internationalized, providing the infrastructure and interfaces you need to create internationalized software. Chapter 3, "Contents of the Localized Solaris 2.6 Products" and Chapter 4, "Overview of UTF-8" describe what facilities are available and how to use them.
- Internationalization and localization are different procedures.
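- Internationalization is the process of making software that is independent of any locale. It can then be easily adapted to specific locales.
- Localization is the process of adapting internationalized software to a specific language or region, that is, to a specific locale.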
- The following localized products are available in Solaris 2.6:
- English Solaris
- European Solaris (German, French, Spanish, Swedish, Italian)
- Simplified Chinese Solaris for the People's Republic of China
- Traditional Chinese Solaris for Taiwan
- Japanese Solaris
- Korean Solaris
Basic Steps in Internationalization
- An internationalized application's executable image is portable between languages and regions. To internationalize software, you should:
- Use the interfaces described in this book to create software whose environment can be modified dynamically without the necessity of recompiling the software.
- Separate software into executable and messages. The messages include all printable and displayable messages that the user sees. Keep the message strings in a message database.
- Message strings are translated for a language and a region. A locale includes the message strings and methods to specify sorting, and so forth.
- A locale is not the same as a language. A language may be spoken in several regions: for example, French is spoken in France and Canada, but each country has different ways of displaying monetary and time information.
- To use a localized version of a product, the user sets the locale environment variables (described in "Locale Categories"). The product then displays the user messages in their translated form. Date, time, currency, and other information are formatted and displayed according to locale-specific conventions.
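- The separation of executable and messages can be sketched as follows. This is a minimal Python illustration, not the Solaris message-catalog interface: the `CATALOGS` table, the message IDs, and the `get_message` helper are hypothetical names, and the German strings are sample translations.

```python
# Hypothetical in-memory message database: one catalog per locale.
# In a real product the catalogs would live in files outside the executable.
CATALOGS = {
    "C": {
        "greeting": "Hello",
        "file_not_found": "File not found: {name}",
    },
    "de": {
        "greeting": "Hallo",
        "file_not_found": "Datei nicht gefunden: {name}",
    },
}


def get_message(msg_id, locale_name, **params):
    """Look up msg_id for locale_name, falling back to the untranslated C text."""
    catalog = CATALOGS.get(locale_name, {})
    template = catalog.get(msg_id, CATALOGS["C"][msg_id])
    return template.format(**params)


if __name__ == "__main__":
    print(get_message("greeting", "de"))
    print(get_message("file_not_found", "de", name="report.txt"))
    print(get_message("greeting", "fr"))  # no French catalog: falls back to C
```

- Because the program refers to messages only by ID, adding a language means adding a catalog, not changing or recompiling the code.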
What Is a Locale?
- The key concept for application programs is that of a program's locale. The locale is an explicit model and definition of a native-language environment. The notion of a locale is explicitly defined and included in the library definitions of the ANSI C Language standard.
- The locale consists of a number of categories for which there are language-dependent formatting or other specifications. A program's locale defines its codesets, date and time formatting conventions, monetary conventions, decimal formatting conventions, and collation order.
- A locale name is composed of language, territory, and possibly codeset, although territory is dropped when not needed. Codeset is usually assumed. For example, German is de, an abbreviation for Deutsch, while Swiss German is de_CH, CH being an abbreviation for Confoederatio Helvetica.
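- The structure of a locale name can be illustrated with a small parser. This sketch assumes the common `language[_territory][.codeset]` spelling; `parse_locale_name` is a hypothetical helper, and `en_US.UTF-8` is used only as a sample name with an explicit codeset.

```python
def parse_locale_name(name):
    """Split 'language[_territory][.codeset]' into its three parts.

    Parts that are absent from the name are returned as None.
    """
    base, _, codeset = name.partition(".")
    language, _, territory = base.partition("_")
    return language, territory or None, codeset or None


if __name__ == "__main__":
    for sample in ("de", "de_CH", "en_US.UTF-8"):
        print(sample, "->", parse_locale_name(sample))
```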
- Generally the locale name is specified by the LANG environment variable. Locale categories are subordinate to LANG, but may be set separately, in which case they override LANG. If LC_ALL is set, it overrides not only LANG, but all the separate locale categories as well.
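- The precedence among LC_ALL, the individual category variables, and LANG can be modeled in a few lines. This is a sketch of the rule described above, not the system's implementation; `effective_locale` is a hypothetical helper, and falling back to the C locale when nothing is set is an assumption.

```python
def effective_locale(category, env):
    """Return the locale name in effect for one category.

    Precedence: LC_ALL, then the category's own variable, then LANG.
    If none is set, the C locale is assumed.
    """
    for variable in ("LC_ALL", category, "LANG"):
        value = env.get(variable)
        if value:
            return value
    return "C"


if __name__ == "__main__":
    env = {"LANG": "de", "LC_TIME": "fr"}
    print(effective_locale("LC_TIME", env))     # category variable overrides LANG
    print(effective_locale("LC_NUMERIC", env))  # falls back to LANG
    env["LC_ALL"] = "ja"
    print(effective_locale("LC_TIME", env))     # LC_ALL overrides everything
```

- In a real program, `env` would typically be the process environment, for example `os.environ`.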
Full vs. Partial Locales
- A full Solaris locale has all of the listed functions and the localized system messages in that language. The German de locale is a full locale. A German user will see all system messages in German.
- Partial locales have the listed functions but they don't provide localized messages. For example, the Russian ru locale can process input, output, sorting, and so on, but it does not have localized messages in Russian. For this reason it is a partial locale.
- Some partial locales do not display English messages, because a full locale with localized messages exists for the same language. For example, de_AT is a partial locale for Austria. Austrians speak German, but use a different currency. The Austrian locale is a subset of the German de locale. It displays messages in German and currency in Austrian schillings instead of German marks.
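- The relationship between partial and full locales can be sketched as a lookup that decides whose messages a locale displays. This models the behavior described above and is not how Solaris implements it; `FULL_LOCALES` and `message_locale` are hypothetical names, and the set contains only the one full locale named in this section.

```python
# Hypothetical set of locales that ship translated system messages.
FULL_LOCALES = {"de"}


def message_locale(locale_name, full_locales=FULL_LOCALES):
    """Pick the locale whose messages a (possibly partial) locale displays."""
    if locale_name in full_locales:
        return locale_name
    # A partial locale such as de_AT can borrow messages from the
    # full locale for the same language.
    language = locale_name.split("_", 1)[0]
    if language in full_locales:
        return language
    return "C"  # no translation available: untranslated (English) messages


if __name__ == "__main__":
    for name in ("de", "de_AT", "ru"):
        print(name, "->", message_locale(name))
```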
Locales in Solaris
- Different cultures use different conventions for writing the date, the time, numbers, currency, delimiting words and phrases, and quoting material.
- A locale defines the behavior of a program at runtime according to a language or cultural region's conventions. Throughout the system, a locale will determine the behavior of the following:
- Encoding and processing of text data
- Identifying the language and encoding of resource files and their text values
- Rendering and layout of text strings
- Interchanging text that is used for interclient text communication
- Selecting the input method (that is, which codeset will be generated) and the processing of text data
- Encoding and decoding for interclient text communication
- Font and icon files that are culturally specific
- Actions and file types
- User Interface Definition (UID) files
- Date and time formats
- Numeric formats
- Monetary formats
- Collation order
- Format for informative and diagnostic messages and interactive responses
- The CDE separates language and culture-dependent information from the application and saves it outside the application.
- By separating the language and culture-dependent information from the application, the developer does not need to translate, rewrite, or recompile the application for each market. The only requirement to enter a new market is to localize the external information to the local language and customs.
Locale Categories
- The locale categories are as follows:
· LC_CTYPE
- This category controls the behavior of character-handling functions.
· LC_TIME
- This category specifies date and time formats, including month names, days of the week, and common full and abbreviated representations.
· LC_MONETARY
- This category specifies monetary formats. Few SunOS system commands or library routines actually use this category.
· LC_NUMERIC
- This category specifies the decimal separator (or radix character) and the thousands separator.
· LC_COLLATE
- This category specifies the sorting order for a locale, and the string conversions required to attain this ordering.
· LC_MESSAGES
- This category specifies the language in which the localized messages are written.
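- From a program's point of view, each category can be queried and set independently. The sketch below uses Python's standard `locale` module, which wraps the C library's `setlocale()` and `localeconv()`; it is an illustration rather than Solaris-specific code, and `category_settings` is a hypothetical helper. Only the C locale is used, because it is the one locale that can be assumed to exist on every system.

```python
import locale

CATEGORY_NAMES = (
    "LC_CTYPE", "LC_TIME", "LC_MONETARY",
    "LC_NUMERIC", "LC_COLLATE", "LC_MESSAGES",
)


def category_settings():
    """Return the locale currently in effect for each category."""
    settings = {}
    for name in CATEGORY_NAMES:
        # LC_MESSAGES is not defined on every platform, so look it up defensively.
        category = getattr(locale, name, None)
        if category is not None:
            # Calling setlocale() without a locale argument only queries.
            settings[name] = locale.setlocale(category)
    return settings


if __name__ == "__main__":
    locale.setlocale(locale.LC_ALL, "C")  # set every category at once
    print(category_settings())
    print(repr(locale.localeconv()["decimal_point"]))  # radix character from LC_NUMERIC
```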
Using Locale Categories for Localization
- The localization of a product should be done in consultation with native users in the target language or region. Certain styles and formats may seem perfectly obvious and universal to the developer, but to the user they may look awkward, wrong, or even offensive. The following pages describe the elements that Solaris allows you to control and specify so that you can successfully internationalize your product.
Time Formats
- TABLE 1-1 shows some of the ways to write 11:59 p.m.
TABLE 1-1
| Locale | Format |
| Canadian | 23:59 |
| Finnish | 23.59 |
| German | 23.59 Uhr |
| Norwegian | Kl 23.59 |
| U.K. | 11.59 PM |
- Time is represented by either a 12-hour clock or a 24-hour clock; the latter is sometimes known as "railroad time." The hour and minute separator can be either a colon (:) or a period (.).
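- The conventions in TABLE 1-1 can be expressed as format patterns. The sketch below transcribes the table into `strftime` patterns; the `TIME_PATTERNS` dictionary and `format_time` helper are hypothetical, and a production program would normally take these formats from the LC_TIME category rather than hard-coding them.

```python
from datetime import time

# strftime patterns transcribed from TABLE 1-1 (illustrative only).
TIME_PATTERNS = {
    "Canadian": "%H:%M",
    "Finnish": "%H.%M",
    "German": "%H.%M Uhr",
    "Norwegian": "Kl %H.%M",
    "U.K.": "%I.%M %p",  # 12-hour clock; %p follows the program's LC_TIME setting
}


def format_time(value, convention):
    """Format a time of day according to one of the conventions above."""
    return value.strftime(TIME_PATTERNS[convention])


if __name__ == "__main__":
    late = time(23, 59)
    for convention in TIME_PATTERNS:
        print(convention, "->", format_time(late, convention))
```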
- Time zone splits occur between and within countries. Although a time zone can be described in terms of how many hours it is ahead of or behind Greenwich Mean Time (GMT), this number is not always an integer. For example, Newfoundland is in a time zone that is half an hour different from the adjacent time zone.
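- A non-integer offset is easy to get wrong if a program stores time zones as whole hours. The following sketch represents offsets as durations instead. The specific standard-time offsets (Newfoundland at 3 hours 30 minutes behind GMT, the adjacent Atlantic zone at 4 hours behind) are stated here as assumptions, and the zone names and `offset_hours` helper are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumed standard-time offsets (daylight saving time is ignored here).
NEWFOUNDLAND = timezone(timedelta(hours=-3, minutes=-30), "NST")
ATLANTIC = timezone(timedelta(hours=-4), "AST")


def offset_hours(zone):
    """UTC offset of a fixed-offset zone, in (possibly fractional) hours."""
    return zone.utcoffset(None).total_seconds() / 3600


if __name__ == "__main__":
    noon_gmt = datetime(1997, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(noon_gmt.astimezone(NEWFOUNDLAND).strftime("%H:%M"))
    print(noon_gmt.astimezone(ATLANTIC).strftime("%H:%M"))
    print(offset_hours(NEWFOUNDLAND) - offset_hours(ATLANTIC))
```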
- Daylight Saving Time (DST) starts and ends on dates that vary from country to country.
Date Formats
- TABLE 1-2 shows some of the date formats used around the world. Note that even within a country, there may be variations.
TABLE 1-2
| Locale | Convention | Example |
| Canadian (English) | yyyy-mm-dd | 1989-08-13 |
| Canadian (French) | yyyy-mm-dd | 1989-08-13 |
| Danish | dd/mm/yy | 13/08/89 |
| Finnish | dd.mm.yyyy | 13.08.1989 |
| French | dd/mm/yy | 13/08/89 |
| German | dd.mm.yy | 13.08.89 |
| Italian | dd.mm.yy | 13.08.89 |
| Norwegian | dd.mm.yy | 13.08.89 |
| Spanish | dd-mm-yy | 13-08-89 |
| Swedish | yyyy-mm-dd | 1989-08-13 |
| UK-English | dd/mm/yy | 13/08/89 |
| US-English | mm-dd-yy | 08-13-89 |
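- Several of the conventions in TABLE 1-2 can be written as `strftime` patterns, which also makes the risk of ambiguity visible. The `DATE_PATTERNS` dictionary and `format_date` helper are hypothetical and cover only a subset of the table; a real program would normally obtain the format from the LC_TIME category.

```python
from datetime import date

# strftime patterns transcribed from a subset of TABLE 1-2 (illustrative only).
DATE_PATTERNS = {
    "Canadian (English)": "%Y-%m-%d",
    "Danish": "%d/%m/%y",
    "Finnish": "%d.%m.%Y",
    "German": "%d.%m.%y",
    "Spanish": "%d-%m-%y",
    "US-English": "%m-%d-%y",
}


def format_date(value, convention):
    """Format a calendar date according to one of the conventions above."""
    return value.strftime(DATE_PATTERNS[convention])


if __name__ == "__main__":
    sample = date(1989, 8, 13)
    for convention in DATE_PATTERNS:
        print(convention, "->", format_date(sample, convention))
    # Two different dates can produce the same text under different conventions.
    print(format_date(date(1989, 8, 7), "Spanish"))
    print(format_date(date(1989, 7, 8), "US-English"))
```

- The last two lines print the same string for 7 August and 8 July, which is why a date stored as formatted text cannot be interpreted safely without knowing the locale that wrote it.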
Numbers
Decimal and Thousands Separators
- The United Kingdom and the United States are two of the few places in the world that use a period to indicate the decimal place. Many other countries use a comma instead. The decimal separator is also called the radix character. Likewise, while the U.K. and U.S. use a comma to separate thousands groups, many other countries use a period for this instead, and some countries separate thousands groups with a thin space. TABLE 1-3 shows some commonly used numeric formats.
TABLE 1-3
| Locale | Large Number |
| Canadian (French) | 4 294 967 295,00 |
| Canadian (English) | 4 294 967 295,00 |
| Danish | 4.294.967.295,00 |
| Finnish | 4.294.967.295,00 |
| French | 4.294.967.295,00 |
| German | 4 294 967 295,00 |
| Italian | 4.294.967.295,00 |
| Norwegian | 4.294.967.295,00 |
| Spanish | 4.294.967.295,00 |
| Swedish | 4.294.967.295,00 |
| UK-English | 4,294,967,295.00 |
| US-English | 4,294,967,295.00 |
- Data files containing locale-specific formats will be misinterpreted when transferred to a system in a different locale. For example, a file containing numbers in a French format will not be useful to a U.K.-specific program.
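- The effect of the separators, and of reading data under the wrong convention, can be demonstrated with two small helpers. `format_number` and `parse_number` are hypothetical functions written for this illustration; they take the separators as explicit arguments instead of reading them from LC_NUMERIC.

```python
def format_number(value, group_sep, decimal_sep):
    """Format value with two decimals using the given separators."""
    integer_part, fraction = f"{value:,.2f}".split(".")
    return integer_part.replace(",", group_sep) + decimal_sep + fraction


def parse_number(text, group_sep, decimal_sep):
    """Parse text that was written with the given separators."""
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))


if __name__ == "__main__":
    print(format_number(4294967295, ".", ","))
    print(format_number(4294967295, ",", "."))
    print(format_number(4294967295, " ", ","))
    # "1.234" written with a period as thousands separator means 1234,
    # but a reader that assumes U.K. conventions sees 1.234.
    print(parse_number("1.234", ".", ","))
    print(parse_number("1.234", ",", "."))
```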
List Separators
- There are no particular locale conventions that specify how to separate numbers in a list. They are sometimes comma-delimited in the UK and the US, but often spaces and semicolons are used.
Currency
- Currency units and presentation order vary greatly around the world. TABLE 1-4 shows monetary formats in some countries.
TABLE 1-4
| Locale | Currency | Example |
| Canadian (English) | Dollar ($) | $1 234.56 |
| Canadian (French) | Dollar ($) | 1 234.56$ |
| Danish | Kroner (kr) | kr.1.234,56 |
| Finnish | Markka (mk) | 1.234 mk |
| French | Franc (F) | F1.234,56 |
| German | Deutsche Mark (DM) | 1,234.56DM |
| Italian | Lira (L) | L1.234,56 |
| Japanese | Yen (¥) | ¥1,234 |
| Norwegian | Krone (kr) | kr 1.234,56 |
| Spanish | Peseta (Pts) | 1.234,56Pts |
| Swedish | Krona (Kr) | 1234.56KR |
| UK-English | Pound (£) | £1,234.56 |
| US-English | Dollar ($) | $1,234.56 |
- Note that local and international symbols for currency can differ. For example, the designation for the French franc is "F" in France, but it is often written as "FRF" internationally to distinguish it from other francs, such as the Swiss franc or the Polynesian franc.
- Be aware also that a converted currency amount may take up more or less space than the original amount. To illustrate: $1,000 can become L1.307.000.
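- Symbol placement and separators can be captured in a small table, as in the following sketch. `CURRENCY_STYLES` and `format_currency` are hypothetical names, the three entries are transcribed from TABLE 1-4, and floating-point amounts are used only for brevity (a real application would more likely use a decimal type and the LC_MONETARY category).

```python
CURRENCY_STYLES = {
    # convention: (symbol, symbol first?, group separator, decimal separator)
    "US-English": ("$", True, ",", "."),
    "Norwegian": ("kr ", True, ".", ","),
    "Spanish": ("Pts", False, ".", ","),
}


def format_currency(amount, convention):
    """Format amount with two decimals in the style of the given convention."""
    symbol, symbol_first, group_sep, decimal_sep = CURRENCY_STYLES[convention]
    integer_part, fraction = f"{amount:,.2f}".split(".")
    number = integer_part.replace(",", group_sep) + decimal_sep + fraction
    return symbol + number if symbol_first else number + symbol


if __name__ == "__main__":
    for convention in CURRENCY_STYLES:
        print(convention, "->", format_currency(1234.56, convention))
    # Converted amounts can need more room than the original.
    print(len("$1,000"), len("L1.307.000"))
```

- The final line shows why fixed-width fields sized for one currency can truncate another: the lira amount from the example above is four characters longer than the dollar amount.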
Word and Letter Differences
Word Delimiters
- Usually, words are separated by a space character. In Japanese and Thai, however, there is often no delimiter between words.
Word Order
- The order of words in phrases and sentences varies between languages. For instance, the order of the words "cat" and "black" in "a black cat" is reversed in the equivalent Spanish phrase, "un gato negro." And in French, the negatives "ne" and "pas" surround the word they negate, as in the phrase "I do not speak," which in French is "Je ne parle pas."
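- Because word order differs, messages should not be assembled by concatenating fragments in a fixed order. A common remedy, sketched below, is to keep whole phrases as templates with named placeholders so that each translation can reorder them. The `PHRASES` and `WORDS` tables and the `describe` helper are hypothetical.

```python
# Each language supplies a complete template and decides the word order itself.
PHRASES = {
    "en": "a {adjective} {noun}",
    "es": "un {noun} {adjective}",
}

WORDS = {
    "en": {"adjective": "black", "noun": "cat"},
    "es": {"adjective": "negro", "noun": "gato"},
}


def describe(language):
    """Build the phrase for one language from its template and words."""
    return PHRASES[language].format(**WORDS[language])


if __name__ == "__main__":
    print(describe("en"))
    print(describe("es"))
```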
Sort Order
- Sorting order for particular characters is not the same in all languages. For example, the character "ö" sorts with the ordinary "o" in Germany, but sorts separately in Sweden, where it is the last letter of the alphabet.
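- The difference can be shown with two sort keys. This is a deliberately simplified sketch: real collation is defined by the LC_COLLATE category and handles far more than one letter. The `german_key` and `swedish_key` functions are hypothetical, and the sample words are chosen only to expose the position of "ö".

```python
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
_SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def german_key(word):
    """Sort 'ö' together with plain 'o' (simplified German convention)."""
    folded = word.lower().replace("ö", "o")
    return (folded, word)  # the original word breaks ties


def swedish_key(word):
    """Sort 'ö' as the last letter of the alphabet (Swedish convention)."""
    # Characters outside the alphabet sort after every letter in this sketch.
    return [_SWEDISH_RANK.get(ch, len(_SWEDISH_RANK)) for ch in word.lower()]


if __name__ == "__main__":
    words = ["zon", "öl", "ost"]
    print(sorted(words, key=german_key))
    print(sorted(words, key=swedish_key))
```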
Character Sets
Number of Characters
- While the English alphabet contains only 26 characters, some languages contain many more characters. Japanese, for example, can contain over 40,000 characters; Chinese even more.
Western European Alphabets
- The alphabets of most western European countries are similar to the standard 26-character alphabet used in English-speaking countries, but there are often some additional basic characters, some marked (or accented) characters, and some ligatures.
Japanese Text
- Japanese text is composed of three different scripts mixed together: Kanji ideographs derived from Chinese, and two phonetic scripts (or syllabaries), Hiragana and Katakana.
- Although each character in Hiragana has an equivalent in Katakana, Hiragana is the most common script, with cursive rather than block-like letter forms. Kanji characters are used to write root words. Katakana is mostly used to represent "foreign" words, that is, words "imported" from languages other than Japanese.
- There are tens of thousands of Kanji characters, but the number commonly used has been declining steadily over the years. Now only about 3500 are frequently used, although the average Japanese writer has a vocabulary of merely 2000 Kanji characters. Nonetheless, computer systems must support more than 7000 because that is what the Japanese Industrial Standard (JIS) requires. In addition, there are about 170 Hiragana and Katakana characters. On average 55% of Japanese text is Hiragana, 35% Kanji, and 10% Katakana. Arabic numerals and Roman letters are also present in Japanese text.
- Although it is possible to avoid the use of Kanji completely, most Japanese readers find text containing Kanji easier to understand.
Korean Text
- Korean is similar to Japanese in that Chinese-based ideograms, called Hanja, are mixed together with a phonetic alphabet, Hangul. Hanja is used mostly to avoid confusion when Hangul would be ambiguous.
- Hangul characters are formed by combining 10 basic vowels and 14 consonants, 2 to 5 of which compose one syllable. Hangul characters are often arranged in a square like the four on a pair of dice, so that the group takes up the same space as a Hanja character.
- Korean text requires over 6000 Hanja characters, plus about 96 Hangul characters.
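- The way letters combine into one syllable block can be illustrated with Unicode, where each precomposed Hangul syllable is computed arithmetically from its leading consonant, vowel, and optional trailing consonant. This sketch follows the Unicode composition rule rather than any Solaris interface; `compose_syllable` is a hypothetical helper, and the letter counts used by Unicode (which include compound letters) differ from the basic counts quoted above.

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition rule.
S_BASE = 0xAC00  # first precomposed syllable
L_BASE = 0x1100  # first leading consonant
V_BASE = 0x1161  # first vowel
T_BASE = 0x11A7  # one before the first trailing consonant
V_COUNT = 21
T_COUNT = 28


def compose_syllable(lead, vowel, tail=None):
    """Combine conjoining letters into one precomposed syllable block."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(tail) - T_BASE if tail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


if __name__ == "__main__":
    letters = ("\u1112", "\u1161", "\u11ab")  # h, a, n
    syllable = compose_syllable(*letters)
    print(syllable, hex(ord(syllable)))
    # The standard library's NFC normalization applies the same rule.
    print(unicodedata.normalize("NFC", "".join(letters)) == syllable)
```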
Chinese Text
- Chinese usually consists entirely of characters from the ideographic script called Hanzi. In the People's Republic of China (PRC) there are about 7000 commonly used Hanzi characters, although emerging standards number Hanzi in the tens of thousands. In the Republic of China (ROC or Taiwan) current standards require more than 13,000 characters; 6000 others have been recently standardized but are considered rare.
- If a character is not a root character, it usually consists of two or more parts, two being most common. In two-part characters, one part generally represents meaning, and the other represents pronunciation. Occasionally both parts represent meaning. The radical is the most important element, and characters are traditionally arranged by radical, of which there are several hundred. The same sound can be represented by many different characters, which are not interchangeable in usage.
- Some characters are more appropriate than others in a given context; the appropriate one is distinguished phonetically by the use of tones. By contrast, spoken Japanese and Korean lack tones.
- There are several phonetic systems for representing Chinese. In mainland China the most common is pinyin, which uses roman characters and is widely employed in the West for place names such as Beijing. The Wade-Giles system is an older phonetic system, formerly used for place names such as Peking. In Taiwan zhuyin (or bopomofo), an extensive phonetic alphabet with unique letter forms, is often used instead.
- Commercial applications, particularly those that deal with people's names, need to consider the impact of codeset expansion. Many people in the ROC have names containing characters that do not exist in any standard codeset. Space needs to be provided in unassigned codesets to deal with this issue.
Codesets for x86
- The default codeset on the Solaris system for x86 is ISO-8859-1. IBM DOS 437 codeset is provided as an option in text mode; however, it is provided only at internationalization level 1. That is, if you choose to download IBM DOS 437 codeset by typing:
loadfont -c 437
pcmapkeys -f /usr/share/lib/keyboards/437/en_US
- there will be no support for date, time, currency, number, unit, and collation conventions other than the standard U.S. ones. There will be no support for non-English message and text presentation, and no multibyte character support. Therefore, non-Microsoft-Windows users should use the IBM DOS 437 codeset only in the default C locale.
-
- You must be in the text mode to download the IBM codeset, not the graphics mode.
- If you are not using the standard U.S. PC keyboard, replace en_US with the keyboard map related to your keyboard.
- To download the default codeset in text mode, type:
loadfont -c 8859
pcmapkeys -f /usr/share/lib/keyboards/8859/en_US
- See the loadfont(1) and pcmapkeys(1) manual pages.
Keyboard Differences
- Not all characters on the US keyboard appear on other keyboards. Similarly, other keyboards often contain many characters not visible on the US keyboard. However, the Compose key can be used to produce any character in the ISO Latin-1 codeset on any keyboard that supports it.
Other Differences
Punctuation
- Both the position and the type of punctuation symbols can vary between languages. In Spanish, "¿" and "¡" appear at the beginnings of sentences, while in Finnish colons ( : ) can occur inside words.
Symbols
- Commonly used symbols in one culture often have no meaning in another culture. For example, because the common U.S. rural mailbox does not exist in other countries, it would not make a universal email icon.
Measurements
- While most countries now use the metric system of measurement, the United States, parts of Canada, and the United Kingdom (albeit unofficially) still use the imperial system. The symbols for feet (') and inches (") are not understood in all countries.
Gender
- The spelling of adjectives, articles, and nouns is gender-dependent in some languages. In French, for example, "un petit gamin" and "une petite gamine" both mean "a cute kid." The first expression, however, refers to a boy, and the second expression, to a girl. Also, neuter objects in English ("a computer" for example) have gender in other languages ("un ordinateur" is a masculine noun in French).
Titles and Addresses
- Mr., Miss, Mrs., and Ms. are common titles in the US but are not used in many other countries.
- Address formats differ from country to country. In many countries, the postal code includes letters as well as numbers.
Paper Sizes
- Within each country a small number of paper sizes are commonly used, normally with one of those sizes being much more common than the others. Most countries follow ISO Standard 216, "Writing paper and certain classes of printed matter -- Trimmed sizes -- A and B series."
- Internationalized applications should not make assumptions about the page sizes available to them. The Solaris system provides no support for tracking output page size; this is the responsibility of the application program itself.
TABLE 1-5
| Paper Type | Dimensions | Countries |
| ISO A4 | 21.0 cm by 29.7 cm | Everywhere except US |
| ISO A5 | 14.8 cm by 21.0 cm | Everywhere except US |
| JIS B4 | 25.9 cm by 36.65 cm | Japan |
| JIS B5 | 18.36 cm by 25.9 cm | Japan |
| US Letter | 8.5 inches by 11 inches | US and Canada |
| US Legal | 8.5 inches by 14 inches | US and Canada |
- Standard paper trays distributed with LaserWriter and LaserWriter II printers support U.S. letter, U.S. legal, and A4 paper sizes. The SPARCprinter paper tray supports all these sizes, in addition to B5.
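- Since tracking the page size is left to the application, one simple approach is to keep a table of sizes in their native units and convert to a common unit on demand. The sketch below does this for four entries of TABLE 1-5; `PAPER_SIZES` and `paper_size_cm` are hypothetical names.

```python
CM_PER_INCH = 2.54

# Dimensions transcribed from TABLE 1-5: (width, height, unit).
PAPER_SIZES = {
    "ISO A4": (21.0, 29.7, "cm"),
    "ISO A5": (14.8, 21.0, "cm"),
    "US Letter": (8.5, 11.0, "in"),
    "US Legal": (8.5, 14.0, "in"),
}


def paper_size_cm(name):
    """Return (width, height) in centimeters, rounded to two decimals."""
    width, height, unit = PAPER_SIZES[name]
    factor = CM_PER_INCH if unit == "in" else 1.0
    return (round(width * factor, 2), round(height * factor, 2))


if __name__ == "__main__":
    for name in PAPER_SIZES:
        print(name, "->", paper_size_cm(name))
```

- Comparing the results shows that A4 is narrower but taller than US Letter, so a layout that fills one page exactly will not fit the other without adjustment.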
Creating Worldwide Software: The Book
- The book Creating Worldwide Software, 2nd edition, by Bill Tuthill and David Smallberg (Sun Microsystems Press, 1997), is a guide to localizing for the Solaris platform. It is recommended for developers who work with the Solaris system. See "Related Books" for a full citation.
Overview
- The book Creating Worldwide Software is for developers and managers who develop products for the worldwide UNIX platform, especially for the Sun Solaris system.
- Chapter 1, "Winning in Global Markets," briefly shows the market potential of internationalizing your products and defines the steps of internationalization and localization.
- Chapter 2, "Understanding Linguistic and Cultural Differences," shows through examples how an item will appear in various cultures.
- Chapter 3, "Encoding Character Sets," describes how to encode character sets in any language.
- Chapter 4, "Establishing Your Locale Environment," looks at how a user selects a locale. It leads you through the steps of creating a specific locale for your product, including formats for time, date, money, and so on.
- Chapter 5, "Messaging for Program Translation," explains how to prepare your product to handle localized messages. It discusses how to create and install your translated message catalogs.
- Chapter 6, "Displaying Localized Text," discusses font, user interface, and printing issues.
- Chapter 7, "Handling Language Input," discusses input methods for various languages.
- Chapter 8, "Working with CDE," explains the CDE environment as it relates to localization.
- Chapter 9, "Motif Programming," discusses how to write applications under Motif and CDE.
- Chapter 10, "X11 Programming," discusses internationalization with X11.
- Chapter 11, "Communicating Network Data," discusses issues in sharing and distributing data across networks.
- Chapter 12, "Writing International Documentation," includes guidelines for writing manuals and documentation to be translated.
- Chapter 13, "Product Localization," discusses business issues.
- Chapter 14, "Standards Organizations," is a summary of the international standards organizations.
- Chapter 15, "Internationalization Checklist," provides a checklist for internationalization.
- Appendix A, "Languages, Territories, and Locale Names," lists the standard names for languages, locales, and so on.
- Appendix B, "Locale Summaries and Keyboard Layouts," lists locale-specific information and keyboard layouts.
- Appendix C, "OpenWindows and DevGuide," explains how internationalization works with OpenWindows.
- Appendix D, "XView Programming," discusses internationalization with XView.
- Appendix E, "OLIT Programming," discusses internationalization with the OPEN LOOK Intrinsics Toolkit (OLIT).
- Appendix F, "Example Program," offers complete source code for an internationalized Motif application.
- Appendix G, "Annotated Bibliography," is a summary of additional suggested books.
- Appendix H, "Glossary," is a list of key terms.