- CHAPTER 4
Overview of UTF-8
The Universal Transformation Format
- The File System Safe Universal Transformation Format, or UTF-8, is an encoding defined by the X/Open-Uniforum Joint Internationalization Working Group (XoJIG) of X/Open as a multi-byte representation of Unicode. The en_US.UTF-8 locale is the first locale in the Solaris system to use UTF-8 as its codeset, which lets it support multiple scripts.
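- As a rough illustration of the multi-byte representation, the sketch below encodes a single code point into UTF-8 bytes. It is not taken from the Solaris sources: utf8_encode is a hypothetical helper name, and the sketch assumes the 1-to-4-byte form of UTF-8 (code points up to U+10FFFF), whereas ISO/IEC 10646-1 at the time also allowed longer sequences. Every byte of a multi-byte sequence has its high bit set, so ASCII bytes such as '/' never appear inside one, which is what makes the encoding file system safe.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * utf8_encode (hypothetical helper, not a Solaris API):
 * encode one code point as UTF-8.
 * Writes 1 to 4 bytes into out and returns the number of bytes written,
 * or 0 if cp is a surrogate or lies above U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not characters */
        return 0;
    if (cp <= 0xFFFF) {                     /* three bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

- For example, the Latin-1 character é (code point 0xE9) becomes the two bytes 0xC3 0xA9, while plain ASCII text is stored byte for byte as before.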
- The locale supports computation for every code point value defined in Unicode 2.0 / ISO/IEC 10646-1. However, because font resources are limited and few users need every code point value, users of the en_US.UTF-8 locale will see character glyphs only from the following scripts:
-
- ISO 8859-1 (Latin-1)
- ISO 8859-2 (Latin-2)
- ISO 8859-4 (Latin-4)
- ISO 8859-5 (Latin/Cyrillic)
- ISO 8859-7 (Latin/Greek)
- ISO 8859-9 (Latin-5)
- Also, because this locale is primarily for developers, it belongs to the developer's cluster of Solaris 2.6. To install the locale on your system, choose the developer's cluster when you install Solaris 2.6. For more information, see Chapter 5, "Installation."
-
Note - Motif and the CDE libraries support the en_US.UTF-8 locale. OpenWindows, XView, and OPEN LOOK do not support en_US.UTF-8.
System Environment
Locale Environment Variable
- To use the en_US.UTF-8 locale environment, make sure the locale is installed on your system, then choose the locale as follows.
- In a TTY environment, choose the locale by setting the LANG environment variable to en_US.UTF-8, as in the following C-shell example:
-
system% setenv LANG en_US.UTF-8
|
- Make sure the other categories are either unset or set to en_US.UTF-8, because the LANG environment variable has lower priority than LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_NUMERIC, LC_MONETARY, and LC_TIME when the locale is set. See the setlocale(3C) man page for more details about the hierarchy of environment variables.
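- The priority order can be sketched as a small function. This is an illustration of the usual lookup order (LC_ALL first, then the category's own variable, then LANG), not the actual setlocale(3C) implementation; effective_locale is a hypothetical name, and the fallback to the "C" locale when nothing is set is an assumption.

```c
#include <stddef.h>

/* Return s if it is set and non-empty, otherwise NULL. */
static const char *nonempty(const char *s)
{
    return (s != NULL && s[0] != '\0') ? s : NULL;
}

/*
 * effective_locale (hypothetical helper): choose the locale name for one
 * category. Each argument is the value of the corresponding environment
 * variable, or NULL if that variable is unset.
 */
const char *effective_locale(const char *lc_all,
                             const char *lc_category,
                             const char *lang)
{
    const char *v;

    if ((v = nonempty(lc_all)) != NULL)       /* LC_ALL overrides everything */
        return v;
    if ((v = nonempty(lc_category)) != NULL)  /* then LC_CTYPE, LC_TIME, ... */
        return v;
    if ((v = nonempty(lang)) != NULL)         /* LANG is consulted last */
        return v;
    return "C";                               /* assumed default */
}
```

- In a real program the three arguments would typically come from getenv(3C), for example effective_locale(getenv("LC_ALL"), getenv("LC_CTYPE"), getenv("LANG")). The sketch shows why a leftover LC_CTYPE=C hides LANG=en_US.UTF-8 for that category.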
- To check current locale settings in various categories, use the locale(1) utility as shown below:
-
system% locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=
|
- You can also start the en_US.UTF-8 environment from the CDE desktop: on the CDE login screen, open the Options -> Language menu and choose en_US.UTF-8.
TTY Environment Setup
- To ensure correct text editing on a terminal or in a terminal emulator such as dtterm(1), users should push certain locale-specific STREAMS modules onto their Streams.
- For more information on STREAMS modules and streams in general, see the STREAMS Programming Guide.
- The following table shows STREAMS modules supported by the en_US.UTF-8 locale in the terminal environment:
-
TABLE 4-1 en_US.UTF-8
| STREAMS Module | Description |
| /usr/kernel/strmod/eucu8 | UTF-8 STREAMS module for tail side |
| /usr/kernel/strmod/u8euc | UTF-8 STREAMS module for head side |
| /usr/kernel/strmod/u8lat1 | Code conversion STREAMS module between UTF-8 and ISO 8859-1 (Western European) |
| /usr/kernel/strmod/u8lat2 | Code conversion STREAMS module between UTF-8 and ISO 8859-2 (Eastern European) |
| /usr/kernel/strmod/u8koi8 | Code conversion STREAMS module between UTF-8 and KOI8-R (Cyrillic) |
Loading a STREAMS Module at Kernel
- To load a STREAMS module into the kernel, first become superuser:
-
system% su
Password:
system#
|
- Use modinfo(1M) to be certain that your system has not already loaded the STREAMS module:
-
system# modinfo | grep modulename
|
- If the STREAMS module, such as eucu8, is already loaded, the output will look as follows:
-
system# modinfo | grep eucu8
89 ff798000 4b13 18 1 eucu8 (eucu8 module)
system#
|
- If the module is already loaded, you don't need to load it again. However, if the module has not yet been loaded, use modload(1M) as follows:
-
system# modload /usr/kernel/strmod/modulename
|
- The STREAMS module is now loaded into the kernel, and you can push it onto a Stream.
- To unload a module from the kernel, use modunload(1M), as shown below. In this example, the eucu8 module is being unloaded.
-
system# modinfo | grep eucu8
89 ff798000 4b13 18 1 eucu8 (eucu8 module)
system# modunload -i 89
|
dtterm and Terminals Capable of Input and Output UTF-8
- dtterm(1) and any other terminal that supports input and output in the UTF-8 codeset should have the following STREAMS configuration:
-
-
head <-> u8euc <-> ttcompat <-> ldterm <-> eucu8 <-> pseudo-TTY
- In this example, u8euc and eucu8 are the modules supported by the en_US.UTF-8 locale.
- To set up the above STREAMS configuration, use strchg(1), as shown below:
-
system% cat > /tmp/mystreams
u8euc
ttcompat
ldterm
eucu8
ptem
^D
system% strchg -f /tmp/mystreams
|
- When using strchg(1), be sure you are either superuser or the owner of the device. To see the current configuration of the Stream, use strconf(1) as shown below:
-
system% strconf
u8euc
ttcompat
ldterm
eucu8
ptem
pts
system%
|
- To revert to the original configuration, set the STREAMS configuration again as shown below:
-
system% cat > /tmp/orgstreams
ttcompat
ldterm
ptem
^D
system% strchg -f /tmp/orgstreams
|
Terminal Support for Latin-1, Latin-2, or KOI8-R
- For terminals that support only Latin-1 (ISO 8859-1), Latin-2 (ISO 8859-2), or KOI8-R, you should have the following STREAMS configuration:
-
-
head <-> u8euc <-> ttcompat <-> ldterm <-> eucu8 <-> u8lat1 <-> TTY
-
Note - This configuration is only for terminals that support Latin-1. For Latin-2 terminals, replace the STREAMS module u8lat1 with u8lat2. For KOI8-R terminals, replace the module with u8koi8.
- To set up the STREAMS configuration shown above, use strchg(1), as follows:
-
system% cat > /tmp/mystreams
u8euc
ttcompat
ldterm
eucu8
u8lat1
ptem
^D
system% strchg -f /tmp/mystreams
|
- Be sure that you are either superuser or the owner of the device when you use strchg(1). To see the current configuration, use strconf(1), as follows:
-
system% strconf
u8euc
ttcompat
ldterm
eucu8
u8lat1
ptem
pts
system%
|
- To revert to the original configuration, set the STREAMS configuration as follows:
-
system% cat > /tmp/orgstreams
ttcompat
ldterm
ptem
^D
system% strchg -f /tmp/orgstreams
|
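- The two module lists used in this section differ only in the optional code conversion module. The sketch below generates the file contents expected by strchg -f for a given terminal codeset. It is an illustration only: conversion_module and build_module_list are hypothetical helpers, and the codeset names they accept ("UTF-8", "8859-1", "8859-2", "KOI8-R") are assumed labels chosen here, not names defined by strchg(1).

```c
#include <stdio.h>
#include <string.h>

/*
 * conversion_module (hypothetical helper): map a terminal codeset to the
 * code conversion STREAMS module listed in TABLE 4-1, or NULL if none.
 */
static const char *conversion_module(const char *codeset)
{
    if (strcmp(codeset, "8859-1") == 0) return "u8lat1";
    if (strcmp(codeset, "8859-2") == 0) return "u8lat2";
    if (strcmp(codeset, "KOI8-R") == 0) return "u8koi8";
    return NULL;
}

/*
 * build_module_list (hypothetical helper): write the module list, one
 * module per line from the head downward, into buf.
 * Returns the number of characters written, or -1 if the codeset is
 * unknown or buf is too small.
 */
int build_module_list(const char *codeset, char *buf, size_t cap)
{
    const char *conv = NULL;
    int n;

    if (codeset == NULL)
        return -1;
    if (strcmp(codeset, "UTF-8") != 0) {
        /* Terminals that are not UTF-8 capable need a conversion module. */
        conv = conversion_module(codeset);
        if (conv == NULL)
            return -1;
    }
    n = snprintf(buf, cap, "u8euc\nttcompat\nldterm\neucu8\n%s%sptem\n",
                 conv != NULL ? conv : "", conv != NULL ? "\n" : "");
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

- The resulting text could be written to a file such as /tmp/mystreams and passed to strchg -f, as in the listings above.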
Setting Terminal Options
- To set up UTF-8 text editing behavior on a TTY, you must first set some terminal options using stty(1), as follows:
-
system% /bin/stty cs8 -istrip defeucw
|
-
Note - Since /usr/ucb/stty is not yet internationalized, you should use /bin/stty instead.
- You can also query the current settings using stty(1) with the -a option (/bin/stty -a).
-
Saving the Settings in ~/.cshrc
- Assuming the necessary STREAMS modules are already loaded with the kernel, you can save the following lines in your .cshrc file (C shell example) for convenience:
-
setenv LANG en_US.UTF-8
if ($?USER != 0 && $?prompt != 0) then
cat >! /tmp/mystreams$$ << _EOF
u8euc
ttcompat
ldterm
eucu8
ptem
_EOF
/bin/strchg -f /tmp/mystreams$$
/bin/rm -f /tmp/mystreams$$
/bin/stty cs8 -istrip defeucw
endif
|
- With these lines in your .cshrc file, you do not have to type all of the commands each time. Note that the closing _EOF must start in the first column of the file. You can also create a permanent file called mystreams and have .cshrc refer to it instead of re-creating the file each time you start a C shell.
Code Conversions
- The en_US.UTF-8 locale supports various code conversions among major codesets of several countries through iconv(1) and iconv(3).
- The available fromcode and tocode names that can be applied to iconv(1) and iconv_open(3) are shown in TABLE 4-2:
-
TABLE 4-2 en_US.UTF-8
| From Code | To Code | Description |
| 646 | UTF-8 | ISO 646 (US-ASCII) to UTF-8 |
| UTF-8 | 8859-1 | UTF-8 to ISO 8859-1 |
| UTF-8 | 8859-2 | UTF-8 to ISO 8859-2 |
| UTF-8 | 8859-3 | UTF-8 to ISO 8859-3 |
| UTF-8 | 8859-4 | UTF-8 to ISO 8859-4 |
| UTF-8 | 8859-5 | UTF-8 to ISO 8859-5 (Cyrillic) |
| UTF-8 | 8859-6 | UTF-8 to ISO 8859-6 (Arabic) |
| UTF-8 | 8859-7 | UTF-8 to ISO 8859-7 (Greek) |
| UTF-8 | 8859-8 | UTF-8 to ISO 8859-8 (Hebrew) |
| UTF-8 | 8859-9 | UTF-8 to ISO 8859-9 |
| UTF-8 | 8859-10 | UTF-8 to ISO 8859-10 |
| 8859-1 | UTF-8 | ISO 8859-1 to UTF-8 |
| 8859-2 | UTF-8 | ISO 8859-2 to UTF-8 |
| 8859-3 | UTF-8 | ISO 8859-3 to UTF-8 |
| 8859-4 | UTF-8 | ISO 8859-4 to UTF-8 |
| 8859-5 | UTF-8 | ISO 8859-5 (Cyrillic) to UTF-8 |
| 8859-6 | UTF-8 | ISO 8859-6 (Arabic) to UTF-8 |
| 8859-7 | UTF-8 | ISO 8859-7 (Greek) to UTF-8 |
| 8859-8 | UTF-8 | ISO 8859-8 (Hebrew) to UTF-8 |
| 8859-9 | UTF-8 | ISO 8859-9 to UTF-8 |
| 8859-10 | UTF-8 | ISO 8859-10 to UTF-8 |
| UTF-8 | KOI8-R | UTF-8 to KOI8-R (Cyrillic) |
| KOI8-R | UTF-8 | KOI8-R (Cyrillic) to UTF-8 |
| UTF-8 | UCS-2 | UTF-8 to UCS-2 |
| UCS-2 | UTF-8 | UCS-2 to UTF-8 |
| UTF-8 | UCS-4 | UTF-8 to UCS-4 |
| UCS-4 | UTF-8 | UCS-4 to UTF-8 |
| UTF-8 | UTF-7 | UTF-8 to UTF-7 |
| UTF-7 | UTF-8 | UTF-7 to UTF-8 |
| UTF-8 | UTF-16 | UTF-8 to UTF-16 |
| UTF-16 | UTF-8 | UTF-16 to UTF-8 |
| UTF-8 | eucJP | UTF-8 to Japanese EUC |
| UTF-8 | PCK | UTF-8 to Japanese PC Kanji (a.k.a. SJIS) |
| eucJP | UTF-8 | Japanese EUC to UTF-8 |
| PCK | UTF-8 | Japanese PC Kanji (a.k.a. SJIS) to UTF-8 |
| UTF-8 | ko_KR-euc | UTF-8 to Korean EUC |
| UTF-8 | ko_KR-johap | UTF-8 to Korean Johap (KS C 5601-1987) |
| UTF-8 | ko_KR-johap92 | UTF-8 to Korean Johap (KS C 5601-1992) |
| UTF-8 | ko_KR-iso2022-7 | UTF-8 to ISO-2022-KR |
| ko_KR-euc | UTF-8 | Korean EUC to UTF-8 |
| ko_KR-johap | UTF-8 | Korean Johap (KS C 5601-1987) to UTF-8 |
| ko_KR-johap92 | UTF-8 | Korean Johap (KS C 5601-1992) to UTF-8 |
| ko_KR-iso2022-7 | UTF-8 | ISO-2022-KR to UTF-8 |
| UTF-8 | gb2312 | UTF-8 to Chinese/PRC EUC (GB 2312-1980) |
| UTF-8 | iso2022 | UTF-8 to ISO-2022-CN |
| gb2312 | UTF-8 | Chinese/PRC EUC (GB 2312-1980) to UTF-8 |
| iso2022 | UTF-8 | ISO-2022-CN to UTF-8 |
| UTF-8 | zh_TW-euc | UTF-8 to Chinese/Taiwan EUC (CNS 11643-1992) |
| UTF-8 | zh_TW-big5 | UTF-8 to Chinese/Taiwan Big5 |
| UTF-8 | zh_TW-iso2022-7 | UTF-8 to ISO-2022-TW |
| zh_TW-euc | UTF-8 | Chinese/Taiwan EUC (CNS 11643-1992) to UTF-8 |
| zh_TW-big5 | UTF-8 | Chinese/Taiwan Big5 to UTF-8 |
| zh_TW-iso2022-7 | UTF-8 | ISO-2022-TW to UTF-8 |
- For more details on iconv code conversion, see the iconv(1), iconv_open(3), iconv(3), and iconv_close(3) man pages. For more information on available code conversions, see iconv_en_US.UTF-8(5).
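- The iconv_open(3), iconv(3), and iconv_close(3) calls fit together roughly as follows. This is a minimal sketch, not code from the Solaris documentation: convert_buffer is a hypothetical helper, and it assumes the whole input and output fit in the buffers supplied by the caller. The code names are passed in because they differ between systems; TABLE 4-2 lists the Solaris names (such as 8859-1), while other iconv implementations may spell them differently (for example ISO-8859-1).

```c
#include <iconv.h>
#include <stddef.h>

/*
 * convert_buffer (hypothetical helper): convert inlen bytes at in from
 * fromcode to tocode, writing at most outcap bytes to out.
 * Returns the number of bytes written, or (size_t)-1 on any failure
 * (unsupported conversion, invalid input, or output buffer too small).
 */
size_t convert_buffer(const char *fromcode, const char *tocode,
                      const char *in, size_t inlen,
                      char *out, size_t outcap)
{
    /* Note the argument order: the target code comes first. */
    iconv_t cd = iconv_open(tocode, fromcode);
    char *inp = (char *)in;   /* iconv() only reads the input; some systems
                                 declare this parameter as const char ** */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;
    size_t rc;

    if (cd == (iconv_t)-1)
        return (size_t)-1;

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1) {
        /* Flush any pending shift sequence for stateful encodings. */
        rc = iconv(cd, NULL, NULL, &outp, &outleft);
    }
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return outcap - outleft;
}
```

- On a Solaris system with this locale installed, a call such as convert_buffer("8859-1", "UTF-8", ...) would use the names from TABLE 4-2. A production version would also loop on E2BIG to handle output that does not fit in one buffer.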
Script Selection and Input Modes
- The en_US.UTF-8 locale supports multiple scripts. This section contains details about each of the input modes: English, Cyrillic, and Greek.
English Input Mode
- The English input mode encompasses not only the English alphabet but also characters with diacritical marks (for example, á, è, î, õ, and ü) and special characters (such as ¡, £, ¢, §, ¿).
- The English input mode is the default mode for any application. The input mode is displayed at the bottom left corner of the GUI application, as shown in FIGURE 4-1:

FIGURE 4-1
- To insert characters with diacritical marks or special characters from Latin-1, Latin-2, Latin-4, and Latin-5, you must type a compose sequence, as shown in the following examples:
-
- For Ä, press and release Compose, then A, and then "
- For ±, press and release Compose, then +, and then -
- The following tables list the most commonly used compose sequences for Latin-1, Latin-2, Latin-4, and Latin-5 script input.
-
TABLE 4-3
| Press and Release | Then Press and Release | Then Press and Release | Result |
| Compose | [spacebar] | [spacebar] | Non-breaking space |
| Compose | s | 1 | Superscripted 1 |
| Compose | s | 2 | Superscripted 2 |
| Compose | s | 3 | Superscripted 3 |
| Compose | ! | ! | Inverted exclamation mark |
| Compose | x | o | Currency symbol '¤' |
| Compose | p | ! | Paragraph symbol '¶' |
| Compose | / | u | mu 'µ' |
| Compose | ' | | apostrophe ''' |
| Compose | ' | | acute accent '´' |
| Compose | , | , | cedilla '¸' |
| Compose | " | " | dieresis '¨' |
| Compose | - | ^ | macron '¯' |
| Compose | o | o | degree '°' |
| Compose | x | x | multiplication sign '×' |
| Compose | + | - | plus-minus '±' |
| Compose | - | - | soft hyphen '-' |
| Compose | - | : | division sign '÷' |
| Compose | - | a | ordinal (feminine) a 'ª' |
| Compose | a | - | ordinal (feminine) a 'ª' |
| Compose | - | o | ordinal (masculine) o 'º' |
| Compose | o | - | ordinal (masculine) o 'º' |
| Compose | - | , | not sign '¬' |
| Compose | . | . | middle dot '·' |
| Compose | 1 | 2 | vulgar fraction 1/2 |
| Compose | 1 | 4 | vulgar fraction 1/4 |
| Compose | 3 | 4 | vulgar fraction 3/4 |
| Compose | < | < | left double angle quotation mark '«' |
| Compose | > | > | right double angle quotation mark '»' |
| Compose | ? | ? | inverted question mark '¿' |
| Compose | A | ` | A grave 'À' |
| Compose | A | ' | A acute 'Á' |
| Compose | A | * | A ring above 'Å' |
| Compose | A | " | A dieresis 'Ä' |
| Compose | A | ^ | A circumflex 'Â' |
| Compose | A | ~ | A tilde 'Ã' |
| Compose | A | E | AE diphthong 'Æ' |
| Compose | C | , | C cedilla 'Ç' |
| Compose | C | o | copyright sign '©' |
| Compose | D | - | Capital eth 'Ð' |
| Compose | E | ` | E grave 'È' |
| Compose | E | ' | E acute 'É' |
| Compose | E | " | E dieresis 'Ë' |
| Compose | E | ^ | E circumflex 'Ê' |
| Compose | I | ` | I grave 'Ì' |
| Compose | I | ' | I acute 'Í' |
| Compose | I | " | I dieresis 'Ï' |
| Compose | I | ^ | I circumflex 'Î' |
| Compose | L | - | pound sign '£' |
| Compose | N | ~ | N tilde 'Ñ' |
| Compose | O | ` | O grave 'Ò' |
| Compose | O | ' | O acute 'Ó' |
| Compose | O | / | O slash 'Ø' |
| Compose | O | " | O dieresis 'Ö' |
| Compose | O | ^ | O circumflex 'Ô' |
| Compose | O | ~ | O tilde 'Õ' |
| Compose | R | O | registered mark '®' |
| Compose | T | H | Thorn 'Þ' |
| Compose | U | ` | U grave 'Ù' |
| Compose | U | ' | U acute 'Ú' |
| Compose | U | " | U dieresis 'Ü' |
| Compose | U | ^ | U circumflex 'Û' |
| Compose | Y | ' | Y acute 'Ý' |
| Compose | Y | - | yen sign '¥' |
| Compose | a | ` | a grave 'à' |
| Compose | a | ' | a acute 'á' |
| Compose | a | * | a ring above 'å' |
| Compose | a | " | a dieresis 'ä' |
| Compose | a | ^ | a circumflex 'â' |
| Compose | a | ~ | a tilde 'ã' |
| Compose | a | e | ae diphthong 'æ' |
| Compose | c | , | c cedilla 'ç' |
| Compose | c | / | cent sign '¢' |
| Compose | c | o | copyright sign '©' |
| Compose | d | - | eth 'ð' |
| Compose | e | ` | e grave 'è' |
| Compose | e | ' | e acute 'é' |
| Compose | e | " | e dieresis 'ë' |
| Compose | e | ^ | e circumflex 'ê' |
| Compose | i | ` | i grave 'ì' |
| Compose | i | ' | i acute 'í' |
| Compose | i | " | i dieresis 'ï' |
| Compose | i | ^ | i circumflex 'î' |
| Compose | n | ~ | n tilde 'ñ' |
| Compose | o | ` | o grave 'ò' |
| Compose | o | ' | o acute 'ó' |
| Compose | o | / | o slash 'ø' |
| Compose | o | " | o dieresis 'ö' |
| Compose | o | ^ | o circumflex 'ô' |
| Compose | o | ~ | o tilde 'õ' |
| Compose | s | s | German double s 'ß' |
| Compose | t | h | thorn 'þ' |
| Compose | u | ` | u grave 'ù' |
| Compose | u | ' | u acute 'ú' |
| Compose | u | " | u dieresis 'ü' |
| Compose | u | ^ | u circumflex 'û' |
| Compose | y | ' | y acute 'ý' |
| Compose | y | " | y dieresis 'ÿ' |
| Compose | [vertical bar] | [vertical bar] | broken bar '¦' |
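- The sequences in TABLE 4-3 amount to a lookup from an ordered pair of keys to a single character. The sketch below shows that idea with a handful of entries copied from the table. It is only an illustration, not the input method used by the en_US.UTF-8 locale: compose_entry, compose_table, and compose_lookup are hypothetical names, and the results are given as Unicode code point values.

```c
#include <stddef.h>

/* One compose sequence: the two keys typed after Compose, and the result. */
struct compose_entry {
    char first;
    char second;
    unsigned int code_point;
};

/* A few rows from TABLE 4-3 (not the complete table). */
static const struct compose_entry compose_table[] = {
    { 'A', '"',  0x00C4 },  /* A dieresis */
    { 'A', '`',  0x00C0 },  /* A grave */
    { 'e', '\'', 0x00E9 },  /* e acute */
    { 'n', '~',  0x00F1 },  /* n tilde */
    { 'c', '/',  0x00A2 },  /* cent sign */
    { 's', 's',  0x00DF },  /* German double s */
    { '+', '-',  0x00B1 },  /* plus-minus */
    { '?', '?',  0x00BF },  /* inverted question mark */
};

/*
 * compose_lookup (hypothetical helper): return the code point for the
 * key pair, or 0 if this small table has no such sequence.
 * The order of the two keys matters, as it does in TABLE 4-3.
 */
unsigned int compose_lookup(char first, char second)
{
    size_t i;

    for (i = 0; i < sizeof compose_table / sizeof compose_table[0]; i++) {
        if (compose_table[i].first == first &&
            compose_table[i].second == second)
            return compose_table[i].code_point;
    }
    return 0;
}
```

- Where the table accepts both key orders, as for the ordinal indicators, it lists each order as its own row; a lookup like this one would need both entries as well.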
-
TABLE 4-4 contains the Latin-2 compose sequences.
-
Note - Compose sequences defined in TABLE 4-3 are not included in TABLE 4-4.
-
TABLE 4-4
| Press and Release | Then Press and Release | Then Press and Release | Result |
| Compose | a | ' | ogonek |
| Compose | u | ' ' | breve |
| Compose | v | ' ' | caron |
| Compose | " | ' ' | double acute |
| Compose | A | a | A ogonek |
| Compose | A | u | A breve |
| Compose | C | ' | C acute |
| Compose | C | v | C caron |
| Compose | D | v | D caron |
| Compose | - | D | D stroke |
| Compose | E | v | E caron |
| Compose | E | a | E ogonek |
| Compose | L | ' | L acute |
| Compose | L | - | L stroke |
| Compose | L | > | L caron |
| Compose | N | ' | N acute |
| Compose | N | v | N caron |
| Compose | O | > | O double acute |
| Compose | S | ' | S acute |
| Compose | S | v | S caron |
| Compose | S | , | S cedilla |
| Compose | R | ' | R acute |
| Compose | R | v | R caron |
| Compose | T | v | T caron |
| Compose | T | , | T cedilla |
| Compose | U | * | U ring above |
| Compose | U | > | U double acute |
| Compose | Z | ' | Z acute |
| Compose | Z | v | Z caron |
| Compose | Z | . | Z dot above |
| Compose | a | a | a ogonek |
| Compose | a | u | a breve |
| Compose | c | ' | c acute |
| Compose | c | v | c caron |
| Compose | d | v | d caron |
| Compose | - | d | d stroke |
| Compose | e | v | e caron |
| Compose | e | a | e ogonek |
| Compose | l | ' | l acute |
| Compose | l | - | l stroke |
| Compose | l | > | l caron |
| Compose | n | ' | n acute |
| Compose | n | v | n caron |
| Compose | o | > | o double acute |
| Compose | s | ' | s acute |
| Compose | s | v | s caron |
| Compose | s | , | s cedilla |
| Compose | r | ' | r acute |
| Compose | r | v | r caron |
| Compose | t | v | t caron |
| Compose | t | , | t cedilla |
| Compose | u | * | u ring above |
| Compose | u | > | u double acute |
| Compose | z | ' | z acute |
| Compose | z | v | z caron |
| Compose | z | . | z dot above |
-
TABLE 4-5 contains the Latin-4 compose sequences.
-
Note - Compose sequences defined in TABLE 4-3 or TABLE 4-4 are not included in this table.
-
TABLE 4-5
| Press and Release | Then Press and Release | Then Press and Release | Result |
| Compose | k | k | kra |
| Compose | A | _ | A macron |
| Compose | E | _ | E macron |
| Compose | E | . | E dot above |
| Compose | G | , | G cedilla |
| Compose | I | _ | I macron |
| Compose | I | ~ | I tilde |
| Compose | I | a | I ogonek |
| Compose | K | , | K cedilla |
| Compose | L | , | L cedilla |
| Compose | N | , | N cedilla |
| Compose | O | _ | O macron |
| Compose | R | , | R cedilla |
| Compose | T | [vertical bar] | T stroke |
| Compose | U | ~ | U tilde |
| Compose | U | a | U ogonek |
| Compose | U | _ | U macron |
| Compose | N | N | Eng |
| Compose | a | _ | a macron |
| Compose | e | _ | e macron |
| Compose | e | . | e dot above |
| Compose | g | , | g cedilla |
| Compose | i | _ | i macron |
| Compose | i | ~ | i tilde |
| Compose | i | a | i ogonek |
| Compose | k | , | k cedilla |
| Compose | l | , | l cedilla |
| Compose | n | , | n cedilla |
| Compose | o | _ | o macron |
| Compose | r | , | r cedilla |
| Compose | t | [vertical bar] | t stroke |
| Compose | u | ~ | u tilde |
| Compose | u | a | u ogonek |
| Compose | u | _ | u macron |
| Compose | n | n | eng |
-
TABLE 4-6 contains the Latin-5 compose sequences.
-
Note - Compose sequences defined in TABLE 4-3, TABLE 4-4, or TABLE 4-5 are not included in this table.
-
TABLE 4-6
| Press and Release | Then Press and Release | Then Press and Release | Result |
| Compose | G | u | G breve |
| Compose | I | . | I dot above |
| Compose | g | u | g breve |
| Compose | i | . | i dotless |
Cyrillic Input Mode
- To switch to Cyrillic input mode from English input mode, press Compose c c. If you are currently in Greek input mode, first return to English input mode, then switch to Cyrillic mode.
- The input mode is displayed at the bottom left corner of your GUI application, as shown in FIGURE 4-2:

FIGURE 4-2
- After you switch to Cyrillic input mode, you cannot enter English text. To switch back to English input mode, type Control-Space. The Russian keyboard layout appears in FIGURE 4-3:

FIGURE 4-3
Greek Input Mode
- To switch to Greek input mode from English input mode, press Compose g g. If you are currently in Cyrillic input mode, first return to English input mode and then switch to Greek mode.
- The input mode is displayed at the bottom left corner of your GUI application, as shown in FIGURE 4-4:

FIGURE 4-4
- After you switch to Greek input mode, you cannot enter English text. To switch back to English input mode, type Control-Space. The Greek keyboard layouts appear in FIGURE 4-5 and FIGURE 4-6:

FIGURE 4-5

FIGURE 4-6
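- The switching rules for the three input modes can be summarized as a small state machine. The sketch below is an illustration only, not the input method's implementation; the enum and function names are hypothetical. It assumes that Compose c c and Compose g g are simply ignored outside English mode, and that Control-Space in English mode leaves the mode unchanged; the text above says only that you must return to English mode first.

```c
/* The three input modes described in this section. */
enum input_mode { MODE_ENGLISH, MODE_CYRILLIC, MODE_GREEK };

/* The key sequences that switch modes. */
enum mode_key { KEY_COMPOSE_C_C, KEY_COMPOSE_G_G, KEY_CONTROL_SPACE };

/*
 * next_mode (hypothetical helper): return the input mode that results
 * from typing key while in the current mode.
 */
enum input_mode next_mode(enum input_mode current, enum mode_key key)
{
    switch (key) {
    case KEY_CONTROL_SPACE:
        /* Control-Space always returns to English input mode. */
        return MODE_ENGLISH;
    case KEY_COMPOSE_C_C:
        /* Cyrillic can only be entered from English (assumed: otherwise ignored). */
        return current == MODE_ENGLISH ? MODE_CYRILLIC : current;
    case KEY_COMPOSE_G_G:
        /* Greek can only be entered from English (assumed: otherwise ignored). */
        return current == MODE_ENGLISH ? MODE_GREEK : current;
    }
    return current;
}
```

- Going from Cyrillic to Greek therefore takes two steps: Control-Space to return to English, then Compose g g.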
Printing
- The en_US.UTF-8 locale provides a printing utility, xutops(1). This utility can print flat text files written in UTF-8 using X11 bitmap fonts available on the system. Because the output from the utility is standard PostScript, the output can be sent to any PostScript printer.
- To use the utility, type the following:
-
system% xutops filename | lp
|
- You can also use the utility as a filter, because it reads from standard input:
-
system% cat filename | xutops | lp
|
- You can also set the utility as a printing filter for a line printer. For example, the following command sequence tells the LP print service that the printer lp1 accepts only XUTOPS format files. It also installs the printer lp1 on port /dev/ttya. See the lpadmin(1M) man page for more details.
-
system# lpadmin -p lp1 -v /dev/ttya -I XUTOPS
system# accept lp1
system# enable lp1
|
- Using lpfilter(1M), you can add the utility as a filter as follows:
-
system# lpfilter -f filtername -F pathname
|
- The command tells LP that a converter (in this case, xutops) is available through the filter description file named pathname. That file can contain the following:
-
Input types: simple
Output types: XUTOPS
Command: /usr/openwin/bin/xutops
|
- The filter converts default type file input to PostScript output using /usr/openwin/bin/xutops.
- To print a UTF-8 text file, use the following command:
-
system% lp -T XUTOPS UTF-8-file
|
- For more details, refer to the xutops(1) and xutops(5) man pages.
Programming Environment
- Appropriately internationalized applications should automatically enable the en_US.UTF-8 locale, but proper FontSet/XmFontList definitions in the application's resource file are required.
- For information on internationalized applications, see Creating Worldwide Software: Solaris International Developer's Guide, 2nd edition.
FontSet Used with UTF-8
- The en_US.UTF-8 locale in Solaris 2.6 supports fonts for the following charsets:
-
- ISO 8859-1
- ISO 8859-2
- ISO 8859-4
- ISO 8859-5
- ISO 8859-7
- ISO 8859-9
- Because Solaris 2.6 supports the CDE desktop environment, each charset has guaranteed sets of fonts.
- The following list shows the Latin-1 fonts that are supported in Solaris 2.6:
-
-
· -dt-interface system-medium-r-normal-xxs sans-10-100-72-72-p-59-iso8859-1
· -dt-interface system-medium-r-normal-xs sans-12-120-72-72-p-71-iso8859-1
· -dt-interface system-medium-r-normal-s sans-14-140-72-72-p-82-iso8859-1
· -dt-interface system-medium-r-normal-m sans-17-170-72-72-p-97-iso8859-1
· -dt-interface system-medium-r-normal-l sans-18-180-72-72-p-106-iso8859-1
· -dt-interface system-medium-r-normal-xl sans-20-200-72-72-p-114-iso8859-1
· -dt-interface system-medium-r-normal-xxl sans-24-240-72-72-p-137-iso8859-1
- For information on CDE common font aliases, including -dt-interface user-* and -dt-application-* aliases, see Common Desktop Environment: Internationalization Programmer's Guide.
- A fontset for an application should have a collection of fonts that contains each of the above charsets, as in the following example:
-
/* Adjacent string literals are concatenated by the compiler, so the
   base font name list is still one comma-separated string. */
fs = XCreateFontSet(display,
    "-dt-interface system-medium-r-normal-s*-*-*-*-*-*-*-*-iso8859-1,"
    "-dt-interface system-medium-r-normal-s*-*-*-*-*-*-*-*-iso8859-2,"
    "-dt-interface system-medium-r-normal-s*-*-*-*-*-*-*-*-iso8859-4,"
    "-dt-interface system-medium-r-normal-s*-*-*-*-*-*-*-*-iso8859-5,"
    "-dt-interface system-medium-r-normal-s*-*-*-*-*-*-*-*-iso8859-7,"
    "-dt-interface system-medium-r-normal-s*-*-*-*-*-*-*-*-iso8859-9",
    &missing_ptr, &missing_count, &def_string);
|