International Language Environments Guide

Chapter 5 Overview of UTF-8 Locale Support

This chapter provides an overview of UTF-8 locale support.

Unicode Overview

Unicode is the universal character encoding standard used for representation of text for computer processing. Unicode is fully compatible with the international standards ISO/IEC 10646-1:2000 and ISO/IEC 10646-2:2001, and contains all the same characters and encoding points as ISO/IEC 10646. The Unicode Standard provides additional information about the characters and their use. Any implementation that conforms to Unicode also conforms to ISO/IEC 10646.

Unicode provides a consistent way of encoding multilingual plain text and facilitates exchanging international text files. Computer users who deal with multilingual text, business people, linguists, researchers, scientists, and others find that the Unicode Standard greatly simplifies their work. Mathematicians and technicians who regularly use mathematical symbols and other technical characters also find the Unicode Standard valuable.

The maximum possible number of code points Unicode can support is 1,114,112 through seventeen 16-bit planes. Each plane can support 65,536 different code points.

Among the more than one million code points that Unicode can support, version 4.0 currently defines 96,382 characters in planes 0, 1, 2, and 14. Planes 15 and 16 are for private use characters, also known as user-defined characters. Planes 15 and 16 together can support a total of 131,068 user-defined characters.
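The totals quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only. The private-use figure is consistent with excluding the last two code points of each plane, which Unicode reserves as noncharacters; that reading is inferred from the totals and is not stated in this guide.

```python
# Arithmetic behind the code point totals (illustrative sketch).
PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16          # 65,536 code points per plane

total_code_points = PLANES * CODE_POINTS_PER_PLANE

# Assumption: the last two code points of each plane (xFFFE and xFFFF)
# are noncharacters and are not counted as private use characters.
private_use_per_plane = CODE_POINTS_PER_PLANE - 2
private_use_total = 2 * private_use_per_plane   # planes 15 and 16

print(total_code_points)            # 1114112
print(private_use_total)            # 131068
print(hex(total_code_points - 1))   # 0x10ffff, the highest code point
```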

Unicode can be encoded using any of the following character encoding schemes:

  • UTF-8

  • UTF-16

  • UTF-32

UTF-8 is a variable-length encoding form of Unicode that preserves ASCII character code values transparently. This form is used as file code in Solaris Unicode locales.

UTF-16 is a 16-bit encoding form of Unicode. In UTF-16, characters up to 65,535 are encoded as single 16-bit values. Characters mapped from 65,536 to 1,114,111 are encoded as pairs of 16-bit values (surrogates).

UTF-32 is a fixed-length, 21-bit encoding form of Unicode usually represented in a 32-bit container or data type. This form is used as the process code (wide-character code) in Solaris Unicode locales.
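To make the three encoding forms concrete, the following Python sketch prints the code units that each form uses for a single character. It relies only on Python's built-in codecs. The helper names describe and surrogate_pair are hypothetical and are not part of any Solaris interface.

```python
def describe(ch):
    """Return the UTF-8, UTF-16, and UTF-32 code units of one character."""
    utf8 = list(ch.encode("utf-8"))
    raw16 = ch.encode("utf-16-be")
    utf16 = [int.from_bytes(raw16[i:i + 2], "big")
             for i in range(0, len(raw16), 2)]
    utf32 = [int.from_bytes(ch.encode("utf-32-be"), "big")]
    return utf8, utf16, utf32


def surrogate_pair(code_point):
    """Split a code point above 65,535 into a UTF-16 surrogate pair."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


for ch in ("A", "\u20ac", "\U00010400"):
    utf8, utf16, utf32 = describe(ch)
    print("U+%04X" % ord(ch),
          [hex(u) for u in utf8],
          [hex(u) for u in utf16],
          [hex(u) for u in utf32])
```

For the ASCII letter A, all three forms carry the same value 0x41, which is the sense in which UTF-8 preserves ASCII code values. A character above 65,535, such as U+10400, needs four bytes in UTF-8 and a surrogate pair in UTF-16, but still fits in one UTF-32 value.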

For more details on the Unicode Standard and ISO/IEC 10646 and their various representative forms, refer to the following sources:

  • The Unicode Standard, Version 4.0 from the Unicode Consortium

  • ISO/IEC 10646-1:2000, Information Technology-Universal Multiple-Octet Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane

  • ISO/IEC 10646-2: Information Technology-Universal Multiple-Octet Character Set (UCS) - Part 2: Secondary Multilingual Plane for Scripts and Symbols, Supplementary Plane for CJK Ideographs, Special Purpose Plane

  • The Unicode Consortium web site at http://www.unicode.org/.

Unicode Locale: en_US.UTF-8 Support

The Unicode/UTF-8 locales support Unicode 4.0. The en_US.UTF-8 locale provides multiscript processing support by using UTF-8 as its codeset. This locale handles processing of input and output text in multiple scripts, and was the first locale with this capability in the Solaris Operating System. The capabilities of other UTF-8 locales are similar to those of en_US.UTF-8. The discussion of en_US.UTF-8 that follows applies equally to these locales.


Note –

UTF-8 is a file-system safe Universal Character Set Transformation Format of Unicode/ISO/IEC 10646-1, formulated by the X/Open-Uniforum Joint Internationalization Working Group (XoJIG) in 1992 and approved by ISO and IEC as Amendment 2 to ISO/IEC 10646-1:1993 in 1996. This standard has been adopted by the Unicode Consortium, the International Organization for Standardization, and the International Electrotechnical Commission as a part of Unicode 4.0 and ISO/IEC 10646-1.


Unicode locales in the Solaris environment support the processing of every code point value that is defined in Unicode 4.0 and ISO/IEC 10646-1 and 10646-2. Supported scripts include pan-European and Asian scripts and also complex text layout scripts for the Arabic, Hebrew, Indic, and Thai languages.


Note –

Some Unicode locales, notably the Asian locales, include more Kanji or Hanzi glyphs.


Due to limited font resources, the current Solaris Unicode locales include character glyphs from the following character sets.

  • ISO 8859-1 (most Western European languages, such as English, French, Spanish, and German)

  • ISO 8859-2 (most Central European languages, such as Czech, Polish, and Hungarian)

  • ISO 8859-4 (Scandinavian and Baltic languages)

  • ISO 8859-5 (Russian)

  • ISO 8859-6 (Arabic, including many more presentation-form character glyphs)

  • ISO 8859-7 (Greek)

  • ISO 8859-8 (Hebrew)

  • ISO 8859-9 (Turkish)

  • TIS 620.2533 (Thai, including many more presentation-form character glyphs)

  • ISO 8859-15 (most Western European languages with euro sign)

  • GB 2312-1980 (Simplified Chinese)

  • JIS X 0201-1976, JIS X 0208-1990 (Japanese)

  • KSC 5601-1992 Annex 3 (Korean)

  • GB 18030 (Simplified Chinese)

  • HKSCS (Traditional Chinese, Hong Kong)

  • Big5 (Traditional Chinese, Taiwan)

  • IS 13194.1991, also known as ISCII (Hindi, including many more presentation-form character glyphs)

If you try to view characters for which the en_US.UTF-8 locale does not have corresponding glyphs, the locale displays a no-glyph glyph instead.

The locale is selectable at installation time and may be designated as the system default locale.

The same level of en_US.UTF-8 locale support is provided for both 64-bit and 32-bit Solaris systems.


Note –

Motif and CDE desktop applications and libraries support the en_US.UTF-8 locale. However, XView™ and OLIT libraries do not support the en_US.UTF-8 locale.


About Desktop Input Methods

CDE provides the ability to enter localized input for an internationalized application using Xm Toolkit. The XmText[Field] widgets are enabled to interface with input methods from each locale. Input methods are internationalized because some language environments write their text from right-to-left, top-to-bottom, and so forth. Within the same application, you can use different input methods that apply several fonts.

The pre-edit area displays the string that is being pre-edited. Text can be entered in four modes:

  • OffTheSpot

  • OverTheSpot (default)

  • Root

  • None

In OffTheSpot mode, the pre-edit area is located just below the main window area, to the right of the status area. In OverTheSpot mode, the pre-edit area is at the cursor point. In Root mode, the pre-edit and status areas are separate from the client's window.

For more details, refer to the XmNpreeditType resource description in the VendorShell(3X) man page.


Note –

In the current Solaris environment, native Asian input methods exist for Simplified/Traditional Chinese, Japanese, and Korean. These methods are in addition to the current multiscript input methods for Unicode locales.


Accessing an Input Mode includes descriptions of selected input methods, how to use them, and how to switch between them.

Script Selection and Input Modes

Solaris Unicode locales support multiple scripts. Every Unicode locale has a total of fourteen input modes.

  • English/European

  • Cyrillic

  • Greek

  • Arabic

  • Hebrew

  • Thai

  • Japanese

  • Korean

  • Simplified Chinese

  • Traditional Chinese

  • Traditional Chinese (Hong Kong)

  • Indic

  • Unicode Hexadecimal and Octal code input methods

  • Table lookup input method

Accessing an Input Mode

You can switch into a particular input mode by using a Compose key combination or through the input mode selection window. To access the input mode selection window, click in the status area at the bottom left corner of your application window. The input mode selection window is shown in the following figure.

Figure 5–1 Input Mode Selection Window


Input Mode Switch Key Sequences

You can change the current input mode to a new input mode by using the key sequences listed in Table 5–1. The only restriction for using these key sequences is that if you are in one of the Asian input modes, you need to switch back to English/European input mode by pressing Control and spacebar together. Once you are in the English/European input mode, you can switch freely to any other input mode by using the key sequences.

The following key sequences show how to switch to Cyrillic from the English/European input mode:

  1. Press the Compose key.

  2. Press and release the C key.

  3. Press the C key.

Table 5–1 Input Mode Switch Key Sequences

Key Sequences       Input Mode
Control-spacebar    English/European
Compose c c         Cyrillic
Compose g g         Greek
Compose a r         Arabic
Compose h h         Hebrew
Compose t t         Thai
Compose h i         Indic
Compose i n         Indic
Compose j a         Japanese
Compose k o         Korean
Compose s c         Simplified Chinese
Compose t c         Traditional Chinese
Compose h k         Traditional Chinese (Hong Kong)
Compose u o         Unicode octal code input method
Compose u h         Unicode hexadecimal code input method
Compose l l         Table lookup input method
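The switching rule described for Table 5–1 can be modeled as a small lookup plus one restriction. The following Python sketch is a model of the documented behavior only, not the input method's actual implementation. The names SWITCH_SEQUENCES, ASIAN_MODES, and switch_mode are hypothetical. Which modes count as "Asian input modes" is an assumption here: the sketch treats the Japanese, Korean, and Chinese modes that way.

```python
# Key sequences from Table 5-1, keyed by the keys pressed in order.
SWITCH_SEQUENCES = {
    ("Control", "spacebar"): "English/European",
    ("Compose", "c", "c"): "Cyrillic",
    ("Compose", "g", "g"): "Greek",
    ("Compose", "a", "r"): "Arabic",
    ("Compose", "h", "h"): "Hebrew",
    ("Compose", "t", "t"): "Thai",
    ("Compose", "h", "i"): "Indic",
    ("Compose", "i", "n"): "Indic",
    ("Compose", "j", "a"): "Japanese",
    ("Compose", "k", "o"): "Korean",
    ("Compose", "s", "c"): "Simplified Chinese",
    ("Compose", "t", "c"): "Traditional Chinese",
    ("Compose", "h", "k"): "Traditional Chinese (Hong Kong)",
    ("Compose", "u", "o"): "Unicode octal code input method",
    ("Compose", "u", "h"): "Unicode hexadecimal code input method",
    ("Compose", "l", "l"): "Table lookup input method",
}

# Assumption: these are the "Asian input modes" that the text refers to.
ASIAN_MODES = {
    "Japanese",
    "Korean",
    "Simplified Chinese",
    "Traditional Chinese",
    "Traditional Chinese (Hong Kong)",
}


def switch_mode(current, keys):
    """Return the input mode that is active after the given key sequence."""
    target = SWITCH_SEQUENCES.get(tuple(keys))
    if target is None:
        return current            # not a switch sequence
    if current in ASIAN_MODES and target != "English/European":
        return current            # must return to English/European first
    return target


print(switch_mode("English/European", ["Compose", "c", "c"]))  # Cyrillic
print(switch_mode("Japanese", ["Compose", "g", "g"]))          # Japanese
print(switch_mode("Japanese", ["Control", "spacebar"]))        # English/European
```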

English/European Input Mode

The English/European input mode includes the English alphabet plus characters with diacritical marks (for example, á, è, î, õ, and ü) and characters (such as ¡, §, ¿) from European scripts.

This input mode is the default mode for any application. The input mode is displayed at the bottom left corner of the GUI application window.

To insert characters with diacritical marks or special characters from Latin-1, Latin-2, Latin-4, Latin-5, and Latin-9, you must type a Compose key sequence, as described in the following examples.

To display the Ä character:

  1. Press and release the Compose key.

  2. Press Shift and the A key simultaneously. Release Shift-A.

  3. Press and release the " (double quote) key.

To display the ¿ character:

  1. Press and release the Compose key.

  2. Press and release the ? key.

  3. Press and release the ? key.
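The two examples above follow a common pattern: most sequences pair a base letter with an accent key, and a few produce a symbol directly. The following Python sketch models that pattern with Unicode normalization. It is illustrative only and does not reflect how the Solaris input method is implemented. The COMBINING and SPECIAL mappings and the compose function are hypothetical, and the accent-key assignments are assumptions apart from the two sequences shown above.

```python
import unicodedata

# Hypothetical mapping from an accent key to a Unicode combining mark.
COMBINING = {
    '"': "\u0308",   # diaeresis
    "'": "\u0301",   # acute
    "`": "\u0300",   # grave
    "^": "\u0302",   # circumflex
    "~": "\u0303",   # tilde
}

# Sequences that produce a symbol rather than an accented letter.
SPECIAL = {("?", "?"): "\u00bf"}   # inverted question mark


def compose(first, second):
    """Return the character for Compose <first> <second>, or None."""
    if (first, second) in SPECIAL:
        return SPECIAL[(first, second)]
    mark = COMBINING.get(second)
    if mark is None:
        return None
    # NFC merges base + combining mark when a precomposed character exists.
    result = unicodedata.normalize("NFC", first + mark)
    return result if len(result) == 1 else None


print(compose("A", '"'))   # Ä
print(compose("?", "?"))   # ¿
print(compose("q", "~"))   # None, no precomposed character exists
```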

When there is no Compose key available on your keyboard, you can emulate its operation by simultaneously pressing the Control key and the Shift key.

For the input of the Euro currency symbol (Unicode value U+20AC) from the locale, you can use any one of the following input sequences:

  • AltGraph and E together

  • AltGraph and 4 together

  • AltGraph and 5 together

With these input sequences, you press both keys simultaneously. If no AltGraph key is available on your keyboard, you can use certain alternative euro sign input sequences such as Compose e = or Compose c =.
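Whichever input sequence is used, the resulting character is the same code point, U+20AC. As an illustration, the following Python sketch shows how that code point is stored in UTF-8 and in ISO 8859-15, and that ISO 8859-1 has no euro sign. The codec names are Python's own. The sketch says nothing about how Solaris converts between codesets.

```python
EURO = "\u20ac"

utf8_bytes = EURO.encode("utf-8")
latin9_bytes = EURO.encode("iso-8859-15")

print(utf8_bytes.hex())     # e282ac (three bytes in UTF-8)
print(latin9_bytes.hex())   # a4 (one byte in ISO 8859-15)

# ISO 8859-1 predates the euro sign and cannot represent it.
try:
    EURO.encode("iso-8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

print(latin1_has_euro)      # False
```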

The following tables show the most commonly used compose sequences for Latin-1, Latin-2, Latin-3, Latin-4, Latin-5, and Latin-9 script input for the Solaris Operating System.

The following table lists the common Latin-1 Compose key sequences.

Table 5–2 Common Latin-1 Compose Key Sequences

Press Compose, then Press and Release 

Then Press and Release 

Result 

spacebar 

spacebar 

no-break space 

superscripted 1 

2

superscripted 2 

3

superscripted 3 

!

inverted exclamation mark 

o

currency symbol ¤ 

!

paragraph symbol ¶ 

u

mu µ 

"

acute accent ´ 

, (comma)

cedilla ¸ 

"

diaeresis ¨ 

^

macron ¯ 

o

degree ° 

x

multiplication sign × 

-

plus-minus ± 

-

soft hyphen – 

:

division sign ÷ 

ordinal (feminine) ª 

ordinal (masculine) º 

, (comma) 

not sign ¬ 

middle dot · 

vulgar fraction ½ 

vulgar fraction ¼ 

vulgar fraction ¾ 

left double angle quotation mark « 

right double angle quotation mark » 

inverted question mark ¿ 

` (backquote) 

A grave À 

' (single quote) 

A acute Á 

A ring above Å 

A diaeresis Ä 

A circumflex Â 

A tilde à

AE diphthong Æ 

, (comma) 

C cedilla Ç 

copyright sign © 

Capital eth Ð 

` (backquote) 

E grave È 

E acute É 

E diaeresis Ë 

E circumflex Ê 

` (backquote) 

I grave Ì 

'

I acute Í 

"

I diaeresis Ï 

^

I circumflex Î 

-

pound sign £ 

~

N tilde Ñ 

` (backquote)

O grave Ò 

'

O acute Ó 

/

O slash Ø 

"

O diaeresis Ö 

^

O circumflex Ô 

~

O tilde Õ 

O

registered mark ® 

H

Thorn Þ 

` (backquote)

U grave Ù 

U acute Ú 

U diaeresis Ü 

U circumflex Û 

Y acute Ý 

yen sign ¥ 

` (backquote) 

a grave à 

a acute á 

a ring above å 

a diaeresis ä 

a tilde ã 

a circumflex â 

ae diphthong æ 

, (comma) 

c cedilla ç 

cent sign ¢ 

copyright sign © 

eth ð 

` (backquote) 

e grave è 

e acute é 

e diaeresis ë 

e circumflex ê 

` (backquote) 

i grave ì 

i acute í 

i diaeresis ï 

i circumflex î 

n tilde ñ 

` (backquote) 

o grave ò 

o acute ó 

o slash ø 

o diaeresis ö 

o circumflex ô 

o tilde õ 

German double s ß also known as sharp S 

thorn þ 

` (backquote) 

u grave ù 

u acute ú 

u diaeresis ü 

u circumflex û 

y acute ý 

y diaeresis ÿ 

broken bar ¦ 

The following table lists the common Latin-2 Compose key sequences.

Table 5–3 Common Latin-2 Compose Key Sequences

Press Compose, then Press and Release    Press and Release    Result
k                                        k                    kra
A                                        _                    A macron
E                                        _                    E macron
E                                        .                    E dot above
G                                        ,                    G cedilla
I                                        _                    I macron
I                                        ~                    I tilde
I                                        a                    I ogonek
K                                        ,                    K cedilla
L                                        ,                    L cedilla
N                                        ,                    N cedilla
O                                        _                    O macron
R                                        ,                    R cedilla
T                                        |                    T stroke
U                                        ~                    U tilde
U                                        a                    U ogonek
U                                        _                    U macron
N                                        N                    Eng
a                                        _                    a macron
e                                        _                    e macron
e                                        .                    e dot above
g                                        ,                    g cedilla
i                                        _                    i macron
i                                        ~                    i tilde
i                                        a                    i ogonek
k                                        ,                    k cedilla
l                                        ,                    l cedilla
n                                        ,                    n cedilla
o                                        _                    o macron
r                                        ,                    r cedilla
t                                        |                    t stroke
u                                        ~                    u tilde
u                                        a                    u ogonek
u                                        _                    u macron
n                                        n                    eng

The following table lists the common Latin-3 Compose key sequences.

Table 5–4 Common Latin-3 Compose Key Sequences

Press Compose, then Press and Release    Press and Release    Result
C                                        >                    C circumflex
C                                        .                    C dot above
G                                        >                    G circumflex
G                                        .                    G dot above
H                                        >                    H circumflex
J                                        >                    J circumflex
S                                        >                    S circumflex
U                                        u                    U breve
c                                        >                    c circumflex
c                                        .                    c dot above
g                                        >                    g circumflex
g                                        .                    g dot above
h                                        >                    h circumflex
j                                        >                    j circumflex
s                                        >                    s circumflex
u                                        u                    u breve

The following table lists the common Latin-4 Compose key sequences.

Table 5–5 Common Latin-4 Compose Key Sequences

Press Compose, then Press and Release 

Press and Release 

Result 

kra 

A macron 

E macron 

E dot above 

G cedilla 

I macron 

I tilde 

I ogonek 

K cedilla 

L cedilla 

N cedilla 

O macron 

R cedilla 

T stroke 

U tilde 

U ogonek 

U macron 

Eng 

a macron 

e macron 

e dot above 

g cedilla 

i macron 

i tilde 

i ogonek 

k cedilla 

l cedilla 

n cedilla 

o macron 

r cedilla 

t stroke 

u tilde 

u ogonek 

u macron 

eng 

   

The following table lists the common Latin-5 Compose key sequences.

Table 5–6 Common Latin-5 Compose Key Sequences

Press Compose, then Press and Release 

Press and Release 

Result 

G breve 

I dot above 

g breve 

i dotless 

The following table lists the common Latin-9 Compose key sequences.

Table 5–7 Common Latin-9 Compose Key Sequences

Press Compose, then Press and Release 

Press and Release 

Result 

Ligature oe 

Ligature OE 

“ 

Y diaeresis 

If you are using a keyboard that has accent dead keys, use the following compose key sequences. Key names such as dead_acute come from the X11 registered keysym names, such as XK_dead_acute, defined in /usr/openwin/include/X11/keysymdef.h. Key names such as SunFA_Circum come from Sun-defined X11 keysym names, such as SunXK_FA_Circum, defined in /usr/openwin/include/X11/Sunkeysym.h.

Table 5–8 Compose Key Sequences Based on Accent Dead Keys

Press and Release 

Press and Release 

Result 

dead_grave 

spacebar 

grave accent 

dead_acute 

apostrophe 

acute accent 

dead_acute 

spacebar 

apostrophe 

dead_diaeresis 

double quote 

diaeresis 

dead_diaeresis 

spacebar 

diaeresis 

dead_circumflex 

spacebar 

circumflex accent 

dead_circumflex 

slash 

vertical line 

dead_circumflex 

degree sign 

dead_circumflex 

superscript one 

dead_circumflex 

superscript two 

dead_circumflex 

superscript three 

dead_circumflex 

period 

middle dot 

dead_circumflex 

exclamation point 

broken bar 

dead_circumflex 

minus 

macron 

dead_circumflex 

underscore 

macron 

dead_cedilla 

comma 

cedilla 

dead_cedilla 

minus 

not sign 

dead_tilde 

spacebar 

tilde 

dead_grave 

A with grave 

dead_acute 

A with acute 

dead_circumflex 

A with circumflex 

dead_tilde 

A with tilde 

dead_diaeresis 

A with diaeresis 

dead_grave 

a with grave 

dead_acute 

a with acute 

dead_circumflex 

a with circumflex 

dead_tilde 

a with tilde 

dead_diaeresis 

a with diaeresis 

dead_cedilla 

C with cedilla 

dead_cedilla 

c with cedilla 

dead_grave 

E with grave 

dead_acute 

E with acute 

dead_circumflex 

E with circumflex 

dead_diaeresis 

E with diaeresis 

dead_grave 

e with grave 

dead_acute 

e with acute 

dead_circumflex 

e with circumflex 

dead_diaeresis 

e with diaeresis 

dead_grave 

I with grave 

dead_acute 

I with acute 

dead_circumflex  

I with circumflex 

dead_diaeresis 

I with diaeresis 

dead_grave 

i with grave 

dead_acute 

i with acute 

dead_circumflex 

i with circumflex 

dead_diaeresis 

i with diaeresis 

dead_tilde 

N with tilde 

dead_tilde 

n with tilde 

dead_grave 

O with grave 

dead_acute 

O with acute 

dead_circumflex 

O with circumflex 

dead_tilde 

O with tilde 

dead_diaeresis 

O with diaeresis 

dead_grave 

o with grave 

dead_acute 

o with acute 

dead_circumflex 

o with circumflex 

dead_tilde 

o with tilde 

dead_diaeresis 

o with diaeresis 

dead_cedilla 

S with cedilla 

dead_cedilla 

s with cedilla 

dead_grave 

U with grave 

dead_acute 

U with acute 

dead_circumflex 

U with circumflex 

dead_diaeresis 

U with diaeresis 

dead_grave 

u with grave 

dead_acute 

u with acute 

dead_circumflex 

u with circumflex 

dead_diaeresis 

u with diaeresis 

dead_acute 

Y with acute 

dead_acute 

y with acute 

dead_diaeresis 

y with diaeresis 

SunFA_Grave 

spacebar 

grave accent 

SunFA_Grave 

A with grave 

SunFA_Grave 

a with grave 

SunFA_Grave 

E with grave 

SunFA_Grave 

e with grave 

SunFA_Grave 

I with grave 

SunFA_Grave 

i with grave 

SunFA_Grave 

O with grave 

SunFA_Grave 

o with grave 

SunFA_Grave 

U with grave 

SunFA_Grave 

u with grave 

SunFA_Acute 

apostrophe 

acute accent 

SunFA_Acute 

spacebar 

apostrophe 

SunFA_Acute 

A with acute 

SunFA_Acute 

a with acute 

SunFA_Acute 

C with acute 

SunFA_Acute 

c with acute 

SunFA_Acute 

E with acute 

SunFA_Acute 

e with acute 

SunFA_Acute 

I with acute 

SunFA_Acute 

i with acute 

SunFA_Acute 

L with acute 

SunFA_Acute 

l with acute 

SunFA_Acute 

N with acute 

SunFA_Acute 

n with acute 

SunFA_Acute 

O with acute 

SunFA_Acute 

o with acute 

SunFA_Acute 

R with acute 

SunFA_Acute 

r with acute 

SunFA_Acute 

S with acute 

SunFA_Acute 

s with acute 

SunFA_Acute 

U with acute 

SunFA_Acute 

u with acute 

SunFA_Acute 

Y with acute 

SunFA_Acute 

y with acute 

SunFA_Acute 

Z with acute 

SunFA_Acute 

z with acute 

SunFA_Cedilla 

comma 

cedilla 

SunFA_Cedilla 

minus 

not sign 

SunFA_Cedilla 

C with cedilla 

SunFA_Cedilla 

c with cedilla 

SunFA_Cedilla 

G with cedilla 

SunFA_Cedilla 

g with cedilla 

SunFA_Cedilla 

K with cedilla 

SunFA_Cedilla 

k with cedilla 

SunFA_Cedilla 

L with cedilla 

SunFA_Cedilla 

l with cedilla 

SunFA_Cedilla 

N with cedilla 

SunFA_Cedilla 

n with cedilla 

SunFA_Cedilla 

R with cedilla 

SunFA_Cedilla 

r with cedilla 

SunFA_Cedilla 

S with cedilla 

SunFA_Cedilla 

s with cedilla 

SunFA_Cedilla 

T with cedilla 

SunFA_Cedilla 

t with cedilla 

SunFA_Circum 

spacebar 

circumflex accent 

SunFA_Circum 

degree sign 

SunFA_Circum 

superscript one 

SunFA_Circum 

superscript two 

SunFA_Circum 

superscript three 

SunFA_Circum 

exclamation point 

broken bar 

SunFA_Circum 

minus 

macron 

SunFA_Circum 

underscore 

macron 

SunFA_Circum 

period 

middle dot 

SunFA_Circum 

slash 

vertical line 

SunFA_Circum 

A with circumflex 

SunFA_Circum 

a with circumflex 

SunFA_Circum 

C with circumflex 

SunFA_Circum 

c with circumflex 

SunFA_Circum 

E with circumflex 

SunFA_Circum 

e with circumflex 

SunFA_Circum 

G with circumflex 

SunFA_Circum 

g with circumflex 

SunFA_Circum 

H with circumflex 

SunFA_Circum 

h with circumflex 

SunFA_Circum 

I with circumflex 

SunFA_Circum 

i with circumflex 

SunFA_Circum 

J with circumflex 

SunFA_Circum 

j with circumflex 

SunFA_Circum 

O with circumflex 

SunFA_Circum 

o with circumflex 

SunFA_Circum 

S with circumflex 

SunFA_Circum 

s with circumflex 

SunFA_Circum 

U with circumflex 

SunFA_Circum 

u with circumflex 

SunFA_Diaeresis 

double quote 

diaeresis 

SunFA_Diaeresis 

spacebar 

diaeresis 

SunFA_Diaeresis 

A with diaeresis 

SunFA_Diaeresis 

a with diaeresis 

SunFA_Diaeresis 

E with diaeresis 

SunFA_Diaeresis 

e with diaeresis 

SunFA_Diaeresis 

I with diaeresis 

SunFA_Diaeresis 

i with diaeresis 

SunFA_Diaeresis 

O with diaeresis 

SunFA_Diaeresis 

o with diaeresis 

SunFA_Diaeresis 

U with diaeresis 

SunFA_Diaeresis 

u with diaeresis 

SunFA_Diaeresis 

y with diaeresis 

SunFA_Diaeresis 

Y with diaeresis 

SunFA_Tilde 

spacebar 

tilde 

SunFA_Tilde 

A with tilde 

SunFA_Tilde 

a with tilde 

SunFA_Tilde 

N with tilde 

SunFA_Tilde 

n with tilde 

SunFA_Tilde 

O with tilde 

SunFA_Tilde 

o with tilde 

Arabic Input Mode

To switch to Arabic input mode, either press Compose a r, or select Arabic from the input mode selection window. For information on accessing the input mode selection window, see Accessing an Input Mode.

The following figure shows the Arabic keyboard layout.

Figure 5–2 Arabic Keyboard


Cyrillic Input Mode

To switch to Cyrillic input mode, either press Compose c c, or select Cyrillic from the input mode selection window. For information on accessing the input mode selection window, see Accessing an Input Mode.

The Cyrillic (Russian) keyboard layout appears in the following figure.

Figure 5–3 Cyrillic (Russian) Keyboard


After you switch to Cyrillic input mode, you cannot enter English or European text. To switch back to the English/European input mode, either press Control and spacebar together, or select English/European input mode from the input mode selection window by clicking in the status area. See Accessing an Input Mode.

You can also switch into other input modes by typing the corresponding input mode switch key sequence.

Greek Input Mode

To switch to Greek input mode, either press Compose g g, or select Greek from the input mode selection window. For information on accessing the input mode selection window, see Accessing an Input Mode.

After you switch to Greek input mode, you cannot enter English or European text. To switch back to the English/European input mode, either press Control and spacebar together, or select English/European input mode from the input mode selection window by clicking in the status area. The Greek Euro keyboard layout appears in the following figure.

Figure 5–4 Greek Euro Keyboard


The following figure shows the Greek UNIX keyboard.

Figure 5–5 Greek UNIX Keyboard


The following compose key sequences are supported in the Greek input mode. Some compose key sequences start with accent dead keys. The abbreviation “ordfeminine” stands for the feminine ordinal indicator key.

Table 5–9 Compose Key Sequences at Greek Input Mode

Press and Release 

Press and Release 

Result 

semicolon 

lowercase Greek_alpha with tonos 

semicolon 

lowercase Greek_epsilon with tonos 

semicolon 

lowercase Greek_eta with tonos 

semicolon 

lowercase Greek_iota with tonos 

semicolon 

lowercase Greek_omicron with tonos 

semicolon 

lowercase Greek_upsilon with tonos 

semicolon 

lowercase Greek_omega with tonos 

semicolon 

uppercase Greek_alpha with tonos 

semicolon 

uppercase Greek_epsilon with tonos 

semicolon 

uppercase Greek_eta with tonos 

semicolon 

uppercase Greek_iota with tonos 

semicolon 

uppercase Greek_omicron with tonos 

semicolon 

uppercase Greek_upsilon with tonos 

semicolon 

uppercase Greek_omega with tonos 

dead_acute 

Greek_alpha 

lowercase Greek_alpha with tonos 

dead_acute 

Greek_epsilon 

lowercase Greek_epsilon with tonos 

dead_acute 

Greek_eta 

lowercase Greek_eta with tonos 

dead_acute 

Greek_iota 

lowercase Greek_iota with tonos 

dead_acute 

Greek_omicron 

lowercase Greek_omicron with tonos 

dead_acute 

Greek_upsilon 

lowercase Greek_upsilon with tonos 

dead_acute 

Greek_omega 

lowercase Greek_omega with tonos 

dead_acute 

Greek_ALPHA 

uppercase Greek_alpha with tonos 

dead_acute 

Greek_EPSILON 

uppercase Greek_epsilon with tonos 

dead_acute 

Greek_ETA 

uppercase Greek_eta with tonos 

dead_acute 

Greek_IOTA 

uppercase Greek_iota with tonos 

dead_acute 

Greek_OMICRON 

uppercase Greek_omicron with tonos 

dead_acute 

Greek_UPSILON 

uppercase Greek_upsilon with tonos 

dead_acute 

Greek_OMEGA 

uppercase Greek_omega with tonos 

dead_acute 

lowercase Greek_alpha with tonos 

dead_acute 

lowercase Greek_epsilon with tonos 

dead_acute 

lowercase Greek_eta with tonos 

dead_acute 

lowercase Greek_iota with tonos 

dead_acute 

lowercase Greek_omicron with tonos 

dead_acute 

lowercase Greek_upsilon with tonos 

dead_acute 

lowercase Greek_omega with tonos 

dead_acute 

uppercase Greek_alpha with tonos 

dead_acute 

uppercase Greek_epsilon with tonos 

dead_acute 

uppercase Greek_eta with tonos 

dead_acute 

uppercase Greek_iota with tonos 

dead_acute 

uppercase Greek_omicron with tonos 

dead_acute 

uppercase Greek_upsilon with tonos 

dead_acute 

uppercase Greek_omega with tonos 

colon 

lowercase Greek_iota with dialytika 

colon 

lowercase Greek_upsilon with dialytika 

colon 

uppercase Greek_iota with dialytika 

colon 

uppercase Greek_upsilon with dialytika 

dead_diaeresis 

lowercase Greek_iota with dialytika 

dead_diaeresis 

lowercase Greek_upsilon with dialytika 

dead_diaeresis 

uppercase Greek_iota with dialytika 

dead_diaeresis 

uppercase Greek_upsilon with dialytika 

dead_diaeresis 

Greek_iota 

lowercase Greek_iota with dialytika 

dead_diaeresis 

Greek_upsilon 

lowercase Greek_upsilon with dialytika 

dead_diaeresis 

Greek_IOTA 

uppercase Greek_iota with dialytika 

dead_diaeresis 

Greek_UPSILON 

uppercase Greek_upsilon with dialytika 

semicolon 

semicolon 

Greek tonos 

colon 

colon 

diaeresis/dialytika 

ordfeminine 

plus-minus sign 

ordfeminine 

section sign 

ordfeminine 

superscript two 

ordfeminine 

superscript three 

ordfeminine 

broken bar 

ordfeminine 

copyright sign 

ordfeminine 

not sign 

ordfeminine 

soft hyphen 

ordfeminine 

degree sign 

ordfeminine 

hyphen 

vulgar fraction one half 

ordfeminine 

backslash 

pound sign 

ordfeminine 

braceleft 

modifier letter reversed comma 

ordfeminine 

braceright 

modifier letter apostrophe 

ordfeminine 

bracketleft 

left-pointing double angle quotation mark 

ordfeminine 

bracketright 

right-pointing double angle quotation mark 

SunFA_Acute 

lowercase Greek_alpha with tonos 

SunFA_Acute 

lowercase Greek_epsilon with tonos 

SunFA_Acute 

lowercase Greek_eta with tonos 

SunFA_Acute 

lowercase Greek_iota with tonos 

SunFA_Acute 

lowercase Greek_omicron with tonos 

SunFA_Acute 

lowercase Greek_upsilon with tonos 

SunFA_Acute 

lowercase Greek_omega with tonos 

SunFA_Acute 

uppercase Greek_alpha with tonos 

SunFA_Acute 

uppercase Greek_epsilon with tonos 

SunFA_Acute 

uppercase Greek_eta with tonos 

SunFA_Acute 

uppercase Greek_omicron with tonos 

SunFA_Acute 

uppercase Greek_iota with tonos 

SunFA_Acute 

uppercase Greek_upsilon with tonos 

SunFA_Acute 

uppercase Greek_omega with tonos 

SunFA_Acute 

Greek_alpha 

lowercase Greek_alpha with tonos 

SunFA_Acute 

Greek_epsilon 

lowercase Greek_epsilon with tonos 

SunFA_Acute 

Greek_eta 

lowercase Greek_eta with tonos 

SunFA_Acute 

Greek_iota 

lowercase Greek_iota with tonos 

SunFA_Acute 

Greek_omega 

lowercase Greek_omega with tonos 

SunFA_Acute 

Greek_omicron 

lowercase Greek_omicron with tonos 

SunFA_Acute 

Greek_upsilon 

lowercase Greek_upsilon with tonos 

SunFA_Acute 

Greek_ALPHA 

uppercase Greek_alpha with tonos 

SunFA_Acute 

Greek_EPSILON 

uppercase Greek_epsilon with tonos 

SunFA_Acute 

Greek_ETA 

uppercase Greek_eta with tonos 

SunFA_Acute 

Greek_IOTA 

uppercase Greek_iota with tonos 

SunFA_Acute 

Greek_OMICRON 

uppercase Greek_omicron with tonos 

SunFA_Acute 

Greek_UPSILON 

uppercase Greek_upsilon with tonos 

SunFA_Acute 

Greek_OMEGA 

uppercase Greek_omega with tonos 

SunFA_Diaeresis 

lowercase Greek_iota with dialytika 

SunFA_Diaeresis 

lowercase Greek_upsilon with dialytika 

SunFA_Diaeresis 

uppercase Greek_iota with dialytika 

SunFA_Diaeresis 

uppercase Greek_upsilon with dialytika 

SunFA_Diaeresis 

Greek_iota 

lowercase Greek_iota with dialytika 

SunFA_Diaeresis 

Greek_upsilon 

lowercase Greek_upsilon with dialytika 

SunFA_Diaeresis 

Greek_IOTA 

uppercase Greek_iota with dialytika 

SunFA_Diaeresis 

Greek_UPSILON 

uppercase Greek_upsilon with dialytika 

Table 5–10 Compose Key Sequences at Greek Input Mode with Three Keys

Press and Release 

Press and Release 

Press and Release 

Result 

semicolon 

colon 

lowercase Greek_upsilon with dialytika and tonos 

colon 

semicolon 

lowercase Greek_upsilon with dialytika and tonos 

semicolon 

colon 

lowercase Greek_iota with dialytika and tonos 

colon 

semicolon 

lowercase Greek_iota with dialytika and tonos 

dead_acute 

dead_diaeresis 

lowercase Greek_upsilon with dialytika and tonos 

dead_diaeresis 

dead_acute 

lowercase Greek_upsilon with dialytika and tonos 

dead_acute 

dead_diaeresis 

lowercase Greek_iota with dialytika and tonos 

dead_diaeresis 

dead_acute 

lowercase Greek_iota with dialytika and tonos 

dead_acute 

dead_diaeresis 

Greek_upsilon 

lowercase Greek_upsilon with dialytika and tonos 

dead_diaeresis 

dead_acute 

Greek_upsilon 

lowercase Greek_upsilon with dialytika and tonos 

dead_acute 

dead_diaeresis 

Greek_iota 

lowercase Greek_iota with dialytika and tonos 

dead_diaeresis 

dead_acute 

Greek_iota 

lowercase Greek_iota with dialytika and tonos 

SunFA_Acute 

SunFA_Diaeresis 

lowercase Greek_iota with dialytika and tonos 

SunFA_Diaeresis 

SunFA_Acute 

lowercase Greek_iota with dialytika and tonos 

SunFA_Acute 

SunFA_Diaeresis 

lowercase Greek_upsilon with dialytika and tonos 

SunFA_Diaeresis 

SunFA_Acute 

lowercase Greek_upsilon with dialytika and tonos 

SunFA_Acute 

SunFA_Diaeresis 

Greek_iota 

lowercase Greek_iota with dialytika and tonos 

SunFA_Diaeresis 

SunFA_Acute 

Greek_iota 

lowercase Greek_iota with dialytika and tonos 

SunFA_Acute 

SunFA_Diaeresis 

Greek_upsilon 

lowercase Greek_upsilon with dialytika and tonos 

SunFA_Diaeresis 

SunFA_Acute 

Greek_upsilon 

lowercase Greek_upsilon with dialytika and tonos 

Table 5–11 Compose Key Sequences at Greek Input Mode with Four Keys

Press and Release 

Press and Release 

Press and Release 

Press and Release 

Result 

semicolon 

colon  

colon 

semicolon  

semicolon 

colon  

colon 

semicolon  

Greek dialytika tonos 

Greek dialytika tonos  
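The results in the Greek tables correspond to precomposed Unicode characters that can also be built from a base letter plus combining marks. The following Python sketch illustrates that relationship with Unicode normalization. It is not the input method's implementation, and the greek_compose helper is hypothetical. The helper always places the dialytika before the tonos, which mirrors the way both key orders in the tables produce the same result.

```python
import unicodedata

TONOS = "\u0301"       # combining acute accent, used for the Greek tonos
DIALYTIKA = "\u0308"   # combining diaeresis, used for the Greek dialytika


def greek_compose(letter, *, tonos=False, dialytika=False):
    """Return the precomposed form of a Greek letter with the given marks."""
    marks = (DIALYTIKA if dialytika else "") + (TONOS if tonos else "")
    return unicodedata.normalize("NFC", letter + marks)


alpha, iota, upsilon = "\u03b1", "\u03b9", "\u03c5"

for result in (
    greek_compose(alpha, tonos=True),                     # alpha with tonos
    greek_compose(iota, dialytika=True),                  # iota with dialytika
    greek_compose(upsilon, tonos=True, dialytika=True),   # both marks
):
    print(result, "U+%04X" % ord(result), unicodedata.name(result))
```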

Hebrew Input Mode

To switch into Hebrew input mode, either press Compose h h, or select Hebrew from the input mode selection window. For information on accessing the input mode selection window, see Accessing an Input Mode.

The following figure shows the Hebrew keyboard layout.

Figure 5–6 Hebrew Keyboard


Japanese Input Mode

To switch to the Japanese input mode, either press Compose j a or select Japanese from the input mode selection window. For information on accessing the input mode selection window, see Accessing an Input Mode.

To use the native Japanese input system, you need to install one or more of the Japanese locales and reboot the system. After you install the Japanese locale, you can use ATOK12 in all UTF-8 locales. Wnn6 is not available in UTF-8 locales except ja_JP.UTF-8.

Figure 5–7 Japanese Keyboard


Korean Input Mode

To switch to Korean input mode, either press Compose k o, or select Korean from the input mode selection window. For information on accessing the input mode selection window, see Accessing an Input Mode.

To use the native Korean input system, you need to install one or more Korean locales on your system. For more details on how to use the Korean input system, refer to the Korean Solaris User's Guide.

Figure 5–8 Korean Keyboard


Simplified Chinese Input Mode

To switch to Simplified Chinese input mode, either press Compose s c, or select S-Chinese from the input mode selection window. For information on accessing the input mode selection window, see Accessing an Input Mode.

To use the native Simplified Chinese input system, you need to install one or more Simplified Chinese locales on your system. For more details on how to use the Simplified Chinese input system, refer to the Simplified Chinese Solaris User's Guide.

Traditional Chinese Input Mode

To switch to Traditional Chinese input mode, either press Compose t c, or select T-Chinese from the input mode selection window. For information on accessing the input mode selection window, see Accessing an Input Mode.

To have access to the native Traditional Chinese input system, you need to install one or more Traditional Chinese locales on your system. For more details on how to use the Traditional Chinese input system, refer to the Traditional Chinese Solaris User's Guide.

Traditional Chinese (Hong Kong) Input Mode

To switch to Traditional Chinese (Hong Kong) input mode, either press Compose h k, or select T-Chinese (Hong Kong) from the input mode selection window. For information on accessing the input mode selection window, see Accessing an Input Mode.

To have access to the native Traditional Chinese (Hong Kong) input system, you need to install one or more Traditional Chinese (Hong Kong) locales on your system.

Unicode Hexadecimal Input Mode

To switch to Unicode hexadecimal code input mode, press Compose u h, or select Unicode Hex from the input mode selection window. To switch to the octal number system, press Compose u o or select Unicode Octal. For information on accessing the input mode selection window, see Accessing an Input Mode.

To use these input modes, you need to know either the hexadecimal or the octal code point values of the characters. Refer to The Unicode Standard, Version 4.0 for the mapping between code point values and characters.

In Unicode hexadecimal code input mode, you type four hexadecimal digits to input a character. Some sample hexadecimal values are:

  • 00A1 for Inverted Exclamation Mark

  • 03B2 for Greek Small Letter Beta

  • AC00 for a Korean Hangul Syllable

  • 30A1 for Japanese Katakana Letter Small A

  • 4E58 for a Unified Han character

You can type the hexadecimal digits A, B, C, D, E, and F in either uppercase or lowercase. If you prefer the octal number system to hexadecimal numbers, you can input octal digits 0 to 7. If you mistype a digit or two, you can delete the digits by using the Delete or Backspace key.
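As a rough illustration of what these two input modes compute, the following Python sketch converts typed hexadecimal or octal digits into a character and prints its Unicode name and UTF-8 bytes. The function name char_from_digits is hypothetical and is not part of the Solaris input method.

```python
import unicodedata


def char_from_digits(digits, base=16):
    """Return the character whose code point is written as `digits`.

    base=16 mirrors the Unicode hexadecimal mode, base=8 the octal mode.
    int() accepts uppercase and lowercase hexadecimal digits alike.
    """
    return chr(int(digits, base))


if __name__ == "__main__":
    for digits in ("00A1", "03b2", "AC00", "30A1", "4E58"):
        ch = char_from_digits(digits)
        # Show the Unicode character name and the UTF-8 byte sequence in hex.
        print(digits, unicodedata.name(ch), ch.encode("utf-8").hex())
```

In this sketch the octal digits 241 give the same character as the hexadecimal digits 00A1.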

Table Lookup Input Mode

To switch to table lookup input mode, either press Compose l l, or select Lookup from the input mode selection window. For information on accessing the input mode selection window, see Accessing an Input Mode.

The second lookup window shows candidates for the group-only display, showing a maximum of 80 candidates at a time. Press Control n for the next set of candidates or Control p for the previous set of candidates.

System Environment

This section describes locale environment variables, TTY environment setup, 32–bit and 64–bit STREAMS modules, and terminal support.

Locale Environment Variable

Be sure you have the en_US.UTF-8 locale installed on your system. To check current locale settings in various categories, use the locale utility.

system% locale 
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=

To use the en_US.UTF-8 locale desktop environment, choose the locale first. In a TTY environment, choose the locale by setting the LANG environment variable to en_US.UTF-8, as in the following C-shell example:

system% setenv LANG en_US.UTF-8

Make sure that the LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_NUMERIC, LC_MONETARY, and LC_TIME categories are not set, or are set to en_US.UTF-8. If any of these categories is set, it overrides the lower-priority LANG environment variable. See the setlocale(3C) man page for more details about the hierarchy of environment variables.
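The hierarchy can be pictured with a short Python sketch. It assumes the usual POSIX lookup order (LC_ALL first, then the category's own variable, then LANG) and treats an empty value as unset. The function name effective_locale and the "C" fallback are assumptions made for illustration; see setlocale(3C) for the authoritative rules.

```python
import os

LC_CATEGORIES = (
    "LC_COLLATE", "LC_CTYPE", "LC_MESSAGES",
    "LC_MONETARY", "LC_NUMERIC", "LC_TIME",
)


def effective_locale(category, env):
    """Return the locale name that applies to one LC_* category.

    Lookup order: LC_ALL, then the category's own variable, then LANG.
    Unset or empty variables are skipped.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"  # assumed fallback when nothing is set


if __name__ == "__main__":
    # Report the locale each category would resolve to in this process.
    for category in LC_CATEGORIES:
        print(category, effective_locale(category, os.environ))
```

With only LANG set to en_US.UTF-8, every category resolves to en_US.UTF-8. Setting any single category, or LC_ALL, changes the result for that category.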

You can also start the en_US.UTF-8 environment from the CDE desktop. At the CDE login screen's Options -> Language menu, choose en_US.UTF-8.

TTY Environment Setup

Depending on the terminal and terminal emulator that you are using, you might need to push certain code set-specific STREAMS modules onto your streams.

For more information on STREAMS modules and streams in general, see the STREAMS Programming Guide.

The following table lists the 32–bit STREAMS modules supported by the en_US.UTF-8 locale in the terminal environment. For more details, see the Solaris 64–bit Developer's Guide.

Table 5–12 32–bit STREAMS Modules Supported by en_US.UTF-8

32-bit STREAMS Module        Description
/usr/kernel/strmod/u8lat1    Code conversion STREAMS module between UTF-8 and ISO8859-1 (Western European)
/usr/kernel/strmod/u8lat2    Code conversion STREAMS module between UTF-8 and ISO8859-2 (Eastern European)
/usr/kernel/strmod/u8koi8    Code conversion STREAMS module between UTF-8 and KOI8-R (Cyrillic)


Note –

Starting with the Solaris 10 release, the 32-bit kernel is no longer supported for the SPARC sun4u platform. Table 5–12 applies only to the 32-bit kernel for the x86 platform. For more details, refer to the Release Notes.


The following table lists the 64–bit STREAMS modules supported by en_US.UTF-8.

Table 5–13 64–bit STREAMS Modules Supported by en_US.UTF-8

64-bit STREAMS Module                Description
/usr/kernel/strmod/sparcv9/u8lat1    Code conversion STREAMS module between UTF-8 and ISO8859-1 (Western European)
/usr/kernel/strmod/sparcv9/u8lat2    Code conversion STREAMS module between UTF-8 and ISO8859-2 (Eastern European)
/usr/kernel/strmod/sparcv9/u8koi8    Code conversion STREAMS module between UTF-8 and KOI8-R (Cyrillic)

How to Load a STREAMS Kernel Module
  1. As the root user, determine whether you are running a 64-bit Solaris or 32-bit Solaris system.

    system# isainfo -v
    
    • A 64–bit Solaris system returns the following information:

      64-bit sparcv9 applications
      32-bit sparc applications
    • A 32–bit Solaris system returns the following information:

      32-bit sparc applications
    • A 32–bit x86 system returns the following information:

      32-bit i386 applications
  2. Determine whether your system has already loaded the STREAMS module.

    system# modinfo | grep modulename
    

    If the STREAMS module, such as u8lat1, is already loaded, the output looks as follows:

    system# modinfo | grep u8lat1
    89 ff798000  4b13  18   1  u8lat1 (UTF-8 <--> ISO 8859-1 module)
  3. If the module has not already been loaded, load it using the modload(1M) command.

    • On a 32–bit system, you would type:

      system# modload /usr/kernel/strmod/u8lat1
      
    • On a 64–bit system, you would type:

      system# modload /usr/kernel/strmod/sparcv9/u8lat1
      

      The appropriate u8lat1 STREAMS module is loaded in the kernel. You can now push it onto a stream.
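The choice between the two module paths in step 3 can be sketched in Python as follows. The sketch only inspects text in the form that isainfo -v prints in step 1. The function name module_path is hypothetical, and only the SPARC 64-bit case described in this section is recognized.

```python
def module_path(isainfo_output, module):
    """Pick the STREAMS module path that matches the running kernel.

    `isainfo_output` is the text printed by `isainfo -v`. A line that
    mentions 64-bit sparcv9 applications selects the sparcv9 directory;
    anything else falls back to the 32-bit location.
    """
    if "64-bit sparcv9" in isainfo_output:
        return "/usr/kernel/strmod/sparcv9/" + module
    return "/usr/kernel/strmod/" + module


if __name__ == "__main__":
    sample = "64-bit sparcv9 applications\n32-bit sparc applications\n"
    # The printed path is the argument you would pass to modload.
    print(module_path(sample, "u8lat1"))
```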

How to Unload a STREAMS Kernel Module
  1. As root, verify that the kernel module is loaded.

    For example, to verify that the u8lat1 module is loaded, you would type:

    system# modinfo | grep u8lat1
    89 ff798000  4b13  18   1  u8lat1 (UTF-8 <--> ISO 8859-1 module)
  2. Use the modunload(1M) command to unload the kernel module.

    For example, to unload the u8lat1 module, you would type:

    system# modunload -i 89
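The number passed to modunload -i is the module ID in the first column of the modinfo output. As a minimal sketch, assuming the column layout shown in the example above (ID first, module name sixth), the ID could be extracted like this. The function name loaded_module_id is hypothetical.

```python
def loaded_module_id(modinfo_output, module):
    """Return the ID of `module` from modinfo output, or None if absent.

    Each modinfo line is assumed to hold whitespace-separated fields with
    the ID in the first column and the module name in the sixth.
    """
    for line in modinfo_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[5] == module:
            return int(fields[0])
    return None


if __name__ == "__main__":
    sample = "89 ff798000  4b13  18   1  u8lat1 (UTF-8 <--> ISO 8859-1 module)"
    print(loaded_module_id(sample, "u8lat1"))
```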
    
How to Set Up a Latin-2 Terminal and STREAMS Module
  1. Use the strchg(1) command, as shown in the second command line, to configure the stream with the modules listed in a file.

    system% cat > /tmp/mystreams
    ttcompat
    ldterm
    u8lat2
    ptem
    ^D
    system% strchg -f /tmp/mystreams
    

    Be sure that you are either root or the owner of the device when you use strchg(1).

  2. Run the strconf command to examine the current configuration.

    system% strconf
    ttcompat
    ldterm
    u8lat2
    ptem
    pts
    system%
  3. Run the strchg command to reset the original configuration.

    system% cat > /tmp/orgstreams
    ttcompat
    ldterm
    ptem
    ^D
    system% strchg -f /tmp/orgstreams
    

dtterm, xterm and Terminals Capable of Input and Output of UTF-8 Characters

Unlike in older releases of the Solaris Operating System, the dtterm and xterm terminal emulators and any other terminals that support input and output of the UTF-8 code set do not need any additional STREAMS modules in their streams. The ldterm module is now code set independent and supports Unicode/UTF-8 if you set up the terminal environment with the stty(1) utility.

To set up the proper terminal environment for the Unicode locales, use the stty(1) utility.

system% /bin/stty defeucw

To query the current settings, use the -a option of the stty utility, as shown below:

system% /bin/stty -a

Note –

Because /usr/ucb/stty is not internationalized, use /bin/stty instead.


Terminal Support for Latin-1, Latin-2, or KOI8-R

For terminals that support only Latin-1 (ISO8859-1), Latin-2 (ISO8859-2), or KOI8-R, you should have the following STREAMS configuration:

head <-> ttcompat <->  ldterm <->  u8lat1 <-> TTY

This configuration is only for terminals that support Latin-1. For Latin-2 terminals, replace the STREAMS module u8lat1 with u8lat2. For KOI8-R terminals, replace the module with u8koi8.

Make sure you already have the STREAMS module loaded into the kernel.
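The module substitution described above can be summarized in a small Python sketch that builds the module list file contents for strchg -f. The mapping follows this section. The function name stream_module_list is hypothetical, and the ptem entry is taken from the pseudo-terminal examples earlier in this chapter, so it is an assumption for other kinds of terminals.

```python
# Code set supported by the terminal -> conversion module named in this section.
CONVERSION_MODULES = {
    "ISO8859-1": "u8lat1",
    "ISO8859-2": "u8lat2",
    "KOI8-R": "u8koi8",
}


def stream_module_list(terminal_codeset):
    """Return file contents for `strchg -f`, topmost module first."""
    module = CONVERSION_MODULES[terminal_codeset]
    return "\n".join(["ttcompat", "ldterm", module, "ptem"]) + "\n"


if __name__ == "__main__":
    print(stream_module_list("ISO8859-2"), end="")
```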

Saving the Settings in ~/.cshrc

Assuming the necessary STREAMS modules are already loaded into the kernel, you can save the following lines in your .cshrc file (C shell example) for convenience:

setenv LANG en_US.UTF-8
if ($?USER != 0 && $?prompt != 0) then
     cat >! /tmp/mystreams$$ << _EOF
     ttcompat
     ldterm
     u8lat1
     ptem
_EOF
     /bin/strchg -f /tmp/mystreams$$
     /bin/rm -f /tmp/mystreams$$
     /bin/stty cs8 -istrip defeucw
endif

With these lines in your .cshrc file, you do not have to type all of the commands each time you use the STREAMS module. Note that the second _EOF must start in the first column of the file.

Code Conversions

Unicode locale support adds various code conversions among major code sets of many countries through the iconv and sdtconvtool utilities.

In the current Solaris environment, the utility geniconvtbl enables user-defined code conversions. The user-defined code conversions created with the geniconvtbl utility can be used with both iconv(1) and iconv(3). For more detail on this utility, refer to the geniconvtbl(1) and geniconvtbl(4) man pages.

The fromcode and tocode names that can be applied to iconv, iconv_open, and sdtconvtool are listed in the tables in Appendix A, iconv Code Conversions. For more details on iconv code conversion, see the iconv(1) and sdtconvtool(1) man pages. For more information on available code conversions, see the iconv(5) man page.
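To make the fromcode and tocode idea concrete, here is a minimal Python sketch of a conversion between two code sets. It uses Python's built-in codecs rather than the Solaris iconv utilities, so the accepted code set names are Python's and can differ from those in Appendix A. The function name convert is hypothetical.

```python
def convert(data, fromcode, tocode):
    """Re-encode `data` (bytes in `fromcode`) as bytes in `tocode`.

    The bytes are first decoded to Unicode text and then encoded again,
    which is conceptually what a fromcode-to-tocode conversion does.
    """
    return data.decode(fromcode).encode(tocode)


if __name__ == "__main__":
    # 0xB3 is LATIN SMALL LETTER L WITH STROKE in ISO8859-2.
    print(convert(b"\xb3", "ISO8859-2", "UTF-8").hex())
```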


Note –

UCS-2, UCS-4, UTF-16, and UTF-32 are all Unicode/ISO/IEC 10646 representation forms that recognize the Byte Order Mark (BOM) character defined in the Unicode 4.0 and ISO/IEC 10646-1:2000 standards if the character appears at the beginning of the character stream. Other forms, like UCS-2BE, UCS-4BE, UTF-16BE, and UTF-32BE, are Unicode/ISO/IEC 10646 representation forms that do not recognize the BOM character and assume big-endian byte ordering. Representation forms like UCS-2LE, UCS-4LE, UTF-16LE, and UTF-32LE, on the other hand, assume little-endian byte ordering. These forms also do not recognize the BOM character.

For associated scripts and languages of ISO8859–* and KOI8–*, see http://czyborra.com/charsets/iso8859.html.
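The BOM distinction in this note can be observed with a short Python sketch. It relies on Python's own utf-16, utf-16-be, and utf-16-le codecs, which follow the same convention, and it is not a demonstration of the Solaris iconv modules themselves. The function name decode_forms is hypothetical.

```python
def decode_forms(data):
    """Decode the same bytes with a BOM-aware and two fixed-order forms."""
    return {
        # "utf-16" consumes a leading BOM and uses it to pick the byte order.
        "utf-16": data.decode("utf-16"),
        # The BE and LE forms assume a byte order and keep U+FEFF as data.
        "utf-16-be": data.decode("utf-16-be"),
        "utf-16-le": data.decode("utf-16-le"),
    }


if __name__ == "__main__":
    big_endian_with_bom = b"\xfe\xff\x00A"
    for form, text in decode_forms(big_endian_with_bom).items():
        # Print code points so the output stays ASCII-only.
        print(form, [hex(ord(ch)) for ch in text])
```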


DtMail Support

As a result of increased coverage in scripts, Solaris DtMail running in the en_US.UTF-8 locale supports the following character sets, indicated by MIME names:

  • US-ASCII (7-bit US ASCII)

  • UTF-8 (UCS Transmission Format 8 bit)

  • UTF-7 (UCS Transmission Format 7 bit)

  • ISO-8859-1 (Latin-1)

  • ISO-8859-2 (Latin-2)

  • ISO-8859-3 (Latin-3)

  • ISO-8859-4 (Latin-4)

  • ISO-8859-5 (Latin/Cyrillic)

  • ISO-8859-6 (Latin/Arabic)

  • ISO-8859-7 (Latin/Greek)

  • ISO-8859-8 (Latin/Hebrew)

  • ISO-8859-9 (Latin-5)

  • ISO-8859-10 (Latin-6)

  • ISO-8859-13 (Latin-7/Baltic)

  • ISO-8859-14 (Latin-8/Celtic)

  • ISO-8859-15 (Latin-9)

  • ISO-8859-16 (Latin-10)

  • KOI8-R (Cyrillic)

  • ISO-2022-JP and EUC-JP (Japanese)

  • ISO-2022-KR and EUC-KR (Korean)

  • ISO-2022-CN (Simplified Chinese)

  • KOI8-U (Cyrillic/Ukrainian)

  • Shift_JIS (Japanese in Shift JIS)

  • GB2312 (Simplified Chinese in EUC)

  • TIS-620 (Thai)

  • UTF-16 (UCS Transmission Format 16 bit)

  • UTF-16BE (UTF-16 Big-Endian)

  • UTF-16LE (UTF-16 Little-Endian)

  • Windows-1250

  • Windows-1251

  • Windows-1252

  • Windows-1253

  • Windows-1254

  • Windows-1255

  • Windows-1256

  • Windows-1257

  • Windows-1258

  • Big5 (Traditional Chinese)

  • UTF-32 (UCS Transmission Format 32 bit)

  • UTF-32BE (UTF-32 Big-Endian)

  • UTF-32LE (UTF-32 Little-Endian)

This support enables users to view virtually any kind of email encoded in various character sets from any region of the world in a single instance of DtMail. DtMail decodes received email by looking at the MIME charset and content transfer encoding provided with the email. Windows-125x MIME charsets are supported.

For sending email, you need to specify a MIME charset that is understood by the recipient mail user agent (mail client), or you can use the default MIME charset provided by the en_US.UTF-8 locale. To switch the character set of outgoing email, in the New Message window, press Control Y, or click the Format menu button and then click the Change Char Set button. The next available character set name is displayed in the bottom left corner, above the Send button.

If your email message header or message body contains characters that cannot be represented by the specified MIME charset, the system automatically switches the charset to UTF-8, which can represent any character.

If your message contains characters from the 7-bit US-ASCII character set only, the default MIME charset of your email is US-ASCII. Any mail user agent can interpret such email messages without loss of characters or information.

If your message contains characters from a mixture of scripts, the default MIME charset is UTF-8. Any 8-bit characters of UTF-8 are encoded with Quoted-Printable encoding. For more details on MIME, registered MIME charsets, and Quoted-Printable encoding, refer to RFCs 2045, 2046, 2047, 2048, 2049, 2279, 2152, 2237, 1922, 1557, 1555, and 1489.
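The charset selection rules in the preceding paragraphs can be approximated by the following Python sketch. It is a simplified model, not the DtMail implementation. The function names choose_mime_charset and encode_body are hypothetical, and the quopri module from the Python standard library stands in for the Quoted-Printable step.

```python
import quopri


def choose_mime_charset(text, requested=None):
    """Pick a MIME charset for `text`.

    With no requested charset, US-ASCII is tried first. If the text cannot
    be represented in the charset being tried, the result is UTF-8.
    """
    charset = requested or "us-ascii"
    try:
        text.encode(charset)
        return charset
    except UnicodeEncodeError:
        return "utf-8"


def encode_body(text, charset):
    """Encode `text` in `charset`, then apply Quoted-Printable encoding."""
    return quopri.encodestring(text.encode(charset))


if __name__ == "__main__":
    message = "caf\u00e9 \u03b2"  # Latin-1 letter plus a Greek letter
    charset = choose_mime_charset(message, "iso-8859-1")
    print(charset, encode_body(message, charset))
```

In this model, a message that mixes Latin and Greek letters cannot be sent as ISO-8859-1, so it falls back to UTF-8 and its 8-bit bytes are written as =XX sequences.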

Figure 5–9 DtMail New Message Window


Programming Environment

Internationalized applications should automatically enable the en_US.UTF-8 locale. However, proper FontSet/XmFontList definitions in the application's resource file are required.

For information on internationalized applications, see Creating Worldwide Software: Solaris International Developer's Guide, 2nd edition.

FontSet Used with X Applications

For information about the FontSet used with X applications, please see Unicode Locale: en_US.UTF-8 Support.

Each character set has an associated set of fonts in the Solaris desktop environment.

The following is a list of the Latin-1 fonts that are supported in the current Solaris environment:

-dt-interface system-medium-r-normal-xxs sans utf-10-100-72-72-p-59-iso8859-1
-dt-interface system-medium-r-normal-xs sans  utf-12-120-72-72-p-71-iso8859-1
-dt-interface system-medium-r-normal-s sans  utf-14-140-72-72-p-82-iso8859-1
-dt-interface system-medium-r-normal-m sans  utf-17-170-72-72-p-97-iso8859-1
-dt-interface system-medium-r-normal-l sans  utf-18-180-72-72-p-106-iso8859-1
-dt-interface system-medium-r-normal-xl sans utf-20-200-72-72-p-114-iso8859-1
-dt-interface system-medium-r-normal-xxl sans utf-24-240-72-72-p-137-iso8859-1

For information on CDE common font aliases, including -dt-interface user-* and -dt-application-* aliases, see Common Desktop Environment: Internationalization Programmer's Guide.

In the en_US.UTF-8 locale, utf is also included in the locale's common font aliases as an additional attribute in the style field of the X logical font description name. Therefore, to have a proper set of fonts, the additional style has to be included in the font set creation as in the following example:

fs = XCreateFontSet(display,
"-dt-interface system-medium-r-normal-s*utf*",
 &missing_ptr, &missing_count, &def_string);

FontList Definition in CDE/Motif Applications

As with FontSet definition, the XmFontList resource definition of an application should also include the additional style attribute supported by the locale.

*fontList:\
 -dt-interface system-medium-r-normal-s*utf*: