Solaris Internationalization Guide For Developers
CHAPTER 6

Internationalization Framework in Solaris 2.6


Solaris 2.6 contains several new internationalization features discussed in this chapter, such as:
  • Codeset Independence support
  • Locale database
  • Process code format (wide character expression)
  • libw and libintl
  • ctype macros
  • genmsg utility
This chapter also contains information useful for developing internationalized applications, such as:
  • Dynamically linked applications
  • Solaris 2.6 internationalized APIs

Codeset Independence Support

Before the release of the Solaris 2.6 operating system, the SunOS and Solaris internationalization framework supported only the Extended UNIX Code (EUC) representation. This prevented support for new encodings that did not fit the EUC model, such as PC-Kanji in Japan and Big-5 in Taiwan.
Because a large part of the computer market demands non-EUC codeset support, Solaris 2.6 provides a solid framework to enable both EUC and non-EUC codeset support. This support is called Codeset Independence, or CSI.
The goal of CSI is to remove dependencies on specific codesets or encoding methods, such as EUC, from Solaris OS libraries and commands. The CSI architecture allows the Solaris operating environment to support any encoding that is safe for the UNIX file system. CSI supports a number of new codesets, such as UTF-8, PC-Kanji¹, and Big-5.

The CSI Approach

Codeset Independence allows application and platform software developers to keep their code independent of any particular encoding, such as UTF-8, and to adopt new encodings without modifying the source code. This architectural approach differs from Java internationalization, which requires applications to be Unicode-dependent and to perform code conversions throughout the application.
Many existing internationalized applications (for example, Motif) automatically inherit CSI support from the underlying system. These applications work in the new locales without modification. However, OPEN LOOK applications that are based on XView or OLIT do not work in the new locales because XView is codeset-dependent.
CSI is inherently independent of any codeset. However, the following assumptions about file code encodings (codesets) still apply in Solaris 2.6:
  • File code is a superset of ASCII.

    Unicode (16-bit fixed width) cannot be supported as file code.

  • The null byte (0x00) is not part of any multibyte character, so that null-terminated multibyte character strings are supported.
  • Slash / (0x2f) is not part of multibyte characters for support of the UNIX path names.
  • Only stateless file code encodings are supported.

CSI-enabled Commands

TABLE 6-1 lists the CSI-enabled commands in Solaris 2.6. The man page for each of these commands notes its CSI capability.

1. Japanese Solaris 2.5.1 supports PC Kanji (also known as Shift-JIS).
All commands are in the /usr/bin directory, unless otherwise noted.
TABLE 6-1
/usr/lib/diffh
/usr/sbin/accept
/usr/sbin/reject
/usr/ucb/lpr
/usr/xpg4/bin/awk
/usr/xpg4/bin/cp
/usr/xpg4/bin/date
/usr/xpg4/bin/du
/usr/xpg4/bin/ed
/usr/xpg4/bin/edit
/usr/xpg4/bin/egrep
/usr/xpg4/bin/env
/usr/xpg4/bin/ex
/usr/xpg4/bin/expr
/usr/xpg4/bin/fgrep
/usr/xpg4/bin/grep
/usr/xpg4/bin/ln
/usr/xpg4/bin/ls
/usr/xpg4/bin/more
/usr/xpg4/bin/mv
/usr/xpg4/bin/nice
/usr/xpg4/bin/nohup
/usr/xpg4/bin/od
/usr/xpg4/bin/pr
/usr/xpg4/bin/rm
/usr/xpg4/bin/sed
/usr/xpg4/bin/sort
/usr/xpg4/bin/tail
/usr/xpg4/bin/tr
/usr/xpg4/bin/vedit
/usr/xpg4/bin/vi
/usr/xpg4/bin/view
acctcom
apropos
batch
bdiff
cancel
cat
catman
chgrp
chmod
chown
cmp
col
comm
compress
cpio
csh
csplit
cut
diff
diff3
disable
echo
expand
file
find
fold
ftp
gencat
getopt
getoptcvt
head
join
jsh
kill
ksh
lp
man
mkdir
msgfmt
news
nroff
pack
paste
pcat
pg
printf
priocntl
ps
pwd
rcp
red
remsh
rksh
rmdir
rsh
script
sdiff
settime
sh
split
strconf
strings
sum
tabs
tar
tee
touch
tty
uncompress
unexpand
uniq
unpack
wc
whatis
write
xargs
zcat


Solaris 2.6 CSI-enabled Libraries

Nearly all functions in the Solaris 2.6 libc (/usr/lib/libc.so) are CSI-enabled. However, the following libc functions are not CSI-enabled because they are EUC-dependent:
  • csetcol()
  • csetlen()
  • euccol()
  • euclen()
  • eucscol()
  • getwidth()

The following macros are also not CSI-enabled because they are EUC-dependent:
  • csetno()
  • wcsetno()

The Solaris 2.6 libgen (/usr/ccs/lib/libgen.a) is internationalized, but not CSI-enabled.
The Solaris 2.6 libcurses (/usr/ccs/lib/libcurses.a) is internationalized, but not CSI-enabled.

Locale Database

The format and structure of the locale database changed in Solaris 2.6 from previous Solaris releases. The locale database is private and subject to change in a future release. Therefore, when developing an internationalized application, do not access the locale database directly. Instead, use the Solaris internationalization APIs.

Note - When using Solaris 2.6, use the locale databases that are included with Solaris 2.6. Do not use locales from previous Solaris versions.


Process Code Format

The process code format in Solaris 2.6 is private and subject to change in a future release. Therefore, when developing an internationalized application, do not assume that the process code format will stay the same. Instead, use the Solaris internationalization APIs, which are described in TABLE 6-3.

Dynamically Linked Applications

Solaris 2.6 users can choose to link applications with the system libraries, such as libc, either dynamically or statically. However, any application that requires the internationalization features in the system libraries must be dynamically linked. If the application has been statically linked, a call to the setlocale function that requests any locale other than C or POSIX fails. Statically linked applications can operate only in the C and POSIX locales.
By default, the linker program tries to link the application dynamically. If the command line options to the linker and the compiler include -Bstatic or -dn specifications, your application may be statically linked. You can check whether an existing application is dynamically linked using the /usr/bin/ldd command.
For example, if you type:

  % /usr/bin/ldd /sbin/sh  

the command displays the following message:

  ldd: /sbin/sh: file is not a dynamic executable or shared object

The message indicates the /sbin/sh command is not a dynamically linked program. Also, if you type:

  % /usr/bin/ldd /usr/bin/ls  

the command displays the following message:

  libc.so.1 => /usr/lib/libc.so.1
  libdl.so.1 => /usr/lib/libdl.so.1

This message indicates the /usr/bin/ls command has been dynamically linked with two libraries, libc.so.1 and libdl.so.1.
To summarize, if the ldd output for the application does not contain a libc.so.1 entry, the application has been statically linked with libc. In that case, change the command line options to the linker so that dynamic linking is used instead, then re-link the application.

libw and libintl

In the Solaris 2.6 release, the implementation of libw and libintl has been moved to libc. The shared objects libw.so.1 and libintl.so.1 are provided as filters on libc.so.1, and the archives libw.a and libintl.a are provided as links to an empty archive.
The shared objects ensure runtime compatibility for existing applications and, together with the archives, provide compilation environment compatibility for building applications. However, it is no longer necessary to build applications against libw or libintl.
For more information on filters see the Linker and Libraries Guide.
TABLE 6-2 shows the stub entry points in libw and libintl:
TABLE 6-2 Stub Entry Points in libw and libintl
Library   Entry Points
libw      fgetwc       fgetws       fputwc       fputws       getwc
          getwchar     getws        isenglish    isideogram   isnumber
          isphonogram  isspecial    iswalnum     iswalpha     iswcntrl
          iswctype     iswdigit     iswgraph     iswlower     iswprint
          iswpunct     iswspace     iswupper     iswxdigit    putwc
          putwchar     putws        strtows      towlower     towupper
          ungetwc      watoll       wcscat       wcschr       wcscmp
          wcscoll      wcscpy       wcscspn      wcsftime     wcslen
          wcsncat      wcsncmp      wcsncpy      wcspbrk      wcsrchr
          wcsspn       wcstod       wcstok       wcstol       wcstoul
          wcswcs       wcswidth     wcsxfrm      wctype       wcwidth
          wscasecmp    wscat        wschr        wscmp        wscol
          wscoll       wscpy        wscspn       wsdup        wslen
          wsncasecmp   wsncat       wsncmp       wsncpy       wspbrk
          wsprintf     wsrchr       wsscanf      wsspn        wstod
          wstok        wstol        wstoll       wstostr      wsxfrm
libintl   bindtextdomain  dcgettext  dgettext  gettext  textdomain

ctype Macros

Character classification and character transformation macros are defined in /usr/include/ctype.h. Solaris 2.6 provides a new set of ctype macros. The new macros support character classification and transformation semantics defined by XPG4. To access the new set of macros, one of the following conditions must be met:
  • __XPG4_CHAR_CLASS__ is defined,
  • _XOPEN_SOURCE and _XOPEN_VERSION=4 are defined, or
  • _XOPEN_SOURCE and _XOPEN_SOURCE_EXTENDED=1 are defined
This means that all XPG4 and XPG4.2 applications automatically get the new macros. Because defining _XOPEN_SOURCE, _XOPEN_VERSION, or _XOPEN_SOURCE_EXTENDED brings in other XPG4-related features in addition to the new ctype macros, applications that are neither XPG4 nor XPG4.2 applications should define __XPG4_CHAR_CLASS__ instead.
There are corresponding ctype functions. The Solaris 2.6 functions also support XPG4 semantics.
Refer to the ctype man page for details.

Internationalization APIs in libc

Solaris 2.6 offers two sets of APIs:
  • multibyte (file code)
  • wide character (process code)
Applications do their processing in wide character code.
When a program takes input from a file, it converts the file's multibyte data into wide character process code with the mbtowc() and mbstowcs() APIs. To convert output data from wide character format back into multibyte format, use the wctomb() and wcstombs() APIs.
TABLE 6-3 shows a list of internationalization APIs included in Solaris 2.6.
TABLE 6-3 Internationalization APIs in libc
API Type                  Library Routine   Description
Messaging Functions       catclose()        Close a message catalog.
                          catgets()         Read a program message.
                          catopen()         Open a message catalog.
                          dgettext()        Get a message from a message catalog with domain specified.
                          dcgettext()       Get a message from a message catalog with domain and category specified.
                          textdomain()      Set and query the current domain.
                          bindtextdomain()  Bind the path for a message domain.
Code conversion           iconv()           Convert codes.
                          iconv_close()     Deallocate the conversion descriptor.
                          iconv_open()      Allocate the conversion descriptor.
Regular expression        regcomp()         Compile the regular expression.
                          regexec()         Execute the regular expression matching.
                          regerror()        Provide a mapping from error codes to error messages.
                          regfree()         Free memory allocated by regcomp().
Wide character class      wctype()          Define character class.
Locale related            setlocale()       Modify and query a program's locale.
                          nl_langinfo()     Get language and cultural information of current locale.
                          localeconv()      Get monetary and numeric formatting information of current locale.
Character classification  isalpha()         Is character an alphabetic character?
                          isupper()         Is character uppercase?
                          islower()         Is character lowercase?
                          isdigit()         Is character a digit?
                          isxdigit()        Is character a hex digit?
                          isalnum()         Is character an alphabetic character or digit?
                          isspace()         Is character a space?
                          ispunct()         Is character a punctuation mark?
                          isprint()         Is character printable?
                          iscntrl()         Is character a control character?
                          isascii()         Is character an ASCII character?
                          isgraph()         Is character a visible character?
                          isphonogram()     Is wide character a phonogram?
                          isideogram()      Is wide character an ideogram?
                          isenglish()       Is wide char in English alphabet from a supplementary codeset?
                          isnumber()        Is wide character a digit from a supplementary codeset?
                          isspecial()       Is special wide character from a supplementary codeset?
                          iswalpha()        Is wide character an alphabetic character?
                          iswupper()        Is wide character uppercase?
                          iswlower()        Is wide character lowercase?
                          iswdigit()        Is wide character a digit?
                          iswxdigit()       Is wide character a hex digit?
                          iswalnum()        Is wide character an alphabetic character or digit?
                          iswspace()        Is wide character white space?
                          iswpunct()        Is wide character a punctuation mark?
                          iswprint()        Is wide character a printable character?
                          iswgraph()        Is wide character a visible character?
                          iswcntrl()        Is wide character a control character?
                          iswascii()        Is wide character an ASCII character?
Character transformation  toupper()         Convert a lowercase character to uppercase.
                          tolower()         Convert an uppercase character to lowercase.
                          towupper()        Convert a lowercase wide character to uppercase.
                          towlower()        Convert an uppercase wide character to lowercase.
Character collation       strcoll()         Collate character strings.
                          strxfrm()         Transform character strings for comparison.
                          wcscoll()         Collate wide char strings.
                          wcsxfrm()         Transform wide char strings for comparison.
Monetary handling         strfmon()         Convert monetary value to string representation.
Date and Time handling    getdate()         Convert user format date and time.
                          strftime()        Convert date and time to string representation.
                          strptime()        Date and time conversion.
Multibyte handling        mblen()           Get length of multibyte character.
                          mbtowc()          Convert multibyte to wide character.
                          mbstowcs()        Convert multibyte string to wide character string.
Wide Characters           wcscat()          Concatenate wide char strings.
                          wcsncat()         Concatenate wide char strings to length n.
                          wsdup()           Duplicate wide char string.
                          wcscmp()          Compare wide char strings.
                          wcsncmp()         Compare wide char strings to length n.
                          wcscpy()          Copy wide char strings.
                          wcsncpy()         Copy wide char strings to length n.
                          wcschr()          Find character in wide char string.
                          wcsrchr()         Find character in wide char string from right.
                          wcslen()          Get length of wide char string.
                          wscol()           Return display width of wide char string.
                          wcsspn()          Return span of one wide char string in another.
                          wcscspn()         Return span of one wide char string not in another.
                          wcspbrk()         Return pointer to one wide char string in another.
                          wcstok()          Move token through wide char string.
                          wcswcs()          Find string in wide character string.
                          wcstombs()        Convert wide character string to multibyte string.
                          wctomb()          Convert wide character to multibyte character.
                          wcwidth()         Determine number of column positions of a wide character.
                          wcswidth()        Determine number of column positions of a wide char string.
Wide Formatting           wsprintf()        Generate wide char string according to format.
                          wsscanf()         Interpret wide char string according to format.
Wide Numbers              wcstol()          Convert wide char string to long integer.
                          wcstoul()         Convert wide char string to unsigned long integer.
                          wcstod()          Convert wide char string to double precision.
Wide Strings              wscasecmp()       Compare wide char strings, ignores case differences.
                          wsncasecmp()      Compare wide char strings to length n (ignores case).
Wide Standard I/O         fgetwc()          Get multibyte char from stream, convert to wide char.
                          getwchar()        Get multibyte char from stdin, convert to wide char.
                          fgetws()          Get multibyte string from stream, convert to wide char.
                          getws()           Get multibyte string from stdin, convert to wide char.
                          fputwc()          Convert wide char to multibyte char, puts to stream.
                          putwchar()        Convert wide char to multibyte char, puts to stdout.
                          fputws()          Convert wide char to multibyte string, puts to stream.
                          putws()           Convert wide char to multibyte string, puts to stdout.
                          ungetwc()         Push a wide char back into input stream.

genmsg Utility

The new genmsg utility can be used with the catgets() family of functions to create internationalized source message catalogs. The utility examines a source program file for calls to catgets() and builds a source message catalog from the information it finds. For example:

  % cat example.c  
       ...  
       /* NOTE: %s is a file name */  
       printf(catgets(catd, 5, 1, "%s cannot be opened."));  
       /* NOTE: "Read" is a past participle, not a present  
           tense verb */  
       printf(catgets(catd, 5, 2, "Read"));
       ...  
  % genmsg -c NOTE example.c  
  The following file(s) have been created.  
           new msg file = "example.c.msg"  
  % cat example.c.msg  
  $quote "  
  $set 5  
  1       "%s cannot be opened."
       /* NOTE: %s is a file name */  
  2       "Read"  
       /* NOTE: "Read" is a past participle, not a present  
           tense verb */  

In the above example, genmsg is run on the source file example.c, which produces a source message catalog named example.c.msg. The -c option with the argument NOTE causes genmsg to include comments in the catalog. If a comment in the source program contains the string specified, the comment will appear in the message catalog after the next string extracted from a call to catgets().
You can use genmsg to number the messages in a message set automatically.
For more information, see the genmsg man page.

Note - The material in this section is used with permission from Creating Worldwide Software: Solaris International Developer's Guide, 2nd edition by Bill Tuthill and David A. Smallberg, published by Sun Microsystems Press/Prentice Hall. (c)1997 Sun Microsystems, Inc.