Glossary
- ANSI
-
American National Standards Institute. ANSI develops standard
definitions for different computing languages. The ANSI standard for
the C language, prepared by the ANSI C X3J11 Committee, includes library functions
for computing with multibyte characters for international use, as well as
a new data type, wchar_t, for representing wide characters (four bytes each
in Solaris). This text refers to it as the “proposed ANSI C standard,” or
ANSI C-X3J11; the standard was ratified as ANSI X3.159-1989.
- ASCII
-
American Standard Code for Information Interchange. A seven-bit
code containing English uppercase and lowercase letters, punctuation, numbers,
and control codes. Because ASCII leaves the eighth bit of each byte unused, different
applications use that bit for parity checking, communication and message-passing
protocols, data compaction, or other purposes. Internationalized applications, and
utilities that handle multiple code sets or multibyte characters, cannot use the bit
this way, because multibyte encodings such as EUC set it to mark bytes that are not ASCII.
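A minimal Python sketch of why the eighth bit matters: a byte is 7-bit ASCII only when its most significant bit is zero, while EUC-KR sets that bit in every byte of a Korean character. The function name `is_seven_bit_ascii` is hypothetical; Python's built-in `euc_kr` codec is used only to produce sample bytes.

```python
def is_seven_bit_ascii(data: bytes) -> bool:
    """Return True if every byte has its eighth (most significant) bit clear."""
    return all((b & 0x80) == 0 for b in data)


# Plain English text fits in seven bits.
print(is_seven_bit_ascii(b"Hello, world"))        # True
# EUC-KR encodes the Hangul syllable 가 as 0xB0 0xA1: both bytes use bit 8.
print(is_seven_bit_ascii("가".encode("euc_kr")))  # False
```

An application that stored a parity or flag value in bit 8 would therefore corrupt the Korean text.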
- Category
-
In the Korean Solaris documentation set, the term category
relates to localization. A category is one portion of a country's language
representation and cultural conventions. For instance, dates are usually written
in the U.S. as month, day, year, while in another country they might be written as
day, month, year. Date and time formatting can be thought of as one category of a
local language. The term also refers to the categories used by programs (such as
LC_TIME), the environment variables that correspond to those categories, and the
ANSI localization tables for each category.
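A program category can be sketched with Python's `locale` module, which wraps the ANSI C `setlocale` function. This example changes only the LC_TIME category, formats a date with the locale's `%x` date format, and then restores the previous setting. `format_date` is a hypothetical helper, and locale names vary between systems.

```python
import locale
import time


def format_date(loc: str, when: time.struct_time) -> str:
    """Format a date using only the LC_TIME category of locale `loc`."""
    saved = locale.setlocale(locale.LC_TIME)      # query the current LC_TIME setting
    try:
        locale.setlocale(locale.LC_TIME, loc)     # change only the time category
        return time.strftime("%x", when)          # %x = the locale's date format
    finally:
        locale.setlocale(locale.LC_TIME, saved)   # restore the previous setting


day = time.strptime("1999-01-31", "%Y-%m-%d")
print(format_date("C", day))  # month/day/year in the C locale: 01/31/99
```

On a system with a Korean locale installed, passing its name (ko on Solaris) would yield the Korean date format instead; if the locale is not installed, `locale.setlocale` raises `locale.Error`.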
- Character Set
-
A character set is defined as a set of elements used for the
organization, control, or representation of data. Character sets may be composed
of alphabets, ideograms, or other units. This definition is broad, and because
character sets may contain other character sets, the boundaries between them are
not always clear. For example, the KS C 5601 character set contains English, Greek,
Russian, and Japanese character sets, in addition to Hangul syllables (consonant
and vowel combinations), Hanja ideograms (Chinese characters), and many other
characters.
- code set
-
Also called a coded character set, this is a set of unambiguous
rules that establishes a character set and the one-to-one relationship between
each character in the character set and its bit representation. For example,
the English character set, including punctuation and numbers, can be mapped
to the ASCII code set in such a way that each character corresponds to exactly
one bit pattern, and no bit pattern corresponds to more than one character.
- Combination code
-
Another name for Packed code or Johap code, described below.
- Completion code
-
Also called Wansung. Completion code is a predefined set
of Korean character codes that maps preselected Hangul, Hanja, special
symbols, letters of other alphabets, and so on into a two-byte coding space.
This representation is defined in KS C 5601 and used as EUC code set 1 by
the Korean Solaris Operating System.
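A sketch, assuming the usual KS C 5601 layout of 94 rows of 94 cells: the character at row r, cell c is stored in EUC code set 1 as the two bytes 0xA0 + r and 0xA0 + c. Rows 16 through 40 hold the Hangul syllables, and 25 rows × 94 cells gives the 2350 Hangul characters of KS C 5601. The function name `wansung_to_euc` is hypothetical; Python's `euc_kr` codec is used only to check the result.

```python
def wansung_to_euc(row: int, cell: int) -> bytes:
    """Map a KS C 5601 row/cell position (1..94 each) to its EUC code set 1 bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    return bytes([0xA0 + row, 0xA0 + cell])


code = wansung_to_euc(16, 1)   # row 16, cell 1: the first Hangul syllable
print(code.hex())              # b0a1
print(code.decode("euc_kr"))   # 가
```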
- EUC
-
Extended UNIX Code. Describes four code sets modeled on ISO 2022.
Each code set can contain one or more different character sets, like the Hangul
and Hanja character sets in KS C 5601. The four code sets are referred to
as code sets 0, 1, 2, and 3, and in this text they are sometimes abbreviated
as cs0, cs1, cs2, and cs3. Other internationalization efforts sometimes call
these G0, G1, G2, and G3. Code set 0 is also called the primary code set,
and code sets 1, 2, and 3 are called the supplementary code sets. In the Korean
and Chinese implementations of EUC, the primary code set (cs0) contains
ASCII, so every cs0 byte has a zero in its most significant bit.
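The code set rules above can be sketched as a byte scanner. In EUC, characters from cs2 and cs3 are introduced by the single-shift bytes SS2 (0x8E) and SS3 (0x8F). This sketch assumes Korean EUC with only cs0 (ASCII, one byte) and cs1 (KS C 5601, two bytes, each in 0xA1 to 0xFE) in use, and rejects SS2/SS3 rather than guessing their widths. `split_euc_kr` is a hypothetical name.

```python
SS2, SS3 = 0x8E, 0x8F  # single-shift bytes that would introduce cs2 and cs3


def split_euc_kr(data: bytes) -> list[tuple[int, bytes]]:
    """Split Korean EUC bytes into (code set, character bytes) pairs."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Most significant bit clear: one ASCII byte from cs0.
            chars.append((0, data[i:i + 1]))
            i += 1
        elif b in (SS2, SS3):
            # cs2/cs3 are not handled by this Korean-only sketch.
            raise ValueError(f"unsupported supplementary code set at offset {i}")
        elif i + 1 < len(data) and 0xA1 <= b <= 0xFE and 0xA1 <= data[i + 1] <= 0xFE:
            # Two bytes, both with the most significant bit set: one cs1 character.
            chars.append((1, data[i:i + 2]))
            i += 2
        else:
            raise ValueError(f"invalid EUC byte sequence at offset {i}")
    return chars


sample = "KS 한글".encode("euc_kr")
for code_set, raw in split_euc_kr(sample):
    print(code_set, raw.hex())
```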
- Hangul
-
Hangul is the phonetic alphabet commonly used in Korea. Each
character corresponds to a spoken syllable, usually a consonant-vowel pair
or a consonant-vowel-consonant triad. KS C 5601 defines 2350 Hangul characters
used in standard computing.
- Hanja
-
Hanja characters are Korean ideograms, which came
originally from ancient China (the word itself means Chinese character). They
were adopted many centuries ago and have evolved somewhat different meanings
in China and Korea. But because they are not phonetically based, Chinese and Korean
Hanja have remained closer in meaning than have Italian, French, and Spanish,
which evolved into separate languages over the same time span. The Korean
Industry Standard defines the 4888 most frequently used Hanja characters in
the KS C 5601 standard.
- ISO
-
International Organization for Standardization. Composed of the national
standards bodies of its member countries, this organization studies and makes
recommendations on internationalization issues. ISO 2022 describes the code
extension techniques on which the Extended UNIX Codes are based. Other ISO proposals
include the European 8-bit codes and communication protocols for internationalization.
- Johap code
-
Johap code is a Packed code (also called Combination code)
defined in KS C 5601-1992. Unlike the Packed code defined
in KS C 5601-1987 and earlier, Johap code also includes Hanja characters and
special symbols.
- KSC
-
Korean Industry Standard Code Set. This is the Korean
analogue to ASCII. The KSC describes standards for computing in the Korean
environment. KS C 5601 contains code assignments in Completion code for Hangul
and Hanja characters, graphics and punctuation characters, two Japanese phonetic
alphabets (Hiragana and Katakana), control codes, and several western alphabets
(Roman, Russian, and Greek characters). This standard defines 2350 Hangul
characters, 4888 Hanja characters, and 986 additional characters (for punctuation,
foreign alphabets, numbers, graphics, and others). Each character is two bytes
long and does not use the most significant bit of either byte; in other words,
only the lower seven bits of each byte carry the character assignment.
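A minimal sketch of the relationship between the 7-bit KS C 5601 form (both bytes in 0x21 to 0x7E) and the EUC form used by the ko locale: converting is just setting the most significant bit of each byte, and clearing it goes back. For example, the first Hangul syllable, 가, is 0x3021 in the 7-bit form and 0xB0A1 in EUC. The helper names are hypothetical.

```python
def ksc_to_euc(code: int) -> bytes:
    """Convert a 7-bit KS C 5601 code (each byte 0x21..0x7E) to EUC bytes."""
    hi, lo = code >> 8, code & 0xFF
    if not (0x21 <= hi <= 0x7E and 0x21 <= lo <= 0x7E):
        raise ValueError(f"{code:#06x} is not a 7-bit KS C 5601 code")
    return bytes([hi | 0x80, lo | 0x80])  # set the most significant bit of both bytes


def euc_to_ksc(data: bytes) -> int:
    """Inverse: clear the most significant bit of both EUC bytes."""
    return ((data[0] & 0x7F) << 8) | (data[1] & 0x7F)


euc = ksc_to_euc(0x3021)
print(euc.hex())                # b0a1
print(hex(euc_to_ksc(euc)))     # 0x3021
```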
- Locale
-
A locale describes a language or cultural environment. Its
setting affects the display and manipulation of language-dependent features. Korean
Solaris software provides the C locale for U.S. conventions, ko
for Korean Extended UNIX Code, and ko.UTF-8 for the Korean
Universal Multiple-Octet Coded Character Set Transformation Format (UTF-8).
- N-byte code
-
This coding system assigns each Korean alphabetic
consonant or vowel a one-byte code. These codes are combined into Hangul syllabic
characters by a Hangul automaton.
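A minimal sketch of a Hangul automaton, under two assumptions that are not part of the N-byte code itself: Unicode compatibility jamo (U+3131 to U+3163) stand in for the one-byte codes, and finished syllables are produced with Unicode's formula 0xAC00 + (initial × 21 + vowel) × 28 + final rather than a KS C 5601 table. It shows the rule that makes an automaton necessary: a consonant after a vowel is provisionally a final consonant, but moves to the next syllable if a vowel follows. `compose` is a hypothetical name, and compound vowels and finals are not built from two keystrokes.

```python
# Unicode compatibility jamo, listed in the order Unicode's syllable formula uses.
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"  # 19 initial consonants
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"  # 21 vowels
JONG = "ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ"  # 27 final consonants


def compose(jamo: str) -> str:
    """Combine a stream of single jamo into Hangul syllables (simplified automaton)."""
    out: list[str] = []
    cho = jung = None  # indices of the pending initial consonant and vowel
    jong = 0           # pending final consonant: 0 = none, else JONG index + 1

    def flush() -> None:
        """Emit whatever is pending and return to the empty state."""
        nonlocal cho, jung, jong
        if cho is not None and jung is not None:
            out.append(chr(0xAC00 + (cho * 21 + jung) * 28 + jong))
        elif cho is not None:
            out.append(CHO[cho])  # a consonant with no vowel stays a bare jamo
        cho, jung, jong = None, None, 0

    for c in jamo:
        if c in JUNG:
            if cho is not None and jung is None:
                jung = JUNG.index(c)  # initial + vowel
            elif jung is not None and jong and JONG[jong - 1] in CHO:
                # The provisional final consonant becomes the next initial.
                moved = JONG[jong - 1]
                jong = 0
                flush()
                cho, jung = CHO.index(moved), JUNG.index(c)
            else:
                flush()
                out.append(c)  # a vowel with no initial stays a bare jamo
        elif c in CHO or c in JONG:
            if jung is not None and jong == 0 and c in JONG:
                jong = JONG.index(c) + 1  # provisionally close the syllable
            else:
                flush()
                if c in CHO:
                    cho = CHO.index(c)
                else:
                    out.append(c)  # e.g. ㄳ, which cannot start a syllable
        else:
            flush()
            out.append(c)  # spaces, punctuation, and other characters
    flush()
    return "".join(out)


print(compose("ㅎㅏㄴㄱㅡㄹ"))  # 한글
print(compose("ㄱㅏㄴㅏ"))      # 가나: ㄴ moves to the second syllable
```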
- Packed code
-
Packed code (also called Combination code) is a systematic
method for coding Hangul syllabic characters in a two-byte code. Each 16-bit
(two-byte) character consists of a most significant bit set to 1, followed by three
5-bit fields. These fields contain the codes for the beginning consonant (x), the
middle vowel (y), and an optional ending consonant (z), laid out as 1xxxxxyyyyyzzzzz.
Hanja characters cannot be represented in Packed code, because Hanja are not built
from phonetic components, and many different Hanja share the same pronunciation.
Packed code is defined in KS C 5601-1987 and earlier as a supplementary code set.
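The 1xxxxxyyyyyzzzzz layout can be sketched with shifts and masks. The field values themselves come from tables in the standard that this glossary does not reproduce; the example assumes the KS C 5601-1992 Johap values x = 2 (ㄱ), y = 3 (ㅏ), and z = 1 (a fill value meaning no final consonant), and uses Python's `johab` codec only as a cross-check. `pack` and `unpack` are hypothetical names.

```python
def pack(x: int, y: int, z: int) -> int:
    """Pack initial (x), medial (y), and final (z) field codes as 1xxxxxyyyyyzzzzz."""
    for name, v in (("x", x), ("y", y), ("z", z)):
        if not 0 <= v <= 0x1F:
            raise ValueError(f"{name}={v} does not fit in 5 bits")
    return 0x8000 | (x << 10) | (y << 5) | z


def unpack(code: int) -> tuple[int, int, int]:
    """Split a 16-bit Packed code back into its three 5-bit fields."""
    if not code & 0x8000:
        raise ValueError("most significant bit must be 1")
    return (code >> 10) & 0x1F, (code >> 5) & 0x1F, code & 0x1F


code = pack(2, 3, 1)                             # ㄱ + ㅏ + (no final consonant)
print(hex(code))                                 # 0x8861
print(unpack(code))                              # (2, 3, 1)
print(code.to_bytes(2, "big").decode("johab"))   # 가
```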
- POSIX
-
Portable Operating System Interface for Computer Environments. An IEEE
standards group comprising seven committees that create documents for standardizing
and internationalizing UNIX. POSIX document 1003.1 deals with the kernel and
system calls. 1003.2 covers the shell and utilities. The other
five deal with real-time computing, communications and networking, and other
issues.
- UTF-8
-
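Universal Multiple-Octet Coded Character Set (UCS) Transformation
Format. The ko.UTF-8 locale provides the Korean-related characters
in this standard. UTF-8 is a variable-length encoding of Unicode that uses one to
four bytes per character; ASCII characters keep their one-byte ASCII codes.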
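A minimal sketch of the UTF-8 encoding rules, written out by hand so the bit patterns are visible, and compared against Python's built-in `utf-8` codec. For example, the Hangul syllable 한 (U+D55C) takes three bytes in UTF-8, compared with two in EUC. `utf8_encode` is a hypothetical name.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"invalid code point U+{cp:04X}")
    if cp < 0x80:                                     # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                                    # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                                  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),   # 4-byte form
                  0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


print(utf8_encode(ord("한")).hex())  # ed959c
print(len("한글".encode("utf-8")), len("한글".encode("euc_kr")))  # 6 4
```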
- Unicode
-
The international character set and encoding developed by
the Unicode Consortium.
- Wide Character Code (WC)
-
A constant-width four-byte code, called WC in Asian Solaris
documentation, for the internal representation of EUC codes using the
ANSI C data type wchar_t. Although EUC does not fix the number of bytes per
character in the supplementary code sets (code set 0 is always one
byte), WC represents every character in four bytes. Standardizing on four bytes
takes more memory than necessary if the environment is primarily
ASCII, but it speeds up processing of mixed-character strings, because a character's
position can be computed directly: counting from 0, character n begins at byte 4n, so
character 1000 begins at byte 4000 and character 0 at byte 0. This is useful for any
kind of indexing in applications.
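A sketch of the fixed-width indexing described above. In C this conversion is what `mbstowcs` does; here Python decodes EUC-KR and re-encodes it as UTF-32, which also uses exactly four bytes per character. Assumption: the 32-bit values are Unicode code points standing in for wchar_t, and the values Solaris actually assigns in the ko locale may differ; only the offset arithmetic is the point. `to_wide` and `wide_char_at` are hypothetical names.

```python
def to_wide(mb: bytes, encoding: str = "euc_kr") -> bytes:
    """Convert multibyte text to a fixed-width form with four bytes per character.

    Assumption: UTF-32 code units stand in for Solaris wchar_t values.
    """
    return mb.decode(encoding).encode("utf-32-le")  # the -le codec writes no BOM


def wide_char_at(wide: bytes, n: int) -> str:
    """Return character n (counting from 0), which always starts at byte 4 * n."""
    start = 4 * n
    return wide[start:start + 4].decode("utf-32-le")


mb = "A한B".encode("euc_kr")   # variable width: 1 + 2 + 1 = 4 bytes
wide = to_wide(mb)              # fixed width: 3 characters x 4 bytes = 12 bytes
print(len(mb), len(wide))       # 4 12
print(wide_char_at(wide, 2))    # B, found at byte 8 without scanning
```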
- X/Open
-
X/Open started as a consortium of international UNIX vendors
from Europe, the USA, and Asia. It is now one of the major standards organizations,
like POSIX and ANSI, and is the source of the X/Open Portability Guide (XPG).