Multibyte character sets

Some languages, such as Japanese and Chinese, have many more than 256 characters. Because a single byte can take only 256 distinct values, these characters cannot all be represented using a single byte, but they can be represented in multibyte character sets. In addition, some character sets use the much larger number of characters available in a multibyte representation to combine characters from many languages in a single, more comprehensive character set.

Multibyte character sets are of two types. In a variable-width set, characters differ in length: some are single-byte characters, others are double-byte, and so on. In a fixed-width set, every character has the same number of bytes. Adaptive Server Anywhere supports only variable-width character sets.
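
The difference between the two types can be seen by counting the bytes each character occupies. The following is a minimal sketch using Python's built-in codecs, not anything specific to Adaptive Server Anywhere. The helper name `encoded_widths` is hypothetical, and UTF-32 is used here only as a convenient example of a fixed-width encoding.

```python
def encoded_widths(text, encoding):
    """Return the number of bytes each character of text occupies in encoding."""
    return [len(ch.encode(encoding)) for ch in text]


# Latin capital letter A followed by hiragana letter A (U+3042).
sample = "A\u3042"

# Shift-JIS is variable width: the two characters differ in length.
print(encoded_widths(sample, "shift_jis"))   # [1, 2]

# UTF-32 (big-endian, no byte order mark) is fixed width: four bytes each.
print(encoded_widths(sample, "utf-32-be"))   # [4, 4]
```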

For more information on multibyte character sets, see Using multibyte collations.

Example

As an example, characters in the Shift-JIS character set are either one or two bytes in length. If the value of the first byte is in the hexadecimal range \x81 to \x9F or \xE0 to \xEF (decimal values 129-159 or 224-239), the character is a two-byte character and the subsequent byte (called a follow byte) completes the character. More generally, a follow byte is any byte of a multibyte character other than the first byte.

If the first byte is outside this range, the character is a single-byte character and the next byte is the first byte of the following character.
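
The first-byte rule described above is enough to split a Shift-JIS byte string into characters. The following is a minimal sketch that uses only the ranges quoted in this section. The function names `is_lead_byte` and `split_characters` are hypothetical and are not part of any Adaptive Server Anywhere interface.

```python
def is_lead_byte(b):
    """True if b is the first byte of a two-byte character (ranges from the text)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def split_characters(data):
    """Split a Shift-JIS byte string into one- and two-byte characters."""
    chars = []
    i = 0
    while i < len(data):
        # A lead byte claims the next byte as its follow byte.
        width = 2 if is_lead_byte(data[i]) else 1
        # Slicing tolerates a truncated final character (yields one byte).
        chars.append(data[i:i + width])
        i += width
    return chars


# "A", a two-byte character (\x82\xA0), then "B".
print(split_characters(b"A\x82\xa0B"))   # [b'A', b'\x82\xa0', b'B']
```

Note that a follow byte is never tested against the first-byte ranges: in the two-byte character \x96\x7B, the follow byte \x7B is consumed as part of the character rather than read as a single-byte character.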

The properties of any Shift-JIS character can also be read from its first byte:
Characters with a first byte in the range \x09 to \x0D, or with a first byte of \x20, are space characters.
Characters with a first byte in the ranges \x41 to \x5A, \x61 to \x7A, \x81 to \x9F, or \xE0 to \xEF are considered to be alphabetic (letters).
Characters with a first byte in the range \x30 to \x39 are digits.
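
These first-byte properties can be expressed as a small lookup. The following is a minimal sketch based only on the ranges listed above. The function name `classify_first_byte` and the label "other" for bytes outside those ranges are assumptions made for illustration.

```python
def classify_first_byte(b):
    """Classify a Shift-JIS character by its first byte, using the ranges in the text."""
    if 0x09 <= b <= 0x0D or b == 0x20:
        return "space"
    if (0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A
            or 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF):
        return "alpha"
    if 0x30 <= b <= 0x39:
        return "digit"
    # Assumed label: the text does not name a class for the remaining bytes.
    return "other"


for first_byte in (0x20, ord("Q"), 0x82, ord("7"), ord("!")):
    print(hex(first_byte), classify_first_byte(first_byte))
```

Under these ranges, every two-byte character is classified as alphabetic, because its first byte always falls in \x81 to \x9F or \xE0 to \xEF.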