
ASA Database Administration Guide
  International Languages and Character Sets
    Collation internals

The Encodings section


The encodings section is optional, and follows the collation sequence. It is not useful for single-byte character sets.

For multibyte character sets, the encodings section lists which byte values are lead-bytes and which byte values are valid follow-bytes.

For example, the Shift-JIS encodings section is as follows:

Encodings:
[\x00-\x80,\xa0-\xdf,\xf0-\xff]
[\x81-\x9f,\xe0-\xef][\x00-\xff]

The first line following the section title lists valid single-byte characters. The square brackets enclose a comma-separated list of ranges. Each range is listed as a hyphen-separated pair of values. In the Shift-JIS collation, values \x00 to \x80 are valid single-byte characters, but \x81 is not a valid single-byte character.
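The range notation can be expanded mechanically. The following Python sketch is illustrative only and is not part of the product: `parse_byte_set` is a hypothetical helper name, and the sketch assumes every item in a group is either a `\xNN-\xNN` range or a lone `\xNN` value (the lone-value form is an assumption; the example above shows only ranges).

```python
import re

# One item inside the brackets: \xNN, optionally followed by -\xNN.
# Accepting a lone \xNN value is an assumption made for this sketch.
_ITEM = re.compile(r"\\x([0-9a-fA-F]{2})(?:-\\x([0-9a-fA-F]{2}))?")


def parse_byte_set(group):
    r"""Expand one bracketed group, e.g. [\x00-\x80,\xa0-\xdf], into a set of ints."""
    text = group.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected a bracketed group, got %r" % group)
    values = set()
    for item in text[1:-1].split(","):
        match = _ITEM.fullmatch(item.strip())
        if match is None:
            raise ValueError("unrecognized range item %r" % item)
        low = int(match.group(1), 16)
        high = int(match.group(2), 16) if match.group(2) else low
        if high < low:
            raise ValueError("range %r runs backwards" % item)
        values.update(range(low, high + 1))  # ranges are inclusive at both ends
    return values


# The single-byte line from the Shift-JIS example.
single_bytes = parse_byte_set(r"[\x00-\x80,\xa0-\xdf,\xf0-\xff]")
print(0x80 in single_bytes, 0x81 in single_bytes)
```

With the Shift-JIS line, `0x80` is in the resulting set and `0x81` is not, matching the description above.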

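The second line following the section title lists valid multibyte characters. It contains two bracketed sets: the first lists the lead-bytes and the second lists the follow-bytes. Any combination of one byte from the first set followed by one byte from the second set is a valid character. Therefore \x81\x00 is a valid double-byte character, but \xd0\x00 is not, because \xd0 is not a lead-byte.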
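As an illustration of how these sets determine character boundaries, the following Python sketch splits a byte string into characters using the Shift-JIS values from the example. It is a minimal sketch and not the server's actual scanning logic; the names `SINGLE`, `LEAD`, `FOLLOW`, and `split_characters` are hypothetical.

```python
# Byte sets transcribed by hand from the Shift-JIS encodings example.
SINGLE = set(range(0x00, 0x81)) | set(range(0xA0, 0xE0)) | set(range(0xF0, 0x100))
LEAD = set(range(0x81, 0xA0)) | set(range(0xE0, 0xF0))
FOLLOW = set(range(0x00, 0x100))


def split_characters(data):
    """Split a bytes object into 1-byte and 2-byte characters."""
    chars = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in LEAD:
            # A lead-byte must be followed by exactly one valid follow-byte.
            if i + 1 >= len(data) or data[i + 1] not in FOLLOW:
                raise ValueError(
                    "lead-byte 0x%02x at offset %d has no valid follow-byte" % (byte, i)
                )
            chars.append(data[i:i + 2])
            i += 2
        elif byte in SINGLE:
            chars.append(data[i:i + 1])
            i += 1
        else:
            # Unreachable for Shift-JIS, where SINGLE and LEAD together cover
            # every byte value; kept for tables that leave gaps.
            raise ValueError("byte 0x%02x at offset %d is not valid" % (byte, i))
    return chars


print(split_characters(b"\x81\x00"))  # one double-byte character
print(split_characters(b"\xd0\x00"))  # two single-byte characters
```

Under these assumptions, \x81\x00 is returned as one double-byte character, while \xd0\x00 is returned as two single-byte characters because \xd0 appears only in the single-byte set.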

