ASA Database Administration Guide
International Languages and Character Sets
Understanding locales
Both application and server locale definitions have a character set. The application uses its character set when requesting character strings from the server. If character set translation is enabled (the default), the database server compares its character set with that of the application to determine whether character set translation is needed.
For more information about how to find locale settings, see Determining locale information.
The client library determines the character set as follows:
If the connection string specifies a character set, it is used.
For more information, see CharSet connection parameter [CS].
Open Client applications check the locales.dat file in the Sybase locales directory.
Character set information from the operating system is used to determine the locale:
On Windows operating systems, use the GetACP system call. This returns the ANSI character set, not the OEM character set.
On UNIX, default to ISO8859-1.
On other platforms, use code page 1252.
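The lookup order above can be sketched in Python as follows. This is an illustration only, not the client library's code. The function names, the `platform` strings, and the `open_client_charset` argument (standing in for whatever an Open Client application obtains from locales.dat) are hypothetical. The Windows ANSI code page is passed in as a number rather than read through GetACP so that the sketch runs on any platform. Writing ISO8859-1 as `iso_1` and code pages as `cpNNNN` follows the label table later in this section.

```python
from typing import Mapping, Optional


def os_charset(platform: str, windows_acp: Optional[int] = None) -> str:
    """Operating-system fallback (hypothetical helper)."""
    if platform == "windows":
        if windows_acp is None:
            raise ValueError("pass the ANSI code page that GetACP would return")
        return f"cp{windows_acp}"
    if platform == "unix":
        return "iso_1"  # ISO8859-1
    return "cp1252"  # all other platforms


def client_charset(
    conn_params: Mapping[str, str],
    platform: str,
    open_client_charset: Optional[str] = None,
    windows_acp: Optional[int] = None,
) -> str:
    """Return the first character set found, in the order listed above."""
    # 1. An explicit CharSet (short form CS) connection parameter wins.
    for key in ("CharSet", "CS"):
        if key in conn_params:
            return conn_params[key]
    # 2. Open Client applications: value taken from locales.dat (assumed input).
    if open_client_charset is not None:
        return open_client_charset
    # 3. Otherwise fall back to the operating system.
    return os_charset(platform, windows_acp)
```

For example, `client_charset({"CS": "utf8"}, "windows", windows_acp=1252)` returns the connection-string value and never consults the operating system.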
The database server determines the character set for a connection as follows:
The character set specified by the client is used if it is supported.
For more information, see CharSet connection parameter [CS].
The database's character set is used if the client specifies a character set that is not supported.
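The server-side rule, combined with the translation check described at the start of this section, might look like the sketch below. It assumes character set translation is enabled and that labels can be compared as exact strings. The function name and its parameters are hypothetical and are not part of any server API.

```python
from typing import Collection, Tuple


def connection_charset(
    client_charset: str,
    supported: Collection[str],
    database_charset: str,
) -> Tuple[str, bool]:
    """Return (charset used for the connection, whether translation is needed)."""
    if client_charset in supported:
        chosen = client_charset
    else:
        # Unsupported client character set: fall back to the database's own.
        chosen = database_charset
    # Translation is only needed when the two sides differ.
    return chosen, chosen != database_charset
```

With a utf8 database, a cp1252 client gets its own character set and translation is required, while a client that asks for an unsupported character set is served in utf8 with no translation.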
When a new database is created, the database server determines the character set for the new database as follows:
If a collation is specified in the CREATE DATABASE statement or with the dbinit utility, it determines the character set.
The ASCHARSET environment variable is used if it exists.
Character set information from the operating system is used to determine the locale.
On Windows operating systems, use the GetACP system call. This returns the ANSI character set, not the OEM character set.
On UNIX, default to ISO8859-1.
On other platforms, use code page 1252.
The locale character set and language are used to determine which collation to use when creating a database if none is explicitly specified.
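As a rough sketch of this precedence, the function below returns either the explicit collation or the character set that would feed collation selection. It does not attempt the final step of choosing a collation from the locale's character set and language, because the source does not give that mapping. The function name, the `platform` strings, and the `("collation" | "charset", value)` return shape are hypothetical.

```python
import os
from typing import Mapping, Optional, Tuple


def new_database_setting(
    collation: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    platform: str = "unix",
    windows_acp: Optional[int] = None,
) -> Tuple[str, str]:
    """Return ("collation", name) or ("charset", label) by order of precedence."""
    env = os.environ if env is None else env
    # 1. An explicit collation (CREATE DATABASE or dbinit) takes priority.
    if collation is not None:
        return ("collation", collation)
    # 2. The ASCHARSET environment variable, if it exists.
    if "ASCHARSET" in env:
        return ("charset", env["ASCHARSET"])
    # 3. Operating-system locale information.
    if platform == "windows":
        if windows_acp is None:
            raise ValueError("pass the ANSI code page that GetACP would return")
        return ("charset", f"cp{windows_acp}")
    if platform == "unix":
        return ("charset", "iso_1")  # ISO8859-1
    return ("charset", "cp1252")
```

Passing `env` explicitly, as in `new_database_setting(env={"ASCHARSET": "cp850"})`, keeps the example independent of the real process environment.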
The following table displays the valid character set label values, together with the equivalent IANA labels and a description:
Character set label | IANA label | Description |
---|---|---|
big5 | <N/A> | Traditional Chinese (cf. CP950) |
cp437 | <N/A> | IBM CP437 - U.S. code set |
cp850 | <N/A> | IBM CP850 - European code set |
cp852 | <N/A> | PC Eastern Europe |
cp855 | <N/A> | IBM PC Cyrillic |
cp856 | <N/A> | Alternate Hebrew |
cp857 | <N/A> | IBM PC Turkish |
cp860 | <N/A> | PC Portuguese |
cp861 | <N/A> | PC Icelandic |
cp862 | <N/A> | PC Hebrew |
cp863 | <N/A> | IBM PC Canadian French code page |
cp864 | <N/A> | PC Arabic |
cp865 | <N/A> | PC Nordic |
cp866 | <N/A> | PC Russian |
cp869 | <N/A> | IBM PC Greek |
cp874 | <N/A> | Microsoft Thai SB code page |
cp932 | windows-31j | Microsoft CP932 = Win31J-DBCS |
cp936 | <N/A> | Simplified Chinese |
cp949 | <N/A> | Korean |
cp950 | <N/A> | PC (MS) Traditional Chinese |
cp1250 | <N/A> | MS Windows Eastern European |
cp1251 | <N/A> | MS Windows Cyrillic |
cp1252 | <N/A> | MS Windows US (ANSI) |
cp1253 | <N/A> | MS Windows Greek |
cp1254 | <N/A> | MS Windows Turkish |
cp1255 | <N/A> | MS Windows Hebrew |
cp1256 | <N/A> | MS Windows Arabic |
cp1257 | <N/A> | MS Windows Baltic |
cp1258 | <N/A> | MS Windows Vietnamese |
deckanji | <N/A> | DEC UNIX JIS encoding |
euccns | <N/A> | EUC CNS encoding: Traditional Chinese with extensions |
eucgb | <N/A> | EUC GB encoding = Simplified Chinese |
eucjis | euc-jp | Sun EUC JIS encoding |
eucksc | <N/A> | EUC KSC Korean encoding (cf. CP949) |
greek8 | <N/A> | HP Greek-8 |
iso_1 | iso_8859-1:1987 | ISO 8859-1 Latin-1 |
iso15 | <N/A> | ISO 8859-15 Latin1 with Euro, etc. |
iso88592 | iso_8859-2:1987 | ISO 8859-2 Latin-2 Eastern Europe |
iso88595 | iso_8859-5:1988 | ISO 8859-5 Latin/Cyrillic |
iso88596 | iso_8859-6:1987 | ISO 8859-6 Latin/Arabic |
iso88597 | iso_8859-7:1987 | ISO 8859-7 Latin/Greek |
iso88598 | iso_8859-8:1988 | ISO 8859-8 Latin/Hebrew |
iso88599 | iso_8859-9:1989 | ISO 8859-9 Latin-5 Turkish |
koi8 | <N/A> | KOI-8 Cyrillic |
mac | macintosh | Standard Mac coding |
mac_cyr | <N/A> | Macintosh Cyrillic |
mac_ee | <N/A> | Macintosh Eastern European |
macgrk2 | <N/A> | Macintosh Greek |
macturk | <N/A> | Macintosh Turkish |
roman8 | hp-roman8 | HP Roman-8 |
sjis | shift_jis | Shift JIS (no extensions) |
tis620 | <N/A> | TIS-620 Thai standard |
turkish8 | <N/A> | HP Turkish-8 |
utf8 | utf-8 | UTF-8 treated as a character set |
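
For scripts that need to translate between the two naming schemes, the rows of the table that carry an IANA label can be held in a small lookup. The sketch below copies only those rows. The dictionary and function names are hypothetical, and labels are matched exactly as written in the table, since the table does not say whether they are case-sensitive.

```python
from typing import Optional

# Character set label -> IANA label, for the rows above that list one.
IANA_LABELS = {
    "cp932": "windows-31j",
    "eucjis": "euc-jp",
    "iso_1": "iso_8859-1:1987",
    "iso88592": "iso_8859-2:1987",
    "iso88595": "iso_8859-5:1988",
    "iso88596": "iso_8859-6:1987",
    "iso88597": "iso_8859-7:1987",
    "iso88598": "iso_8859-8:1988",
    "iso88599": "iso_8859-9:1989",
    "mac": "macintosh",
    "roman8": "hp-roman8",
    "sjis": "shift_jis",
    "utf8": "utf-8",
}


def iana_label(charset_label: str) -> Optional[str]:
    """Return the IANA label for a character set label, or None for <N/A> rows."""
    return IANA_LABELS.get(charset_label)
```

A label such as `cp1252`, which the table marks `<N/A>`, yields `None`, so callers can tell "no IANA equivalent listed" apart from a successful lookup.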