Chapter 2 Programming Information

Handling internationalization and localization

This section discusses internationalization and localization issues relevant to jConnect.

Using jConnect to pass Unicode data

With the release of Adaptive Server Enterprise 12.5, database clients can take advantage of two new server datatypes, unichar and univarchar, which allow for the efficient storage and retrieval of Unicode data.

Quoting from the Unicode Standard, version 2.0: "The Unicode Standard is a fixed-width, uniform encoding scheme for encoding characters and text. The repertoire of this international character code for information processing includes characters for the major scripts of the world, as well as technical symbols in common use. The Unicode character encoding treats alphabetic characters, ideographic characters, and symbols identically, which means they can be used in any mixture and with equal facility. The Unicode Standard is modeled on the ASCII character set, but uses a 16-bit encoding to support full multilingual text."

This means that users can designate database table columns to store Unicode data, and clients such as jConnect can store Unicode data in those columns directly, without the overhead of conversion.

Two conditions must be met for jConnect to take advantage of this feature:

The database server must have the UTF-8 character set loaded as its default character set.
When you connect to the server using jConnect, you must set the JCONNECT_VERSION property to VERSION_6 or VERSION_LATEST.
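A minimal sketch of the client side of these conditions, assuming the JCONNECT_VERSION property takes the version number as the string "6" (the value this section uses for VERSION_6). The host name, port, and the class and method names are hypothetical; the server-side UTF-8 setting cannot be configured from this code.

```java
import java.util.Properties;

// Sketch only: host, port, and class/method names are hypothetical.
class UnicodeConnectionProps {
    // Hypothetical server address in jConnect's jdbc:sybase:Tds:host:port form.
    static final String URL = "jdbc:sybase:Tds:myhost:5000";

    // Builds connection properties that request VERSION_6 behavior.
    // Assumption: JCONNECT_VERSION accepts the version number as a string.
    static Properties build(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("JCONNECT_VERSION", "6");
        return props;
    }
}
```

With jConnect on the classpath, these properties would be passed to `DriverManager.getConnection(UnicodeConnectionProps.URL, props)`.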

When these two conditions are met, jConnect can properly store and retrieve Unicode data from the database. When this feature is enabled, your jConnect application continues to behave as expected. That is, your calls to JDBC methods such as PreparedStatement.setString(int parameterIndex, String x) do not need to be modified just because you have set JCONNECT_VERSION to 6 and turned on the server capability.
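As a minimal sketch of what "no modification" means in practice, the insert below uses the same setString call it would use for an ordinary varchar column. The table customers(id int, name univarchar(100)) and the class and method names are hypothetical; the only assumption about jConnect is the one stated above, that no Unicode-specific call is needed once both conditions are met.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical table: customers(id int, name univarchar(100)).
class UnicodeInsert {
    static final String INSERT_SQL = "insert into customers (id, name) values (?, ?)";

    // Inserts one row; returns the update count reported by the driver.
    static int insertCustomer(Connection conn, int id, String name) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setInt(1, id);      // parameter index 1
            ps.setString(2, name); // parameter index 2: plain setString, even for Unicode text
            return ps.executeUpdate();
        }
    }
}
```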

The difference lies in the "under the covers" work jConnect does when communicating character data to the server. When you store data in a database column designed to hold Unicode data, or select Unicode data from such a column, jConnect performs all the necessary conversions.

A side effect is that when the two conditions above are met and the unichar and univarchar datatypes setting is turned on, jConnect ignores the CHARSET and CHARSET_CONVERTER_CLASS connection property settings. This is because, with unichar enabled, all character data is passed to the server as Unicode data; the CHARSET setting is therefore irrelevant, and jConnect handles all conversion internally.

For more information on support for unichar and univarchar datatypes, see the Adaptive Server Enterprise version 12.5 manuals.

jConnect character-set converters

jConnect uses special classes for all character-set conversions. By selecting a character-set converter class, you specify how jConnect should handle single-byte and multibyte character-set conversions, and the performance impact the conversions will have on your applications.

There are two character-set conversion classes. The conversion class that jConnect uses is based on the version setting (for example, VERSION_4), and the CHARSET and CHARSET_CONVERTER_CLASS connection properties.

The TruncationConverter class works only with single-byte character sets that use ASCII characters such as iso_1 and cp850. It does not work with multibyte character sets or single-byte character sets that use non-ASCII characters.

Using the TruncationConverter class, jConnect 5.x handles character sets in the same manner as jConnect version 2.2. The TruncationConverter class is the default converter when the version setting is VERSION_2.
The PureConverter class is a pure Java, multibyte character-set converter. jConnect uses this class if the version setting is VERSION_4 or later. jConnect also uses this converter with VERSION_2 if it detects a character set specified in the CHARSET connection property that is not compatible with the TruncationConverter class.

Although it enables multibyte character-set conversions, the PureConverter class may negatively impact jConnect driver performance. If driver performance is a concern, see "Improving character-set conversion performance".

Selecting a character-set converter

jConnect uses the version setting from SybDriver.setVersion() to determine the default character-set converter class to use. For VERSION_2, the default is TruncationConverter. For VERSION_4 and later, the default is PureConverter.

You can also set the CHARSET_CONVERTER_CLASS connection property to specify which character-set converter you want jConnect to use. This is useful if you want to use a character-set converter other than the default for your jConnect version.

For example, if you set jConnect to VERSION_4 or later, but want to use the TruncationConverter class rather than the multibyte PureConverter class, you can set CHARSET_CONVERTER_CLASS:

For jConnect 4.x:

...
props.put("CHARSET_CONVERTER_CLASS",
    "com.sybase.utils.TruncationConverter");

For jConnect 5.x:

...
props.put("CHARSET_CONVERTER_CLASS",
    "com.sybase.jdbc2.utils.TruncationConverter");
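The same override can be wrapped in a small helper that picks the package name for the jConnect release in use. This is a sketch: the class and method names are hypothetical, and only the two package names shown above are known here, so any other release is rejected rather than guessed.

```java
import java.util.Properties;

// Hypothetical helper; package names are the two given in this section.
class ConverterProps {
    static String truncationConverterFor(int jconnectMajorVersion) {
        switch (jconnectMajorVersion) {
            case 4:
                return "com.sybase.utils.TruncationConverter";
            case 5:
                return "com.sybase.jdbc2.utils.TruncationConverter";
            default:
                // Other releases are not covered here; do not guess a package.
                throw new IllegalArgumentException(
                        "unknown jConnect version: " + jconnectMajorVersion);
        }
    }

    // Returns a copy of base with CHARSET_CONVERTER_CLASS set; base is unchanged.
    static Properties withTruncationConverter(Properties base, int jconnectMajorVersion) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty("CHARSET_CONVERTER_CLASS",
                truncationConverterFor(jconnectMajorVersion));
        return props;
    }
}
```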

Setting the CHARSET connection property

You can specify the character set to use in your application by setting the CHARSET connection property. If you do not set the CHARSET property:

For VERSION_2, jConnect uses iso_1 as the default character set.
For VERSION_3, VERSION_4, VERSION_5, and VERSION_6, jConnect uses the database's default character set, and adjusts automatically to perform any necessary conversions on the client side.
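A minimal sketch of setting CHARSET explicitly. The helper name is hypothetical; leaving CHARSET unset keeps the version-dependent default described above.

```java
import java.util.Properties;

// Hypothetical helper for the CHARSET connection property.
class CharsetProps {
    // Returns a copy of base with CHARSET set to a Sybase character-set name
    // such as "iso_1" or "utf8". A null or empty name leaves CHARSET unset,
    // so jConnect falls back to its version-dependent default.
    static Properties withCharset(Properties base, String sybaseCharset) {
        Properties props = new Properties();
        props.putAll(base);
        if (sybaseCharset != null && !sybaseCharset.isEmpty()) {
            props.setProperty("CHARSET", sybaseCharset);
        }
        return props;
    }
}
```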

You can also use the -J charset command line option for the IsqlApp application to specify a character set.

To determine which character sets are installed on your Adaptive Server, issue the following SQL query on your server:

select name from syscharsets
go
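From a Java application, the same query can be issued over JDBC. This is a sketch with hypothetical class and method names; note that go is an isql batch terminator and is not part of the SQL text sent through JDBC.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists installed character sets.
class InstalledCharsets {
    static final String QUERY = "select name from syscharsets";

    // Runs QUERY and collects the value of the name column from every row.
    static List<String> list(Connection conn) throws SQLException {
        List<String> names = new ArrayList<>();
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(QUERY)) {
            while (rs.next()) {
                names.add(rs.getString("name"));
            }
        }
        return names;
    }
}
```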

For the PureConverter class, if the designated CHARSET does not work with the client's Java Virtual Machine (VM), the connection fails with a SQLException, indicating that you must set CHARSET to a character set that is supported by both Adaptive Server and the client.
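To detect an unusable CHARSET before connecting, an application could check whether the client JVM has a converter for the corresponding JDK name in Table 2-4. This is an optional pre-check of our own, not a jConnect API, and it assumes the java.nio charset lookup accepts the JDK converter names in that table; it does for common names such as UTF8 and 8859_1, but this has not been verified for every entry.

```java
import java.nio.charset.Charset;

// Hypothetical client-side pre-check; jConnect performs its own check at connect time.
class ClientCharsetCheck {
    // Returns true if this JVM has a converter for the given JDK encoding
    // name (the right-hand column of Table 2-4, e.g. "Cp850" or "UTF8").
    static boolean jvmSupports(String jdkEncoding) {
        try {
            return Charset.isSupported(jdkEncoding);
        } catch (IllegalArgumentException e) {
            // Thrown for null, and as IllegalCharsetNameException (a subclass)
            // for syntactically invalid names.
            return false;
        }
    }
}
```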

When the TruncationConverter class is used, character truncation is applied regardless of whether the designated CHARSET is 7-bit ASCII or not.
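To illustrate why truncation is only safe for 7-bit ASCII, the following model keeps the low-order byte of each Java char. This is an assumption about what "truncation" means, written for illustration; it is not the TruncationConverter source, and the class name is hypothetical.

```java
// Illustrative model only, not jConnect code.
class TruncationModel {
    // Maps each UTF-16 char to one byte by discarding its high-order 8 bits.
    static byte[] truncate(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i); // the cast keeps only the low byte
        }
        return out;
    }
}
```

Under this model, the euro sign (U+20AC) becomes the single byte 0xAC, which in iso_1 is the not sign (¬), so the character is silently corrupted rather than rejected.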

Improving character-set conversion performance

If you use multibyte character sets and need to improve driver performance, you can use the SunIoConverter class provided with the jConnect samples. See "Character-set conversion" for details.

Supported character sets

Table 2-4 lists the Sybase character sets that are supported by jConnect. The table also lists the corresponding JDK byte converter for each supported character set.

Although jConnect supports UCS-2, no Sybase databases or Open Server applications currently support UCS-2.

Adaptive Server Enterprise version 12.5 supports the UTF-16 encoding of Unicode.

You can still send Unicode data to Sybase Adaptive Server version 12.5 and later by setting the JCONNECT_VERSION property to VERSION_6 and configuring UTF-8 as the server's default character set.

The Sybase sjis character set does not include the IBM or Microsoft extensions to JIS, whereas the JDK SJIS byte converter includes these extensions. As a result, conversions from Java strings to a Sybase database using sjis may result in character values that are not supported by the Sybase database. However, conversions from sjis on a Sybase database to Java strings should not have this problem.

Table 2-4 lists the character sets currently supported by jConnect.

Table 2-4: Supported Sybase character sets
SybCharset name	JDK byte converter
ascii_7	8859_1
big5	Big5
cp037	Cp037
cp437	Cp437
cp500	Cp500
cp850	Cp850
cp852	Cp852
cp855	Cp855
cp857	Cp857
cp860	Cp860
cp863	Cp863
cp864	Cp864
cp866	Cp866
cp869	Cp869
cp874	Cp874
cp932	Cp932
cp936	Cp936
cp950	Cp950
cp1250	Cp1250
cp1251	Cp1251
cp1252	Cp1252
cp1253	Cp1253
cp1254	Cp1254
cp1255	Cp1255
cp1256	Cp1256
cp1257	Cp1257
cp1258	Cp1258
deckanji	EUCJIS
eucgb	GB2312
eucjis	EUCJIS
eucksc	Cp949
ibm420	Cp420
ibm918	Cp918
iso_1	8859_1
iso88592	8859_2
iso88595	8859_5
iso88596	8859_6
iso88597	8859_7
iso88598	8859_8
iso88599	8859_9
iso885915	8859_15
koi8	KOI8_R
mac	MacRoman
mac_cyr	MacCyrillic
mac_ee	MacCentralEurope
macgreek	MacGreek
macturk	MacTurkish
sjis (see note)	SJIS
tis620	MS874
utf8	UTF8
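Applications that let users choose a Sybase character set sometimes need the matching JDK name, for example to encode test data the same way the driver would. A minimal sketch with a few rows copied from Table 2-4; the class is hypothetical, and resolving names through java.nio.charset.Charset assumes the legacy JDK names are accepted as aliases.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup over a subset of Table 2-4.
class SybaseCharsetMap {
    // Sybase character-set name -> JDK byte converter name.
    private static final Map<String, String> JDK_NAMES = new HashMap<>();
    static {
        JDK_NAMES.put("iso_1", "8859_1");
        JDK_NAMES.put("cp850", "Cp850");
        JDK_NAMES.put("cp1252", "Cp1252");
        JDK_NAMES.put("iso885915", "8859_15");
        JDK_NAMES.put("utf8", "UTF8");
    }

    // Returns the JDK converter name, or null if the name is not in this subset.
    static String jdkNameFor(String sybaseCharset) {
        return JDK_NAMES.get(sybaseCharset);
    }

    // Resolves the JDK name to a java.nio Charset, or null if unmapped.
    static Charset charsetFor(String sybaseCharset) {
        String jdk = jdkNameFor(sybaseCharset);
        return jdk == null ? null : Charset.forName(jdk);
    }
}
```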

European currency symbol support

jConnect version 4.1 and later support the new European currency symbol, the "euro" (€), and its conversion to and from UCS-2 Unicode.

The euro has been added to the following Sybase character sets: cp1250, cp1251, cp1252, cp1253, cp1254, cp1255, cp1256, cp1257, cp1258, cp874, iso885915, and utf8.

Character sets cp1257, cp1258, and iso885915 are new.

To use the euro symbol:

Use the PureConverter class, a pure Java, multibyte character-set converter. See "jConnect character-set converters" for more information.
Verify that the new character sets are installed on the server.

The euro symbol is currently supported only on Adaptive Server Enterprise version 11.9.2 and later; Adaptive Server Anywhere does not support the euro symbol.
Select the appropriate character set on the client. See "Setting the CHARSET connection property" for more information.
Upgrade to JDK 1.1.7 or the Java™ 2 Platform.
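A quick client-side way to see how the euro is represented in some of the character sets listed above, using the JDK converter names from Table 2-4. The class is hypothetical; the byte values are the standard encodings of U+20AC (0x80 in Cp1252, 0xA4 in ISO 8859-15, and E2 82 AC in UTF-8).

```java
import java.nio.charset.Charset;

// Hypothetical helper for inspecting euro encodings on the client.
class EuroEncoding {
    static final String EURO = "\u20ac"; // U+20AC EURO SIGN

    // Encodes the euro sign with the named JDK converter and returns
    // the bytes as uppercase hex, e.g. "80" for Cp1252.
    static String euroHex(String jdkEncoding) {
        StringBuilder sb = new StringBuilder();
        for (byte b : EURO.getBytes(Charset.forName(jdkEncoding))) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```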

Unsupported character sets

The following Sybase character sets are not supported in jConnect 5.x because there are no JDK byte converters analogous to them:

cp1047
euccns
greek8
roman8
turkish8

You can use these character sets with the TruncationConverter class as long as the application uses only their 7-bit ASCII subsets.
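A minimal guard an application might apply before sending text with one of these character sets, assuming, as stated above, that only 7-bit ASCII characters (U+0000 to U+007F) are safe. The class and method names are hypothetical.

```java
// Hypothetical guard for the 7-bit ASCII restriction.
class AsciiSubsetCheck {
    // True if every char is in the 7-bit ASCII range 0x00-0x7F.
    static boolean isSevenBitAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }
}
```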