Changing a database from one collation to another

Creating a database with a custom collation

ASA Database Administration Guide
International Languages and Character Sets
International language and character set tasks

Changing a database from one collation to another

Changing your database from one collation to another may be a good idea for any number of reasons. It can be especially useful, for example, when:

you want to avoid character set translation across your setup.
the characters in your database don't match the collation in your database. Using the same character set defined in your database is especially important for sorting purposes.
you want to use a different character set. You may, for example, want to move from an OEM character set to a Windows character set.

Simply modifying the collation in an existing database is not permitted, since it would invalidate all the indexes for that database. In order to change the collation for a database, you must rebuild the database. Rebuilding a database creates a new database with new settings (including collation settings), using the old database's data.

When you change the collation for a database, there are two main scenarios to consider. The difference between the two lies in whether the character set of the data needs to be changed.

Example 1

In the first example, only the collation needs to be changed. The data should not change character sets. To resolve the collation issue, you need to rebuild the database with new collation settings using the old data.

In an English environment, consider an old database using the 850LATIN1 collation. If the database contains data inserted from a Windows 'windowed' application, it is likely that the data is actually from the CP1252 character set, which does not match CP850 used by the 850LATIN1 collation. This situation will often be discovered when an ORDER BY clause seems to sort accented characters incorrectly. To correct this problem, you would create a new database using the 1252LATIN1 collation, and move the data from the old database to the new database without translation, since the data is already in the character set (CP1252) that matches the new database's collation.

The simplest way to ensure that translation does not occur is to start the server with the -ct- option. Then, rebuild the database normally.

For more information about rebuilding a database, see Rebuilding databases.

For more information about specifying collations when creating databases, see Creating a database with a named collation.

Example 2

In the second situation, both the collation and the character set need to be changed. To resolve the collation and character set issues, you need to rebuild the database with the new collation settings, and change the character set of the data.

Suppose that the 850LATIN1 database had been used properly such that it contains characters from the CP850 character set. However, you want to update both the collation and the character set, perhaps to avoid character set translation. You would create a new database using 1252LATIN1, and move the data from the old database to the new database with translation, thus converting the CP850 characters to CP1252.

The translation of the database data from one character set to another occurs using the client-server translation feature of the server, which translates the character data during the communication between the client application and the server. The database's collation determines the character set for the database server side of the communication. The locale of the operating system determines the client's default character set, however, the client's character set can be overridden by the CharSet (CS) connection parameter.

For more information about the CharSet (CS) connection parameter, see CharSet connection parameter [CS].

Since character set translation takes place during the communication between the client application and the server, an external unload or reload is necessary. An internal unload and reload does not do character set translation. Similarly, if character set translation occurs in both the unload and the reload steps, you perform the translation and then immediately undo the translation and still end up where you began. Character set translation can occur in either the unload or the reload steps, but not in both.

To convert a database from one collation to another, and translate the data's character set (using translation on reload)

Unload the data from the source database.

You can use the Unload utility to produce a reload.SQL file and a set of data files in the character set of the source database. Since we do not want any translation during this phase, ensure that character set translation is not enabled on the server running the source database. For servers before version 8, ensure that -ct is not specified. If you are using a server that is version 8 or higher, ensure that -ct- (no character set translation) is specified when the server is started.

If the unload/reload is occurring on a single machine, use the -ix option to do an internal unload and an external reload. If the unload/reload occurs across machines, use the Unload utility with the -xx option to force an external unload and an external reload.

Remember that an "external" unload or reload means that an application (dbunload and DBISQL) opens a cursor on the database and either reads or writes the data to disk. Character set translation occurs. An "internal" unload or reload means that an UNLOAD TABLE or LOAD TABLE is used so that the server reads or writes the data itself. Character set translation does not occur.

If you want to unload data from specific tables, use the -t option, or the Interactive SQL OUTPUT statement.

For more information on the Unload utility, see The Unload utility.
Create a target database with the appropriate collation using the Initialization utility including the -z option to specify the collation sequence for the target database.

The database should be created and reloaded using the version of the server and tools corresponding to the server that they will use to run it.

For more information on specifying collations when creating databases, see Creating a database with a named collation.
Start the target database using the server running with character set translation. You can also specify the -z option.

The -z option, while not required, allows for verification of the character sets that will be used, and that translation will occur. The server window will show character set translation settings for each connection. However, using the -z server option can cause slower performance during the reload.

For more information, see Starting a database server using character set translation.
If you generated a reload.SQL file in step 1, you can run the file in a command shell using a command such as the following.
```
dbisql -c "uid=dba;pwd=sql;charset=utf8" -codepage 950 reload.sql
```
You must supply the appropriate connection parameters for your database, as well as the code page (950 in the example above) that was used to generate the reload.SQL file.

If the data files and reload.sql contain UTF-8 data, use dbisqlc instead of dbisql.

For more information about the -codepage option, see The Interactive SQL utility.

If you exported only table data in step 1, you can import the data using the INPUT statement in Interactive SQL.

For more information about the INPUT statement, see INPUT statement [Interactive SQL].