Upgrading Character Sets from MySQL 4.0

Now, what about upgrading from older versions of MySQL? MySQL 4.1 is almost upward compatible with MySQL 4.0 and earlier for the simple reason that almost all the features are new, so there's nothing in earlier versions to conflict with. However, there are some differences and a few things to be aware of.

Most important: The “MySQL 4.0 character set” has the properties of both “MySQL 4.1 character sets” and “MySQL 4.1 collations.” You will have to unlearn this. Henceforth, we will not bundle character set/collation properties in the same conglomerate object.

There is a special treatment of national character sets in MySQL 4.1. NCHAR is not the same as CHAR, and N'...' literals are not the same as '...' literals.

Finally, there is a different file format for storing information about character sets and collations. Make sure that you have reinstalled the /share/mysql/charsets/ directory containing the new configuration files.

If you want to start mysqld from a 4.1.x distribution with data created by MySQL 4.0, you should start the server with the same character set and collation. In this case, you won't need to reindex your data.

There are two ways to do so: set the defaults at build time with configure, or pass them as options when starting mysqld:

shell> ./configure --with-charset=... --with-collation=...
shell> ./mysqld --default-character-set=... --default-collation=...

If you used mysqld with, for example, the MySQL 4.0 danish character set, you should now use the latin1 character set and the latin1_danish_ci collation:

shell> ./configure --with-charset=latin1 \
           --with-collation=latin1_danish_ci
shell> ./mysqld --default-character-set=latin1 \
           --default-collation=latin1_danish_ci

Use the table shown in the section called “4.0 Character Sets and Corresponding 4.1 Character Set/Collation Pairs” to find old 4.0 character set names and their 4.1 character set/collation pair equivalents.

If you have non-latin1 data stored in a 4.0 latin1 table and want to convert the table column definitions to reflect the actual character set of the data, use the instructions in the section called “Converting 4.0 Character Columns to 4.1 Format”.

4.0 Character Sets and Corresponding 4.1 Character Set/Collation Pairs

ID  4.0 Character Set  4.1 Character Set  4.1 Collation
1   big5               big5               big5_chinese_ci
2   czech              latin2             latin2_czech_ci
3   dec8               dec8               dec8_swedish_ci
4   dos                cp850              cp850_general_ci
5   german1            latin1             latin1_german1_ci
6   hp8                hp8                hp8_english_ci
7   koi8_ru            koi8r              koi8r_general_ci
8   latin1             latin1             latin1_swedish_ci
9   latin2             latin2             latin2_general_ci
10  swe7               swe7               swe7_swedish_ci
11  usa7               ascii              ascii_general_ci
12  ujis               ujis               ujis_japanese_ci
13  sjis               sjis               sjis_japanese_ci
14  cp1251             cp1251             cp1251_bulgarian_ci
15  danish             latin1             latin1_danish_ci
16  hebrew             hebrew             hebrew_general_ci
17  win1251            (removed)          (removed)
18  tis620             tis620             tis620_thai_ci
19  euc_kr             euckr              euckr_korean_ci
20  estonia            latin7             latin7_estonian_ci
21  hungarian          latin2             latin2_hungarian_ci
22  koi8_ukr           koi8u              koi8u_ukrainian_ci
23  win1251ukr         cp1251             cp1251_ukrainian_ci
24  gb2312             gb2312             gb2312_chinese_ci
25  greek              greek              greek_general_ci
26  win1250            cp1250             cp1250_general_ci
27  croat              latin2             latin2_croatian_ci
28  gbk                gbk                gbk_chinese_ci
29  cp1257             cp1257             cp1257_lithuanian_ci
30  latin5             latin5             latin5_turkish_ci
31  latin1_de          latin1             latin1_german2_ci
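
If you maintain startup scripts for several servers, the lookup in the table above can be scripted. The following is a minimal sketch, not something shipped with MySQL: the function name charset_options is hypothetical, and only a few rows of the table are included for illustration. It prints the mysqld options that correspond to a 4.0 character set name.

```shell
#!/bin/sh
# Hypothetical helper (not part of MySQL): map a 4.0 character set name
# to the equivalent 4.1 mysqld startup options.
# Only a few rows of the table are listed; extend the case statement as needed.
charset_options() {
    case "$1" in
        danish)     cs=latin1 coll=latin1_danish_ci ;;
        german1)    cs=latin1 coll=latin1_german1_ci ;;
        latin1_de)  cs=latin1 coll=latin1_german2_ci ;;
        czech)      cs=latin2 coll=latin2_czech_ci ;;
        koi8_ru)    cs=koi8r  coll=koi8r_general_ci ;;
        win1251ukr) cs=cp1251 coll=cp1251_ukrainian_ci ;;
        *)
            # Names not listed here (or removed in 4.1) are reported as errors.
            echo "unknown or unlisted 4.0 character set: $1" >&2
            return 1
            ;;
    esac
    printf '%s\n' "--default-character-set=$cs --default-collation=$coll"
}

# Example: options for a server that used the 4.0 danish character set.
charset_options danish
```

The output can be pasted after ./mysqld on the command line. The function returns a nonzero status for any name it does not recognize, so a calling script can stop instead of starting the server with the wrong collation.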

Converting 4.0 Character Columns to 4.1 Format

By default, the server runs using the latin1 character set. If you have been storing column data that is actually in some other character set that the 4.1 server now supports directly, you can convert the column. However, do not convert directly from latin1 to the "real" character set: the server would treat the stored bytes as latin1 characters and try to convert them, which may result in data loss. Instead, convert the column to a binary column type, and then from the binary type to a non-binary type with the desired character set. Conversion to and from binary involves no attempt at character value conversion and preserves your data intact.

For example, suppose that you have a 4.0 table with three columns that are used to store values represented in latin1, latin2, and utf8:

CREATE TABLE t
(
    latin1_col CHAR(50),
    latin2_col CHAR(100),
    utf8_col CHAR(150)
);

After upgrading to MySQL 4.1, you want to convert this table to leave latin1_col alone but change the latin2_col and utf8_col columns to have character sets of latin2 and utf8. First, back up your table, then convert the columns as follows:

ALTER TABLE t MODIFY latin2_col BINARY(100);
ALTER TABLE t MODIFY utf8_col BINARY(150);
ALTER TABLE t MODIFY latin2_col CHAR(100) CHARACTER SET latin2;
ALTER TABLE t MODIFY utf8_col CHAR(150) CHARACTER SET utf8;

The first two statements “remove” the character set information from the latin2_col and utf8_col columns. The last two statements assign the proper character sets to the two columns.

If you like, you can combine the to-binary conversions into one statement and the from-binary conversions into another:

ALTER TABLE t
    MODIFY latin2_col BINARY(100),
    MODIFY utf8_col BINARY(150);
ALTER TABLE t
    MODIFY latin2_col CHAR(100) CHARACTER SET latin2,
    MODIFY utf8_col CHAR(150) CHARACTER SET utf8;
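
When many columns need the same treatment, the two-step pattern above can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the function name convert_column_sql is hypothetical, and it assumes plain CHAR columns with no other attributes (such as NOT NULL or DEFAULT), which would have to be added to the generated MODIFY clauses. It only prints SQL and does not connect to the server.

```shell
#!/bin/sh
# Hypothetical helper: print the two-step conversion for one CHAR column.
# Arguments: table name, column name, column length, target character set.
# Back up the table and review the printed statements before running them.
convert_column_sql() {
    table=$1 column=$2 length=$3 charset=$4
    # Step 1: switch to a binary type, which drops the character set
    # without converting any character values.
    printf 'ALTER TABLE %s MODIFY %s BINARY(%s);\n' \
        "$table" "$column" "$length"
    # Step 2: switch back to CHAR, labeling the unchanged bytes
    # with the character set they were really stored in.
    printf 'ALTER TABLE %s MODIFY %s CHAR(%s) CHARACTER SET %s;\n' \
        "$table" "$column" "$length" "$charset"
}

# Reproduces the statements used for table t in this section,
# grouped by column rather than by step.
convert_column_sql t latin2_col 100 latin2
convert_column_sql t utf8_col 150 utf8
```

Because each column's to-binary statement is printed before its from-binary statement, the order of the two steps is preserved for every column.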