UTF8 for Metadata

The metadata is the data about the data. Anything that describes the database, as opposed to being the contents of the database, is metadata. Thus column names, database names, usernames, version names, and most of the string results from SHOW are metadata.

Representation of metadata must satisfy these requirements:

In order to satisfy both requirements, MySQL stores metadata in a Unicode character set, namely UTF8. This will not cause any disruption if you never use accented characters. But if you do, you should be aware that metadata is in UTF8.

This means that the USER(), CURRENT_USER(), and VERSION() functions will have the UTF8 character set by default. So will any synonyms, such the SESSION_USER() and SYSTEM_USER() synonyms for USER().

The server sets the character_set_system system variable to the name of the metadata character set:

mysql> SHOW VARIABLES LIKE 'character_set_system';
+----------------------+-------+
| Variable_name        | Value |
+----------------------+-------+
| character_set_system | utf8  |
+----------------------+-------+

Storage of metadata using Unicode does not mean that the headers of columns and the results of DESCRIBE functions will be in the character_set_system character set by default. When you say SELECT column1 FROM t, the name column1 itself will be returned from the server to the client in the character set as determined by the SET NAMES statement. More specifically, the character set used is determined by the value of the character_set_results system variable. If this variable is set to NULL, no conversion is performed and the server returns metadata using its original character set (the set indicated by character_set_system).

If you want the server to pass metadata results back in a non-UTF8 character set, then use SET NAMES to force the server to perform character set conversion (see the section called “Connection Character Sets and Collations”), or else set the client to do the conversion. It is always more efficient to set the client to do the conversion, but this option will not be available for many clients until late in the MySQL 4.x product cycle.

If you are just using, for example, the USER() function for comparison or assignment within a single statement, don't worry. MySQL will do some automatic conversion for you.

SELECT * FROM Table1 WHERE USER() = latin1_column;

This will work because the contents of latin1_column are automatically converted to UTF8 before the comparison.

INSERT INTO Table1 (latin1_column) SELECT USER();

This will work because the contents of USER() are automatically converted to latin1 before the assignment. Automatic conversion is not fully implemented yet, but should work correctly in a later version.

Although automatic conversion is not in the SQL standard, the SQL standard document does say that every character set is (in terms of supported characters) a “subset” of Unicode. Since it is a well-known principle that “what applies to a superset can apply to a subset,” we believe that a collation for Unicode can apply for comparisons with non-Unicode strings.