Here is an annotated list of character sets and collations that MySQL supports. Because options and installation settings differ, some sites might not have all items listed, and some sites might have items not listed.
MySQL supports 70+ collations for 30+ character sets. The character sets and their default collations are displayed by the SHOW CHARACTER SET statement. (The output actually includes another column that is not shown so that the example fits better on the page.)
mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+
| Charset  | Description                 | Default collation   |
+----------+-----------------------------+---------------------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |
| dec8     | DEC West European           | dec8_swedish_ci     |
| cp850    | DOS West European           | cp850_general_ci    |
| hp8      | HP West European            | hp8_english_ci      |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |
| latin1   | ISO 8859-1 West European    | latin1_swedish_ci   |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |
| ascii    | US ASCII                    | ascii_general_ci    |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |
| cp1251   | Windows Cyrillic            | cp1251_general_ci   |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |
| cp1250   | Windows Central European    | cp1250_general_ci   |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |
| cp866    | DOS Russian                 | cp866_general_ci    |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |
| macce    | Mac Central European        | macce_general_ci    |
| macroman | Mac West European           | macroman_general_ci |
| cp852    | DOS Central European        | cp852_general_ci    |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |
| cp1256   | Windows Arabic              | cp1256_general_ci   |
| cp1257   | Windows Baltic              | cp1257_general_ci   |
| binary   | Binary pseudo charset       | binary              |
| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |
+----------+-----------------------------+---------------------+
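To narrow the listing, SHOW CHARACTER SET accepts a LIKE pattern, and the related SHOW COLLATION statement lists every collation of a character set rather than only its default. The following is a minimal sketch; the patterns are illustrative, and the rows returned depend on how the server was built and configured.

```sql
-- List only the character sets whose names begin with "latin".
SHOW CHARACTER SET LIKE 'latin%';

-- List every collation available for latin1, not just the default one.
SHOW COLLATION LIKE 'latin1%';
```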
MySQL has two Unicode character sets, ucs2 and utf8. You can store text in about 650 languages using these character sets. We have added several collations for these two new sets, with more to come.
ucs2 (UCS-2 Unicode) collations:
ucs2_bin
ucs2_czech_ci
ucs2_danish_ci
ucs2_estonian_ci
ucs2_general_ci (default)
ucs2_icelandic_ci
ucs2_latvian_ci
ucs2_lithuanian_ci
ucs2_persian_ci
ucs2_polish_ci
ucs2_roman_ci
ucs2_romanian_ci
ucs2_slovak_ci
ucs2_slovenian_ci
ucs2_spanish2_ci
ucs2_spanish_ci
ucs2_swedish_ci
ucs2_turkish_ci
ucs2_unicode_ci
utf8 (UTF-8 Unicode) collations:
utf8_bin
utf8_czech_ci
utf8_danish_ci
utf8_estonian_ci
utf8_general_ci (default)
utf8_icelandic_ci
utf8_latvian_ci
utf8_lithuanian_ci
utf8_persian_ci
utf8_polish_ci
utf8_roman_ci
utf8_romanian_ci
utf8_slovak_ci
utf8_slovenian_ci
utf8_spanish2_ci
utf8_spanish_ci
utf8_swedish_ci
utf8_turkish_ci
utf8_unicode_ci
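As a sketch of how these collations might be applied, the statements below declare a character set and collation per column and then override the collation for a single query. The table and column names are hypothetical, and the example assumes a server that includes both Unicode character sets.

```sql
-- Hypothetical table; names are illustrative only.
CREATE TABLE unicode_demo (
    title VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci,
    note  VARCHAR(100) CHARACTER SET ucs2 COLLATE ucs2_general_ci
);

-- Sort with a different utf8 collation than the one declared on the column.
SELECT title FROM unicode_demo ORDER BY title COLLATE utf8_swedish_ci;
```

A collation named in a COLLATE clause must belong to the character set of the expression it is applied to, so utf8_swedish_ci can be used with the utf8 column but not with the ucs2 one.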
West European character sets cover most West European languages, such as French, Spanish, Catalan, Basque, Portuguese, Italian, Albanian, Dutch, German, Danish, Swedish, Norwegian, Finnish, Faroese, Icelandic, Irish, Scottish, and English.
ascii (US ASCII) collations:
ascii_bin
ascii_general_ci (default)
cp850 (DOS West European) collations:
cp850_bin
cp850_general_ci (default)
dec8 (DEC West European) collations:
dec8_bin
dec8_swedish_ci (default)
hp8 (HP West European) collations:
hp8_bin
hp8_english_ci (default)
latin1 (ISO 8859-1 West European) collations:
latin1_bin
latin1_danish_ci
latin1_general_ci
latin1_general_cs
latin1_german1_ci
latin1_german2_ci
latin1_spanish_ci
latin1_swedish_ci (default)
latin1 is the default character set. Its default collation, latin1_swedish_ci, is probably the one used by the majority of MySQL customers. It is often stated that this collation is based on the Swedish/Finnish collation rules, but you will find Swedes and Finns who disagree with that statement.
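Because a server can be configured with defaults other than latin1 and latin1_swedish_ci, it may be worth checking what is actually in effect. A minimal sketch using the system variables:

```sql
-- Show the character sets currently in effect for the server and connection.
SHOW VARIABLES LIKE 'character_set%';

-- Show the corresponding collations.
SHOW VARIABLES LIKE 'collation%';
```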
The latin1_german1_ci and latin1_german2_ci collations are based on the DIN-1 and DIN-2 standards, where DIN stands for Deutsches Institut für Normung (that is, the German answer to ANSI). DIN-1 is called the dictionary collation and DIN-2 is called the phone-book collation.
latin1_german1_ci (dictionary) rules:
Ä = A
Ö = O
Ü = U
ß = s
latin1_german2_ci (phone-book) rules:
Ä = AE
Ö = OE
Ü = UE
ß = ss
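The difference between the two German collations can be probed with a pair of comparisons. This is a sketch, not a definitive test: the sample names are illustrative, and it assumes the connection character set matches the encoding the client uses to send the literals. If the rules listed above apply, each comparison is expected to be true.

```sql
-- Dictionary rules (DIN-1): ü is treated as u.
SELECT CONVERT('Müller' USING latin1) COLLATE latin1_german1_ci
     = CONVERT('Muller' USING latin1) AS din1_matches_u;

-- Phone-book rules (DIN-2): ü is treated as ue.
SELECT CONVERT('Müller' USING latin1) COLLATE latin1_german2_ci
     = CONVERT('Mueller' USING latin1) AS din2_matches_ue;
```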
In the latin1_spanish_ci collation, ‘Ñ’ (N-tilde) is a separate letter between ‘N’ and ‘O’.
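The effect on sorting might be checked as follows. The table, column, and sample words are hypothetical, and the sketch assumes the client encoding matches the connection character set so that 'ñ' is stored correctly. Under the rule above, the word beginning with 'ñ' should sort after the words beginning with 'n' and before the word beginning with 'o'.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE spanish_demo (word VARCHAR(20) CHARACTER SET latin1);
INSERT INTO spanish_demo VALUES ('nube'), ('oso'), ('ñandú'), ('nota');

-- Sort using the Spanish rules instead of the column's default collation.
SELECT word FROM spanish_demo ORDER BY word COLLATE latin1_spanish_ci;
```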
macroman (Mac West European) collations:
macroman_bin
macroman_general_ci (default)
swe7 (7bit Swedish) collations:
swe7_bin
swe7_swedish_ci (default)
We have some support for character sets used in the Czech Republic, Slovakia, Hungary, Romania, Slovenia, Croatia, and Poland.
cp1250 (Windows Central European) collations:
cp1250_bin
cp1250_czech_ci
cp1250_general_ci (default)
cp852 (DOS Central European) collations:
cp852_bin
cp852_general_ci (default)
keybcs2 (DOS Kamenicky Czech-Slovak) collations:
keybcs2_bin
keybcs2_general_ci (default)
latin2 (ISO 8859-2 Central European) collations:
latin2_bin
latin2_croatian_ci
latin2_czech_ci
latin2_general_ci (default)
latin2_hungarian_ci
macce (Mac Central European) collations:
macce_bin
macce_general_ci (default)
armscii8 (ARMSCII-8 Armenian) collations:
armscii8_bin
armscii8_general_ci (default)
cp1256 (Windows Arabic) collations:
cp1256_bin
cp1256_general_ci (default)
geostd8 (GEOSTD8 Georgian) collations:
geostd8_bin
geostd8_general_ci (default)
greek (ISO 8859-7 Greek) collations:
greek_bin
greek_general_ci (default)
hebrew (ISO 8859-8 Hebrew) collations:
hebrew_bin
hebrew_general_ci (default)
latin5 (ISO 8859-9 Turkish) collations:
latin5_bin
latin5_turkish_ci (default)
The Baltic character sets cover the Estonian, Latvian, and Lithuanian languages. Two Baltic character sets are currently supported:
cp1257 (Windows Baltic) collations:
cp1257_bin
cp1257_general_ci (default)
cp1257_lithuanian_ci
latin7 (ISO 8859-13 Baltic) collations:
latin7_bin
latin7_estonian_cs
latin7_general_ci (default)
latin7_general_cs
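The collation names in these lists follow a suffix convention: _ci collations are case insensitive, _cs collations are case sensitive, and _bin collations compare binary character codes. A minimal sketch of the difference, using the two general latin7 collations and illustrative strings:

```sql
-- Case-insensitive collation: letter case is ignored when comparing.
SELECT CONVERT('abc' USING latin7) COLLATE latin7_general_ci
     = CONVERT('ABC' USING latin7) AS ci_equal;

-- Case-sensitive collation: letter case is significant.
SELECT CONVERT('abc' USING latin7) COLLATE latin7_general_cs
     = CONVERT('ABC' USING latin7) AS cs_equal;
```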
Here are the Cyrillic character sets and collations for use with the Belarusian, Bulgarian, Russian, and Ukrainian languages.
cp1251 (Windows Cyrillic) collations:
cp1251_bin
cp1251_bulgarian_ci
cp1251_general_ci (default)
cp1251_general_cs
cp1251_ukrainian_ci
cp866 (DOS Russian) collations:
cp866_bin
cp866_general_ci (default)
koi8r (KOI8-R Relcom Russian) collations:
koi8r_bin
koi8r_general_ci (default)
koi8u (KOI8-U Ukrainian) collations:
koi8u_bin
koi8u_general_ci (default)
The Asian character sets that we support include Chinese, Japanese, Korean, and Thai. These can be complicated. For example, the Chinese sets must allow for thousands of different characters.
big5 (Big5 Traditional Chinese) collations:
big5_bin
big5_chinese_ci (default)
euckr (EUC-KR Korean) collations:
euckr_bin
euckr_korean_ci (default)
gb2312 (GB2312 Simplified Chinese) collations:
gb2312_bin
gb2312_chinese_ci (default)
gbk (GBK Simplified Chinese) collations:
gbk_bin
gbk_chinese_ci (default)
sjis (Shift-JIS Japanese) collations:
sjis_bin
sjis_japanese_ci (default)
tis620 (TIS620 Thai) collations:
tis620_bin
tis620_thai_ci (default)
ujis (EUC-JP Japanese) collations:
ujis_bin
ujis_japanese_ci (default)