String Types

The string types are CHAR, VARCHAR, BLOB, TEXT, ENUM, and SET. This section describes how these types work and how to use them in your queries.

The CHAR and VARCHAR Types

The CHAR and VARCHAR types are similar, but differ in the way they are stored and retrieved.

The length of a CHAR column is fixed to the length that you declare when you create the table. The length can be any value from 0 to 255. (Before MySQL 3.23, the length of CHAR may be from 1 to 255.) When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed.

Values in VARCHAR columns are variable-length strings. You can declare a VARCHAR column to be any length from 0 to 255, just as for CHAR columns. (Before MySQL 4.0.2, the length of VARCHAR may be from 1 to 255.) However, in contrast to CHAR, VARCHAR values are stored using only as many characters as are needed, plus one byte to record the length. Values are not padded; instead, trailing spaces are removed when values are stored. This space removal differs from the standard SQL specification.

No lettercase conversion takes place during storage or retrieval.

If you assign a value to a CHAR or VARCHAR column that exceeds the column's maximum length, the value is truncated to fit.

If you need a column for which trailing spaces are not removed, consider using a BLOB or TEXT type. If you want to store binary values such as results from an encryption or compression function that might contain arbitrary byte values, use a BLOB column rather than a CHAR or VARCHAR column to avoid potential problems with trailing space removal that would change data values.

The following table illustrates the differences between the two types of columns by showing the result of storing various string values into CHAR(4) and VARCHAR(4) columns:

ValueCHAR(4)Storage RequiredVARCHAR(4)Storage Required
''' '4 bytes''1 byte
'ab''ab '4 bytes'ab'3 bytes
'abcd''abcd'4 bytes'abcd'5 bytes
'abcdefgh''abcd'4 bytes'abcd'5 bytes

The values retrieved from the CHAR(4) and VARCHAR(4) columns are the same in each case, because trailing spaces are removed from CHAR columns upon retrieval.

As of MySQL 4.1, values in CHAR and VARCHAR columns are sorted and compared according to the collation of the character set assigned to the column. Before MySQL 4.1, sorting and comparison are based on the collation of the server character set; you can declare the column with the BINARY attribute to cause sorting and comparison to be case sensitive using the underlying character code values rather then a lexical ordering. BINARY doesn't affect how the column is stored or retrieved.

From MySQL 4.1.0 on, column type CHAR BYTE is an alias for CHAR BINARY. This is a compatibility feature.

The BINARY attribute is sticky. This means that if a column marked BINARY is used in an expression, the whole expression is treated as a BINARY value.

From MySQL 4.1.0 on, the ASCII attribute can be specified for CHAR. It assigns the latin1 character set.

From MySQL 4.1.1 on, the UNICODE attribute can be specified for CHAR. It assigns the ucs2 character set.

MySQL may silently change the type of a CHAR or VARCHAR column at table creation time. See the section called “Silent Column Specification Changes”.

The BLOB and TEXT Types

A BLOB is a binary large object that can hold a variable amount of data. The four BLOB types, TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB, differ only in the maximum length of the values they can hold. See the section called “Column Type Storage Requirements”.

The four TEXT types, TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT, correspond to the four BLOB types and have the same maximum lengths and storage requirements.

BLOB columns are treated as binary strings, whereas TEXT columns are treated according to their character set. For BLOB columns, sorting and comparison is case sensitive. For TEXT columns, values are sorted and compared based on the collation of the character set assigned to the column as of MySQL 4.1. Before 4.1, TEXT sorting and comparison are based on the collation of the server character set.

No lettercase conversion takes place during storage or retrieval.

If you assign a value to a BLOB or TEXT column that exceeds the column type's maximum length, the value is truncated to fit.

In most respects, you can regard a TEXT column as a VARCHAR column that can be as big as you like. Similarly, you can regard a BLOB column as a VARCHAR BINARY column. The ways in which BLOB and TEXT differ from CHAR and VARCHAR are:

  • There is no trailing-space removal for BLOB and TEXT columns when values are stored or retrieved. This differs from CHAR columns (trailing spaces are removed when values are retrieved) and from VARCHAR columns (trailing spaces are removed when values are stored).

  • You can have indexes on BLOB and TEXT columns only as of MySQL 3.23.2 for MyISAM tables or MySQL 4.0.14 for InnoDB tables. Older versions of MySQL did not support indexing these column types.

  • For indexes on BLOB and TEXT columns, you must specify an index prefix length. For CHAR and VARCHAR, a prefix length is optional.

  • BLOB and TEXT columns cannot have DEFAULT values.

From MySQL 4.1.0 on, LONG and LONG VARCHAR map to the MEDIUMTEXT data type. This is a compatibility feature.

MySQL Connector/ODBC defines BLOB values as LONGVARBINARY and TEXT values as LONGVARCHAR.

Because BLOB and TEXT values may be extremely long, you may encounter some constraints in using them:

  • Only the first max_sort_length bytes of the column are used when sorting. The default value of max_sort_length is 1024; this value can be changed using the --max_sort_length option when starting the mysqld server. See the section called “Server System Variables”.

    If you want to use GROUP BY or ORDER BY on a BLOB or TEXT column containing long values and you want more than max_sort_length bytes to be significant, you can convert the column value into a fixed-length object. The standard way to do this is with the SUBSTRING function. For example, the following statement causes 2000 bytes of the comment column to be taken into account for sorting:

    mysql> SELECT id, SUBSTRING(comment,1,2000) FROM tbl_name
        -> ORDER BY SUBSTRING(comment,1,2000);
    

    Before MySQL 3.23.2, you can group on an expression involving BLOB or TEXT values by using a column alias or by specifying the column position:

    mysql> SELECT id, SUBSTRING(comment,1,2000) AS b
        -> FROM tbl_name GROUP BY b;
    mysql> SELECT id, SUBSTRING(comment,1,2000)
        -> FROM tbl_name GROUP BY 2;
    

    As of MySQL 4.0.3, another way to make more bytes signficant in sorting is to increase the value of max_sort_length at runtime. Any client can change the value of its session max_sort_length variable:

    mysql> SET max_sort_length = 2000;
    mysql> SELECT id, comment FROM tbl_name
        -> ORDER BY comment;
    
  • The maximum size of a BLOB or TEXT object is determined by its type, but the largest value you actually can transmit between the client and server is determined by the amount of available memory and the size of the communications buffers. You can change the message buffer size by changing the value of the max_allowed_packet variable, but you must do so for both the server and your client program. For example, both mysql and mysqldump allow you to change the client-side max_allowed_packet value. See the section called “Tuning Server Parameters”, mysql, and mysqldump.

Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other column types, for which storage is allocated once per column when the table is opened.

The ENUM Type

An ENUM is a string object with a value chosen from a list of allowed values that are enumerated explicitly in the column specification at table creation time.

The value may also be the empty string ('') or NULL under certain circumstances:

  • If you insert an invalid value into an ENUM (that is, a string not present in the list of allowed values), the empty string is inserted instead as a special error value. This string can be distinguished from a “normal” empty string by the fact that this string has the numerical value 0. More about this later.

  • If an ENUM column is declared to allow NULL, the NULL value is a legal value for the column, and the default value is NULL. If an ENUM column is declared NOT NULL, its default value is the first element of the list of allowed values.

Each enumeration value has an index:

  • Values from the list of allowable elements in the column specification are numbered beginning with 1.

  • The index value of the empty string error value is 0. This means that you can use the following SELECT statement to find rows into which invalid ENUM values were assigned:

    mysql> SELECT * FROM tbl_name WHERE enum_col=0;
    
  • The index of the NULL value is NULL.

For example, a column specified as ENUM('one', 'two', 'three') can have any of the values shown here. The index of each value is also shown:

ValueIndex
NULLNULL
''0
'one'1
'two'2
'three'3

An enumeration can have a maximum of 65,535 elements.

Starting from MySQL 3.23.51, trailing spaces are automatically deleted from ENUM member values when the table is created.

Lettercase is irrelevant when you assign values to an ENUM column. However, values retrieved from the column later are displayed using the lettercase that was used in the column definition.

If you retrieve an ENUM value in a numeric context, the column value's index is returned. For example, you can retrieve numeric values from an ENUM column like this:

mysql> SELECT enum_col+0 FROM tbl_name;

If you store a number into an ENUM column, the number is treated as an index, and the value stored is the enumeration member with that index. (However, this will not work with LOAD DATA, which treats all input as strings.) It's not advisable to define an ENUM column with enumeration values that look like numbers, because this can easily become confusing. For example, the following column has enumeration members with string values of '0', '1', and '2', but numeric index values of 1, 2, and 3:

numbers ENUM('0','1','2')

ENUM values are sorted according to the order in which the enumeration members were listed in the column specification. (In other words, ENUM values are sorted according to their index numbers.) For example, 'a' sorts before 'b' for ENUM('a', 'b'), but 'b' sorts before 'a' for ENUM('b', 'a'). The empty string sorts before non-empty strings, and NULL values sort before all other enumeration values. To prevent unexpected results, specify the ENUM list in alphabetical order. You can also use GROUP BY CAST(col AS VARCHAR) or GROUP BY CONCAT(col) to make sure that the column is sorted lexically rather than by index number.

If you want to determine all possible values for an ENUM column, use SHOW COLUMNS FROM tbl_name LIKE enum_col and parse the ENUM definition in the second column of the output.

The SET Type

A SET is a string object that can have zero or more values, each of which must be chosen from a list of allowed values specified when the table is created. SET column values that consist of multiple set members are specified with members separated by commas (‘,’). A consequence of this is that SET member values cannot themselves contain commas.

For example, a column specified as SET('one', 'two') NOT NULL can have any of these values:

''
'one'
'two'
'one,two'

A SET can have a maximum of 64 different members.

Starting from MySQL 3.23.51, trailing spaces are automatically deleted from SET member values when the table is created.

MySQL stores SET values numerically, with the low-order bit of the stored value corresponding to the first set member. If you retrieve a SET value in a numeric context, the value retrieved has bits set corresponding to the set members that make up the column value. For example, you can retrieve numeric values from a SET column like this:

mysql> SELECT set_col+0 FROM tbl_name;

If a number is stored into a SET column, the bits that are set in the binary representation of the number determine the set members in the column value. For a column specified as SET('a','b','c','d'), the members have the following decimal and binary values:

SETMemberDecimal ValueBinary Value
'a'10001
'b'20010
'c'40100
'd'81000

If you assign a value of 9 to this column, that is 1001 in binary, so the first and fourth SET value members 'a' and 'd' are selected and the resulting value is 'a,d'.

For a value containing more than one SET element, it does not matter what order the elements are listed in when you insert the value. It also does not matter how many times a given element is listed in the value. When the value is retrieved later, each element in the value will appear once, with elements listed according to the order in which they were specified at table creation time. If a column is specified as SET('a','b','c','d'), then 'a,d', 'd,a', and 'd,a,a,d,d' all will appear as 'a,d' when retrieved.

If you set a SET column to an unsupported value, the value will be ignored.

SET values are sorted numerically. NULL values sort before non-NULL SET values.

Normally, you search for SET values using the FIND_IN_SET() function or the LIKE operator:

mysql> SELECT * FROM tbl_name WHERE FIND_IN_SET('value',set_col)>0;
mysql> SELECT * FROM tbl_name WHERE set_col LIKE '%value%';

The first statement finds rows where set_col contains the value set member. The second is similar, but not the same: It finds rows where set_col contains value anywhere, even as a substring of another set member.

The following statements also are legal:

mysql> SELECT * FROM tbl_name WHERE set_col & 1;
mysql> SELECT * FROM tbl_name WHERE set_col = 'val1,val2';

The first of these statements looks for values containing the first set member. The second looks for an exact match. Be careful with comparisons of the second type. Comparing set values to 'val1,val2' will return different results than comparing values to 'val2,val1'. You should specify the values in the same order they are listed in the column definition.

If you want to determine all possible values for a SET column, use SHOW COLUMNS FROM tbl_name LIKE set_col and parse the SET definition in the second column of the output.