Functions and Modifiers for Use with GROUP BY Clauses

GROUP BY (Aggregate) Functions

If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.

AVG(expr)

Returns the average value of expr.

mysql> SELECT student_name, AVG(test_score)
    ->        FROM student
    ->        GROUP BY student_name;
BIT_AND(expr)

Returns the bitwise AND of all bits in expr. The calculation is performed with 64-bit (BIGINT) precision. As of MySQL 4.0.17, this function returns 18446744073709551615 if there were no matching rows. (This is an unsigned BIGINT value with all bits set to 1.) Before 4.0.17, the function returns -1 if there were no matching rows.

BIT_OR(expr)

Returns the bitwise OR of all bits in expr. The calculation is performed with 64-bit (BIGINT) precision. This function returns 0 if there were no matching rows.

BIT_XOR(expr)

Returns the bitwise XOR of all bits in expr. The calculation is performed with 64-bit (BIGINT) precision. This function returns 0 if there were no matching rows.

This function is available as of MySQL 4.1.1.

COUNT(expr)

Returns a count of the number of non-NULL values in the rows retrieved by a SELECT statement.

mysql> SELECT student.student_name,COUNT(*)
    ->        FROM student,course
    ->        WHERE student.student_id=course.student_id
    ->        GROUP BY student_name;

COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.

COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause. For example:

mysql> SELECT COUNT(*) FROM student;

This optimization applies only to MyISAM and ISAM tables only, because an exact record count is stored for these table types and can be accessed very quickly. For transactional storage engines (InnoDB, BDB), storing an exact row count is more problematic because multiple transactions may be occurring, each of which may affect the count.

COUNT(DISTINCT expr,[expr...])

Returns a count of the number of different non-NULL values.

mysql> SELECT COUNT(DISTINCT results) FROM student;

In MySQL, you can get the number of distinct expression combinations that don't contain NULL by giving a list of expressions. In standard SQL, you would have to do a concatenation of all expressions inside COUNT(DISTINCT ...).

COUNT(DISTINCT ...) was added in MySQL 3.23.2.

GROUP_CONCAT(expr)

This function returns a string result with the concatenated values from a group. The full syntax is as follows:

GROUP_CONCAT([DISTINCT] expr [,expr ...]
             [ORDER BY {unsigned_integer | col_name | expr}
                 [ASC | DESC] [,col_name ...]]
             [SEPARATOR str_val])
mysql> SELECT student_name,
    ->     GROUP_CONCAT(test_score)
    ->     FROM student
    ->     GROUP BY student_name;

Or:

mysql> SELECT student_name,
    ->     GROUP_CONCAT(DISTINCT test_score
    ->               ORDER BY test_score DESC SEPARATOR ' ')
    ->     FROM student
    ->     GROUP BY student_name;

In MySQL, you can get the concatenated values of expression combinations. You can eliminate duplicate values by using DISTINCT. If you want to sort values in the result, you should use ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause. The default is ascending order; this may be specified explicitly using the ASC keyword. SEPARATOR is followed by the string value that should be inserted between values of result. The default is a comma (‘,’). You can remove the separator altogether by specifying SEPARATOR ''.

You can set a maximum allowed length with the group_concat_max_len system variable. The syntax to do this at runtime is as follows, where val is an unsigned integer:

SET [SESSION | GLOBAL] group_concat_max_len = val;

If a maximum length has been set, the result is truncated to this maximum length.

GROUP_CONCAT() was added in MySQL 4.1.

Note: Before MySQL 4.1.6, there are some small limitations with GROUP_CONCAT() for BLOB and TEXT values when it comes to using DISTINCT together with ORDER BY. To work around this limitation, use MID(expr,1,255) instead.

MIN(expr), MAX(expr),

Returns the minimum or maximum value of expr. MIN() and MAX() may take a string argument; in such cases they return the minimum or maximum string value. See the section called “How MySQL Uses Indexes”.

mysql> SELECT student_name, MIN(test_score), MAX(test_score)
    ->        FROM student
    ->        GROUP BY student_name;

For MIN(), MAX(), and other aggregate functions, MySQL currently compares ENUM and SET columns by their string value rather than by the string's relative position in the set. This differs from how ORDER BY compares them. This will be rectified.

STD(expr), STDDEV(expr),

Returns the standard deviation of expr (the square root of VARIANCE()). This is an extension to standard SQL. The STDDEV() form of this function is provided for Oracle compatibility.

SUM(expr)

Returns the sum of expr. If the return set has no rows, SUM() returns NULL.

VARIANCE(expr)

Returns the standard variance of expr (considering rows as the whole population, not as a sample; so it has the number of rows as denominator). This is an extension to standard SQL, available only in MySQL 4.1 or later.

GROUP BY Modifiers

As of MySQL 4.1.1, the GROUP BY clause allows a WITH ROLLUP modifier that causes extra rows to be added to the summary output. These rows represent higher-level (or super-aggregate) summary operations. ROLLUP thus allows you to answer questions at multiple levels of analysis with a single query. It can be used, for example, to provide support for OLAP (Online Analytical Processing) operations.

As an illustration, suppose that a table named sales has year, country, product, and profit columns for recording sales profitability:

CREATE TABLE sales
(
    year    INT NOT NULL,
    country VARCHAR(20) NOT NULL,
    product VARCHAR(32) NOT NULL,
    profit  INT
);

The table's contents can be summarized per year with a simple GROUP BY like this:

mysql> SELECT year, SUM(profit) FROM sales GROUP BY year;
+------+-------------+
| year | SUM(profit) |
+------+-------------+
| 2000 |        4525 |
| 2001 |        3010 |
+------+-------------+

This output shows the total profit for each year, but if you also want to determine the total profit summed over all years, you must add up the individual values yourself or run an additional query.

Or you can use ROLLUP, which provides both levels of analysis with a single query. Adding a WITH ROLLUP modifier to the GROUP BY clause causes the query to produce another row that shows the grand total over all year values:

mysql> SELECT year, SUM(profit) FROM sales GROUP BY year WITH ROLLUP;
+------+-------------+
| year | SUM(profit) |
+------+-------------+
| 2000 |        4525 |
| 2001 |        3010 |
| NULL |        7535 |
+------+-------------+

The grand total super-aggregate line is identified by the value NULL in the year column.

ROLLUP has a more complex effect when there are multiple GROUP BY columns. In this case, each time there is a “break” (change in value) in any but the last grouping column, the query produces an extra super-aggregate summary row.

For example, without ROLLUP, a summary on the sales table based on year, country, and product might look like this:

mysql> SELECT year, country, product, SUM(profit)
    -> FROM sales
    -> GROUP BY year, country, product;
+------+---------+------------+-------------+
| year | country | product    | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer   |        1500 |
| 2000 | Finland | Phone      |         100 |
| 2000 | India   | Calculator |         150 |
| 2000 | India   | Computer   |        1200 |
| 2000 | USA     | Calculator |          75 |
| 2000 | USA     | Computer   |        1500 |
| 2001 | Finland | Phone      |          10 |
| 2001 | USA     | Calculator |          50 |
| 2001 | USA     | Computer   |        2700 |
| 2001 | USA     | TV         |         250 |
+------+---------+------------+-------------+

The output indicates summary values only at the year/country/product level of analysis. When ROLLUP is added, the query produces several extra rows:

mysql> SELECT year, country, product, SUM(profit)
    -> FROM sales
    -> GROUP BY year, country, product WITH ROLLUP;
+------+---------+------------+-------------+
| year | country | product    | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer   |        1500 |
| 2000 | Finland | Phone      |         100 |
| 2000 | Finland | NULL       |        1600 |
| 2000 | India   | Calculator |         150 |
| 2000 | India   | Computer   |        1200 |
| 2000 | India   | NULL       |        1350 |
| 2000 | USA     | Calculator |          75 |
| 2000 | USA     | Computer   |        1500 |
| 2000 | USA     | NULL       |        1575 |
| 2000 | NULL    | NULL       |        4525 |
| 2001 | Finland | Phone      |          10 |
| 2001 | Finland | NULL       |          10 |
| 2001 | USA     | Calculator |          50 |
| 2001 | USA     | Computer   |        2700 |
| 2001 | USA     | TV         |         250 |
| 2001 | USA     | NULL       |        3000 |
| 2001 | NULL    | NULL       |        3010 |
| NULL | NULL    | NULL       |        7535 |
+------+---------+------------+-------------+

For this query, adding ROLLUP causes the output to include summary information at four levels of analysis, not just one. Here's how to interpret the ROLLUP output:

  • Following each set of product rows for a given year and country, an extra summary row is produced showing the total for all products. These rows have the product column set to NULL.

  • Following each set of rows for a given year, an extra summary row is produced showing the total for all countries and products. These rows have the country and products columns set to NULL.

  • Finally, following all other rows, an extra summary row is produced showing the grand total for all years, countries, and products. This row has the year, country, and products columns set to NULL.

Other Considerations When using ROLLUP

The following items list some behaviors specific to the MySQL implementation of ROLLUP:

When you use ROLLUP, you cannot also use an ORDER BY clause to sort the results. In other words, ROLLUP and ORDER BY are mutually exclusive. However, you still have some control over sort order. GROUP BY in MySQL sorts results, and you can use explicit ASC and DESC keywords with columns named in the GROUP BY list to specify sort order for individual columns. (The higher-level summary rows added by ROLLUP still appear after the rows from which they are calculated, regardless of the sort order.)

LIMIT can be used to restrict the number of rows returned to the client. LIMIT is applied after ROLLUP, so the limit applies against the extra rows added by ROLLUP. For example:

mysql> SELECT year, country, product, SUM(profit)
    -> FROM sales
    -> GROUP BY year, country, product WITH ROLLUP
    -> LIMIT 5;
+------+---------+------------+-------------+
| year | country | product    | SUM(profit) |
+------+---------+------------+-------------+
| 2000 | Finland | Computer   |        1500 |
| 2000 | Finland | Phone      |         100 |
| 2000 | Finland | NULL       |        1600 |
| 2000 | India   | Calculator |         150 |
| 2000 | India   | Computer   |        1200 |
+------+---------+------------+-------------+

Using LIMIT with ROLLUP may produce results that are more difficult to interpret, because you have less context for understanding the super-aggregate rows.

The NULL indicators in each super-aggregate row are produced when the row is sent to the client. The server looks at the columns named in the GROUP BY clause following the leftmost one that has changed value. For any column in the result set with a name that is a lexical match to any of those names, its value is set to NULL. (If you specify grouping columns by column number, the server identifies which columns to set to NULL by number.)

Because the NULL values in the super-aggregate rows are placed into the result set at such a late stage in query processing, you cannot test them as NULL values within the query itself. For example, you cannot add HAVING product IS NULL to the query to eliminate from the output all but the super-aggregate rows.

On the other hand, the NULL values do appear as NULL on the client side and can be tested as such using any MySQL client programming interface.

GROUP BY with Hidden Fields

MySQL extends the use of GROUP BY so that you can use columns or calculations in the SELECT list that don't appear in the GROUP BY clause. This stands for any possible value for this group. You can use this to get better performance by avoiding sorting and grouping on unnecessary items. For example, you don't need to group on customer.name in the following query:

mysql> SELECT order.custid, customer.name, MAX(payments)
    ->        FROM order,customer
    ->        WHERE order.custid = customer.custid
    ->        GROUP BY order.custid;

In standard SQL, you would have to add customer.name to the GROUP BY clause. In MySQL, the name is redundant if you don't run in ANSI mode.

Do not use this feature if the columns you omit from the GROUP BY part are not unique in the group! You will get unpredictable results.

In some cases, you can use MIN() and MAX() to obtain a specific column value even if it isn't unique. The following gives the value of column from the row containing the smallest value in the sort column:

SUBSTR(MIN(CONCAT(RPAD(sort,6,' '),column)),7)

See the section called “The Rows Holding the Group-wise Maximum of a Certain Field”.

Note that if you are using MySQL 3.22 (or earlier) or if you are trying to follow standard SQL, you can't use expressions in GROUP BY or ORDER BY clauses. You can work around this limitation by using an alias for the expression:

mysql> SELECT id,FLOOR(value/100) AS val FROM tbl_name
    ->     GROUP BY id, val ORDER BY val;

In MySQL 3.23 and up, aliases are unnecessary. You can use expressions in GROUP BY and ORDER BY clauses. For example:

mysql> SELECT id, FLOOR(value/100) FROM tbl_name ORDER BY RAND();