The ROLLUP operation: adding summary information to GROUP BY queries

Performing set operations on query results with UNION, INTERSECT, and EXCEPT

ASA SQL User's Guide
Summarizing, Grouping and Sorting Query Results

The ROLLUP operation: adding summary information to GROUP BY queries

The ROLLUP operation is a simple online analytical processing feature. The term OnLine Analytical Processing (OLAP) describes a large set of data analysis features, of which ROLLUP is one.

ROLLUP adds subtotal rows into the result sets of queries with GROUP BY clauses.

ROLLUP example

The following query illustrates a ROLLUP operation and its result set. The subtotal rows are highlighted in the result set. Each subtotal row has a NULL in the column or columns over which the subtotal is computed. One row is an aggregate over all rows, and has NULL in both Year and Quarter columns.

SELECT year(order_date) Year,
       quarter( order_date) Quarter,
       count(*) Orders
FROM sales_order
GROUP BY ROLLUP (Year, Quarter)
ORDER BY Year, Quarter

Year	Quarter	Orders
(NULL)	(NULL)	648
2000	(NULL)	380
2000	1	87
2000	2	77
2000	3	91
2000	4	125
2001	(NULL)	268
2001	1	139
2001	2	119
2001	3	10

Understanding subtotal rows

Much of the purpose of ROLLUP operations can be understood from the example above or by running your own examples. This section gives a formal description of ROLLUP operations for those cases where examples are not sufficient.

ROLLUP is equivalent to a UNION of a set of GROUP BY queries. The result sets of the following queries are identical:

-- A ROLLUP query
SELECT A, B, C , SUM( D )
FROM T1
GROUP BY ROLLUP A, B, C;

-- Equivalent query without ROLLUP
SELECT * FROM (
 ( SELECT A, B, C, SUM( D )
   GROUP BY A, B, C )
  UNION ALL
 ( SELECT A, B, NULL, SUM( D )
   GROUP BY A, B )
  UNION ALL
 ( SELECT A, NULL, NULL, SUM( D )
   GROUP BY A )
  UNION ALL
 ( SELECT NULL, NULL, NULL, SUM( D ) )
)

The result set of GROUP BY A, B consists of subtotals over all those rows in which A and B are held constant. To make a union possible, column C is assigned NULL.

The ROLLUP operation can be thought of as follows:

A list of prefixes is constructed for the query. A prefix is a subset of the items in the group-by-list.

A prefix is constructed by excluding one or more rightmost items from those in the query's group-by-list. The remaining columns are called the prefix columns. For example, in the example above the group-by-list includes two items (Year and Quarter).
```
GROUP BY ROLLUP (Year, Quarter)
```
There are two prefixes for this query:
- Exclude Quarter. The set of prefix columns contains the single column Year.
- Exclude both Quarter and Year. There are no prefix columns.
There are the same number of prefixes as there are items in the group-by-list.
For each prefix, a subtotal row is constructed corresponding to all those rows in which the prefix columns have the same value.

For example, in the query above, the prefix containing the Year column leads to a summary row for Year=2000 and a summary row for Year=2001. There is a single summary row for the prefix that has no columns, which is a subtotal over all rows in the intermediate result set.

The value of each column in a subtotal row is as follows:
- Column included in the prefix The value of the column. For example, in the query above, the value of the Year column for the subtotal over rows with Year=2000 is 2000.
- Column excluded from the prefix NULL. For example, the Quarter column has a value of NULL for the subtotal rows generated by the prefix consisting of the Year column.
- Aggregate function An aggregate over the values of the excluded columns.
  Subtotal values are computed over the rows in the underlying data, not over the aggregated rows. In many cases, such as SUM or COUNT, the result is the same, but the distinction is important in the case of statistical functions such as AVG, STDDEV, and VARIANCE, for which the result differs.

The ROLLUP operation and NULL

When rows in the input to a GROUP BY operation contain NULL, there is the possibility of confusion between subtotal rows added by the ROLLUP operation and NULL-containing rows that are part of the GROUP BY result set.

The GROUPING function distinguishes subtotal rows from others. The GROUPING function takes a column of the result set as its argument, and returns 1 if the column is NULL because the row is a subtotal row, and 0 otherwise.

The following example includes GROUPING columns in the result set. The query is an outer join between the employee table and the sales_order table. NULL appears in the columns corresponding to those employees who are not sales representatives (and therefore have no sales).

SELECT year(order_date) Year,
       employee.emp_id Employee,
       count(*) Orders,
       grouping( Year ) as GY,
       grouping( Employee ) as GE
FROM employee left outer join sales_order
ON employee.emp_id = sales_order.sales_rep
GROUP BY ROLLUP (Year, Employee)
ORDER BY Year, Employee

Year	Employee	Orders	GY	GE
(NULL)	(NULL)	64	0	1
(NULL)	(NULL)	712	1	1
(NULL)	102	1	0	0
...	...	...	...	...

For more information, see GROUPING function [Aggregate].