Indexed distinct

ASA SQL User's Guide

The indexed distinct algorithm maintains a work table of the unique rows from the input. As each row is read from the input, an index on the work table is searched for a previously seen duplicate of that row. If one is found, the input row is ignored; otherwise, the input row is inserted into the work table. The work table index is created on all the columns of the SELECT list. To improve index performance, a hash expression is included as the first expression of the index. This hash expression is a computed value that combines the values of all the columns in the SELECT list.
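The lookup-then-insert loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the engine's implementation. A sorted in-memory list searched with `bisect` stands in for the work table and its index. The hypothetical `row_hash` function (a CRC-32 of the row's text representation) stands in for the hash expression, and it is placed first in each index key to mirror the layout described above. The real work table is a temporary table whose pages pass through the cache, not a Python list, and the sketch assumes column values are mutually comparable in the rare case of a hash collision.

```python
import bisect
import zlib
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[object, ...]


def row_hash(row: Row) -> int:
    """Hypothetical stand-in for the hash expression: a single value
    computed from every column in the SELECT list."""
    return zlib.crc32(repr(row).encode("utf-8"))


def indexed_distinct(rows: Iterable[Row]) -> Iterator[Row]:
    # Sorted list standing in for the work table's index.
    # Each key is (hash expression, column values), with the hash first.
    index: List[Tuple[int, Row]] = []
    for row in rows:
        key = (row_hash(row), row)
        pos = bisect.bisect_left(index, key)  # search the index
        if pos < len(index) and index[pos] == key:
            continue  # previously seen duplicate: ignore the input row
        index.insert(pos, key)  # new row: insert it into the work table
        yield row  # distinct rows are returned as they are encountered


rows = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]
print(list(indexed_distinct(rows)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Because the function is a generator, each distinct row is handed to the caller as soon as its index lookup fails, and rows come out in input order.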

The indexed distinct method returns distinct rows as they are encountered, which allows it to return the first few rows quickly compared with other duplicate elimination methods. The indexed distinct algorithm stores only two rows in memory at a time, so it can work well in extremely low-memory situations. However, if the number of distinct rows is large, the execution cost of the indexed distinct algorithm is typically higher than that of hash distinct: the work table used to store distinct rows may not fit in cache, so work table pages may be reread many times in a random access pattern.
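The difference in time to first row can be sketched by counting how many input rows each approach reads before producing its first three results. In this hypothetical sketch, `streaming_distinct` is a simplified indexed distinct (a Python `set` stands in for the work table index purely for brevity), and `blocking_distinct` is a generic sort-based contrast that must consume its whole input before producing anything. It is not a model of hash distinct or of any other specific ASA operator; `CountingSource` is likewise a hypothetical helper.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[object, ...]


class CountingSource:
    """Hypothetical input operator that counts how many rows were read."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.read = 0

    def __iter__(self) -> Iterator[Row]:
        for row in self.rows:
            self.read += 1
            yield row


def streaming_distinct(rows: Iterable[Row]) -> Iterator[Row]:
    # Simplified indexed distinct: each new row is emitted immediately.
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row


def blocking_distinct(rows: Iterable[Row]) -> Iterator[Row]:
    # Generic contrast: reads all input before producing the first row.
    yield from sorted(set(rows))


data = [(i % 1000,) for i in range(10_000)]

src = CountingSource(data)
print(list(islice(streaming_distinct(src), 3)), src.read)  # [(0,), (1,), (2,)] 3

src = CountingSource(data)
print(list(islice(blocking_distinct(src), 3)), src.read)  # [(0,), (1,), (2,)] 10000
```

Both return the same first three rows, but the streaming version reads only three input rows to do so, which is why this style of operator suits queries that fetch just the first few rows.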

Because the indexed distinct method uses a work table, it cannot provide fully sensitive semantics. It does not provide fully insensitive semantics either, so another work table is required for insensitive cursors.

The indexed distinct method locks the rows of its input.