Optimizer estimates

ASA SQL User's Guide
Query Optimization and Execution
How the optimizer works

Optimizer estimates

The optimizer chooses a strategy for processing a statement based on histograms that are stored in the database and heuristics (educated guesses).

Histograms, also called column statistics, store information about the distribution of values in a column. In Adaptive Server Anywhere, a histogram represents the data distribution for a column by dividing the domain of the column into a set of consecutive value ranges (also called buckets) and by remembering, for each value range (or bucket), the number of rows in the table for which the column value falls in the bucket.

Adaptive Server Anywhere pays particular attention to single column values that are represented in a large number of rows in the table. Significant single value selectivities are maintained in singleton histogram buckets (for example, buckets that encompass a single value in the column domain). ASA tries to maintain a minimum number of singleton buckets in each histogram, usually between 10 and 100 depending upon the size of the table. Additionally, all single values with selectivities greater than 1% are kept as singleton buckets. As a result, a histogram for a given column remembers the top N single value selectivities for the column where the value of N is dependent upon the size of the table and the number of single value selectivities that are greater than 1%.

Once the minimum number of value ranges have been met, low-selectivity frequencies are replaced by large-selectivity frequencies as they come along. The histogram will only have more than the minimum number of singleton value ranges after it has seen enough values with a selectivity of greater than 1%.

Given the histogram on a column, Adaptive Server Anywhere attempts to estimate the number of rows satisfying a given query predicate on the column by adding up the number of rows in all value ranges that overlap the values satisfying the specified predicate. For value ranges in the histograms that are partially contained in the query result set, Adaptive Server Anywhere uses interpolation within the value range.

Adaptive Server Anywhere uses an implementation of histograms that causes the histograms to be more refined as a byproduct of query execution. As queries are executed, Adaptive Server Anywhere compares the number of rows estimated by the histograms for a given predicate with the number of rows actually found to satisfy the predicate, and then adjusts the values in the histogram to reduce the margin of error for subsequent optimizations.

For each table in a potential execution plan, the optimizer estimates the number of rows that will form part of the results. The number of rows depends on the size of the table and the restrictions in the WHERE clause or the ON clause of the query.

In many cases, the optimizer uses more sophisticated heuristics. For example, the optimizer uses default estimates only in cases where better statistics are unavailable. As well, the optimizer makes use of indexes and keys to improve its guess of the number of rows. The following are a few single-column examples:

Equating a column to a value: estimate one row when the column has a unique index or is the primary key.
A comparison of an indexed column to a constant: probe the index to estimate the percentage of rows that satisfy the comparison.
Equating a foreign key to a primary key (key join): use relative table sizes in determining an estimate. For example, if a 5000 row table has a foreign key to a 1000 row table, the optimizer guesses that there are five foreign key rows for each primary key row.

For more information about column statistics, see SYSCOLSTAT system table.

For information about obtaining the selectivities of predicates and the distribution of column values, see:

Updating column statistics