Many questions repeatedly asked in Forums and mailing lists are answered in this guide. If you want to use HSQLDB with your application, you should read this guide.
Before 1.7.0, a CONSTRAINT <name> PRIMARY KEY
was translated
internally to a unique index and, in addition, a hidden column was added to
the table with an extra unique index. From 1.7.0 both single-column and multi-column
PRIMARY KEY constraints are supported. They are supported by a unique index
on the primary key column(s) specified and no extra hidden column is maintained
for these indexes.
According to the SQL standards, a unique constraint on a single column means no two values are equal unless one of them is NULL. This means you can have one or more rows where the column value is NULL.
A unique constraint on multiple columns (c1, c2, c3, ..) means that no two sets of values for the columns are equal unless at lease one of them is NULL. Each signle column taken by itself can have repeat values. The following example satisfies a UNIQUE constraint on the two columns:
1,1
1,2
2,1
2,2
NULL,1
NULL,1
NULL,NULL
1, NULL
In version 1.7.2 the behaviour of UNIQUE constraints and indexes with respect to NULL values has changed to conform to SQL standards. A row, in which the value for any of the UNIQUE constraint columns is NULL, can always be added to the table. So multiple rows can contain the same values for the UNIQUE columns if one of the values is NULL.
In 1.7.2, user defined UNIQUE indexes can still be declared but they are deprecated. You should use a UNIQUE constraint instead.
CONSTRAINT <name> UNIQUE
always creates internally a unique index on the columns, as with previous
versions, so it has exactly the same effect as the deprecated UNIQUE index declaration.
From version 1.7.0, HSQLDB features single and multiple column foreign keys. A foreign key can also be specified to reference a target table without naming the target column(s). In this case the primary key column(s) of the target table is used as the referenced column(s). Each pair of referencing and referenced columns in any foreign key should be of identical type. When a foreign key is declared, a unique constraint (or primary key) must exist on the referenced columns in the primary key table. A non-unique index is automatically created on the referencing columns. For example:
CREATE TABLE child(c1 INTEGER, c2 VARCHAR, FOREIGN KEY (c1, c2) REFERENCES parent(p1, p2));
There must be a UNIQUE constraint on columns (p1,p2)
in the table named "parent". A non-unique index is automatically created on
columns (c1, c2)
in the table named "child". Columns p1
and c1
must be of the same type (INTEGER). Columns p2
and c2
must be of the same type (VARCHAR).
HSQLDB does not use indexes to improve sorting of query results. But indexes have a crucial role in improving query speed. If no index is used in a query on a single table, such as a DELETE query, then all the rows of the table must be examined. With an index on one of the columns that is in the WHERE clause, it is often possible to start directly from the first candidate row and reduce the number of rows that are examined.
Indexes are even more important in joins between multiple tables. SELECT ... FROM t1 JOIN t2 ON t1.c1 = t2.c2
is performed by taking rows of t1 one by one and finding a matching row in t2. If there is no index index on t2.c2 then for each row of t1, all the rows of t2 must be checked. Whereas with an index, a matching row can be found in a fraction of the time. If the query also has a condition on t1, e.g., SELECT ... FROM t1 JOIN t2 ON t1.c1 = t2.c2 WHERE t1.c3 = 4
then an index on t1.c3 would eliminate the need for checking all the rows of t1 one by one, and will reduced query time to less than a millisecond per returned row. So if t1 and t2 each contain 10,000 rows, the query without indexes involves checking 100,000,000 row combinations. With an index on t2.c2, this is reduced to 10,000 row checks and index lookups. With the additional index on t2.c2, only about 4 rows are checked to get the first result row.
Indexes are automatically created for primary key and unique columns. Otherwise you should define an index using the CREATE INDEX command.
Note that in HSQLDB a unique index on multiple columns can be used internally as
a non-unique index on the first column in the list. For example: CONSTRAINT
name1 UNIQUE (c1, c2, c3);
means there is the equivalent
of CREATE INDEX name2 ON atable(c1);
So you do not need to specify
an extra index if you require one on the first column of the list.
In 1.7.2, a multi-column index will speed up queries that contain joins or values on ALL the columns. You need NOT declare additional individual indexes on those columns unless you use queries that search only on a subset of the columns. For example, rows of a table that has a PRIMARY KEY or UNIQUE constraint on three columns or simply an ordinary index on those columns can be found efficiently when values for all three columns are specified in the WHERE clause. For example, SELECT ... FROM t1 WHERE t1.c1 = 4 AND t1.c2 = 6 AND t1.c3 = 8
will use an index on t1(c1,c2,c3)
if it exists.
As a result of the improvements to multiple key indexes, the order of declared columns of the index or constraint has less affect on the speed of searches than before. If the column that contains more diverse values appears first, the searches will be slightly faster .
A multi-column index will not speed up queries on the second or third column only. The first column must be specified in the JOIN .. ON or WHERE conditions.
Query speed depends a lot on the order of the tables in the JOIN .. ON or FROM clauses. For example the second query below should be faster with large tables (provided there is an index on TB.COL3
). The reason is that TB.COL3 can be evaluated very quickly if it applies to the first table (and there is an index on TB.COL3):
(TB is a very large table with only a few rows where TB.COL3 = 4)
SELECT * FROM TA JOIN TB ON TA.COL1 = TB.COL2 AND TB.COL3 = 4;
SELECT * FROM TB JOIN TA ON TA.COL1 = TB.COL2 AND TB.COL3 = 4;
The general rule is to put first the table that has a narrowing condition on one of its columns.
1.7.2 features automatic, on-the-fly indexes for views and subselects that are used in a query. An index is added to a view when it is joined to a table or another view.
Using WHERE
conditions to join tables is likely to reduce execution
speed. For example the following query will generally be slow, even with indexes:
SELECT ... FROM TA, TB, TC WHERE TC.COL3 = TA.COL1 AND TC.COL3=TB.COL2
AND TC.COL4 = 1
The query implies TA.COL1 = TB.COL2
but does not explicitly set
this condition. If TA and TB each contain 100 rows, 10000 combinations will
be joined with TC to apply the column conditions, even though there may be indexes
on the joined columns. With the JOIN keyword, the TA.COL1 = TB.COL2
condition has to be explicit and will narrow down the combination of TA and
TB rows before they are joined with TC, resulting in much faster execution with
larger tables:
SELECT ... FROM TA JOIN TB ON TA.COL1 = TB.COL2 JOIN TC ON TB.COL2 =
TC.COL3 WHERE TC.COL4 = 1
The query can be speeded up a lot more if the order of tables in joins are
changed, so that TC.COL1 = 1
is applied first and a smaller set
of rows are joined together:
SELECT ... FROM TC JOIN TB ON TC.COL3 = TB.COL2 JOIN TA ON TC.COL3 =
TA.COL1 WHERE TC.COL4 = 1
In the above example the engine automatically applies TC.COL4 = 1
to TC and joins only the set of rows that satisfy this condition with other
tables. Indexes on TC.COL4
, TB.COL2
and TA.COL1
will be used if present and will speed up the query.
SELECT ... FROM TA WHERE TA.COL1 = (SELECT MAX(TB.COL2) FROM TB WHERE
TB.COL3 = 4)
SELECT ... FROM (SELECT MAX(TB.COL2) C1 FROM TB WHERE TB.COL3 = 4) T2
JOIN TA ON TA.COL1 = T2.C1
The second query turns MAX(TB.COL2)
into a single row table then
joins it with TA. With an index on TA.COL1
, this will be very fast.
The first query will test each row in TA and evaluate MAX(TB.COL2)
again and again.
Table columns of all types supported by HSQLDB can be indexed and can feature in comparisons. Types can be explicitly converted using the CONVERT() library function, but in most cases they are converted automatically. It is recommended not to use indexes on LONGVARBINARY, LONGVARCHAR and OTHER columns, as these indexes will probably not be allowed in future versions.
Previous versions of HSQLDB featured poor handling of arithmetic operations.
For example, it was not possible to insert 10/2.5
into any DOUBLE or DECIMAL
column. In 1.7.0, full operations are possible with the following rules:
TINYINT, SMALLINT, INTEGER, BIGINT, NUMBER and DECIMAL (without a decimal point) are supported integral types and map to byte, short, int, long and BigDecimal in Java. The SQL type dictates the maximum and minimum values that can be held in a field of each type. For example the value range for TINYINT is –128 to +127, although the actual Java type used for handling TINYINT is java.lang.Integer.
REAL, FLOAT, DOUBLE are all mapped to double in Java.
DECIMAL and NUMERIC are mapped to java.math.BigDecimal
and can
have very large numbers of digits before or after the decimal point.
TINYINT, SMALLINT, INTEGER, BIGINT, NUMBER and DECIMAL (without a decimal point)
are fully interchangeable internally, and no data narrowing takes place. Depending
on the types of the operands, the result of the operations is returned in a
JDBC ResultSet
in any of related Java types: Integer
, Long
or BigDecimal
.
The ResultSet.getXXXX()
methods can be used to retrieve the values so long as the returned value can be represented by the resulting type. This type is determinstically based on the query, not on the actual rows returned. The type does not change when the same query that returned one row, returns many rows as a result of adding more data to the tables.
If the SELECT statement refers to a simple column or function, then the return type is the type corresponding to the column or the return type of the function. For example:
CREATE TABLE t(a INTEGER, b BIGINT); SELECT MAX(a), MAX(b) FROM t;
would return a result set where the type of the first column is java.lang.Integer
and the second column is java.lang.Long
. However,
SELECT MAX(a) + 0, MAX(b) + 0 FROM t;
would return java.lang.Long
and BigDecimal
values,
generated as a result of uniform type promotion for all the return values.
There is no built-in limit on the size of intermediate integral values in expressions.
As a result, you should check for the type of the ResultSet
column and choose
an appropriate getXXXX()
method to retrieve it. Alternatively,
you can use the getObject()
method, then cast the result to java.lang.Number
and use the intValue()
or longValue()
methods on the
result.
When the result of an expression is stored in a column of a database table,
it has to fit in the target column, otherwise an error is returned. For example
when 1234567890123456789012 / 12345687901234567890
is evaluated,
the result can be stored in any integral type column, even a TINYINT column,
as it is a small value.
In SQL statements, numbers with a decimal point are treated as DECIMAL unless
they are written with an exponent. Thus 0.2
is considered a DECIMAL value but
0.2E0
is considered a DOUBLE value.
When PreparedStatement.setDouble()
or setFloat()
is used, the value is treated as a DOUBLE automatically.
When a REAL, FLOAT or DOUBLE (all synonymous) is part of an expression, the type of the result is DOUBLE.
Otherwise, when no DOUBLE value exists, if a DECIMAL or NUMBER value is part
an expression, the type of the result is DECIMAL. The result can be retrieved
from a ResultSet
in the required type so long as it can be represented.
This means DECIMAL values can be converted to DOUBLE unless they are beyond
the Double.MIN_VALUE - Double.MAX_VALUE
range. Similar to integral
values, when the result of an expression is stored in a table column, it has
to fit in the target column, otherwise an error is returned.
The distinction between DOUBLE and DECIMAL is important when a division takes
place. When the terms are DECIMAL, the result is a value with a scale (number
of digits to the right of the decimal pint) equal to the larger of the scales
of the two terms. With a DOUBLE term, the scale will reflect the actual result
of the operation. For example, 10.0/8.0
(DECIMAL) equals 1.2
but 10.0E0/8.0E0
(DOUBLE) equals 1.25
. Without division operations, DECIMAL values represent
exact arithmetic; the resulting scale is the sum of the scales of the two terms
when multiplication is performed.
REAL, FLOAT and DOUBLE values are all stored in the database as java.lang.Double objects. Special values such as NaN and +-Infinity are also stored and supported. These values can be submitted to the database via JDBC PreparedStatement methods and are returned in ResultSet objects.
In 1.7.2, BIT is simply an alias for BOOLEAN. The primary
representation of BOOLEAN column is 'true'
or 'false'
either as the boolean type or as strings when used from JDBC. This type of column can also be initialised
using values of any numeric type. In this case 0
is translated to false
and any other value such as 1 is translated to true
.
In version 1.7.2 this support has improved and any serializable JAVA Object can be inserted directly into
a column of type OTHER using any variation of PreparedStatement.setObject()
methods.
For comparison purposes and in indexes, any two Java Objects are considered equal unless one of them is NULL. You cannot search for a specific object or perform a join on a column of type OTHER.
Please note that HSQLDB is not an object-relational database. Java Objects can simply be stored internally and no operations should be performed on them other than assignment between columns of type OTHER or tests for NULL. Tests such as WHERE object1 = object2
, or WHERE object1 = ?
do not mean what you might expect, as any non-null object would satisfy such a tests. But WHERE object1 IS NOT NULL
is perfectly acceptable.
The engine does not return errors when normal column values are assigned to Java Object columns (for example assigning an INTEGER or STRING to such a column with an SQL statement such as UPDATE mytable SET objectcol = intcol WHERE ...
) but this is highly likely to be disallowed in future. So please use columns of type OTHER only to store your objects and nothing else.
Prior to 1.7.2, all table column type definitions with a column size, precision or scale qualifier were accepted and ignored. In version 1.7.2, such qualifiers must conform to the SQL standards. For example INTEGER(8) is no longer acceptable.
In 1.7.2, the qualifiers are still ignored unless you set a database property. SET PROPERTY "sql.enforce_strict_size" TRUE
will enforce sizes for CHARACTER or VARCHAR columns and pad any strings when inserting or updating a CHARACTER column. The qualifiers are still ignored for DECIMAL and DOUBLE numeric types.
Please note that casting a value to a qualified CHARACTER type will not result in truncation or padding as you might expect. So you cannot rely on a test such as CAST (mycol AS VARCHAR(2)) = 'xy'
to find the values beginning with 'xy'. Use SUBSTRING(mycol FROM 1 FOR 2)
instead.
The SEQUENCE keyword has been introduced in 1.7.2 with a subset of the SQL 200n standard sysntax. Corresponding SQL 200n syntax for IDENTITY columns has also been introduced.
Each table can contain one auto-increment column, known as the IDENTITY column. An IDENTITY column is always treated as the primary key for the table (as a result, multi-column primary keys are not possible with an IDENTITY column present). Support has been added for CREATE TABLE <tablename>(<colname>
IDENTITY, ...)
as a shortcut.
In 1.7.2 the SQL standard syntax is used by default, which allows the initial value to be specified. The supported form is(<colname> INTEGER GENERATED BY DEFAULT AS IDENTITY(START WITH n, [INCREMENT BY m])PRIMARY KEY, ...)
. Support has also been added for BIGINT
identity columns. As a result, an IDENTITY column is simply an INTEGER or BIGINT column with its default value generated by a sequence generator.
When you add a new row to such a table using
an INSERT INTO <tablename> ...;
statement, you can use the
NULL value for the IDENTITY column, which results in an auto-generated value
for the column. The IDENTITY()
function returns the last value inserted into any IDENTITY column by this connection. Use CALL IDENTITY();
as an SQL statement to retrieve this value. If you want to use the value for a field in a child table, you can use INSERT INTO <childtable> VALUES (...,IDENTITY(),...);
Both types of call to IDENTITY()
must be made before any additional update or insert statements are issued on the database.
The SQL 200n syntax and usage is different from what is supported by many existing database engines. Sequences are created with the CREATE SEQUENCE
command and their current value can be modified at any time with ALTER SEQUENCE.
The next value for a sequence is retreived with the NEXT VALUE FOR <name>
expression. This expression can be used for inserting and updating table rows. You can also use it in select statements. For example, if you want to number the returned rows of a SELECT in sequential order, you can use:
SELECT NEXT VALUE FOR mysequence, col1, col2 FROM mytable WHERE ...
Please note that the semantics of sequences is not exactly the same as defined by SQL 200n. For example if you use the same sequence twice in the same row insert query, you will get two different values, not the same value as required by the standard.
You can query the SYTEM_SEQUENCES table for the next value that will be returned from any of the defined sequences. The SEQUENCE_NAME column contains the name and the NEXT_VALUE column contains the next value to be returned.
HSQLDB supports transactions at the READ_UNCOMMITTED level, also known as level 0 transaction isolation. This means that during the lifetime of a transaction, other connections to the database can see the changes made to the data. Transaction support works well in general. Reported bugs concerning transactions being committed if the database is abruptly closed have been fixed. However, the following issues may be encountered only with multiple connections to a database using transactions:
If two transactions modify the same row, no exception is raised when both transactions are committed. This can be avoided by designing your database in such a way that application data consistency does not depend on exclusive modification of data by one transaction.
When an ALTER TABLE .. INSERT COLUMN
or DROP COLUMN
command results in changes
to the table structure, the current session is committed. If an uncommitted
transaction started by another connections has changed the data in the affected
table, it may not be possible to roll it back after the ALTER TABLE
command.
This may also apply to ADD INDEX
or ADD CONSTRAINT
commands. It is recommended
to use these ALTER
commands only when it is known that other connections are
not using transactions.
After a CHECKPOINT command is issued, uncommitted transactions can be continued, committed, or rolled back. However, if the database is not subsequently closed properly with the SHUTDOWN command, any such transaction that still remains uncommitted at the time of shutdown, is part committed (to the state at CHECKPOINT) at the next startup. It is recommended not to use the CHECKPOINT command when there are any uncommitted transactions.
In 1.7.2 many enhancements have been made for better SQL support. These are listed in hsqlSyntax.html and in changelog_1_7_2.txt. Functions and expressions such as SUBSTRING(), NULLIF(), COALESCE(), CASE ... WHEN .. ELSE etc. are among them. Other enhancements may not be very obvious in the documentation but can result in changes of behaviour from previous versions. Most significant among these are handling of NULL values in joins (null columns are no longer joined) and OUTER joins (the results are now correct). You should test your applications with the new version to ensure they do not rely on past incorrect behaviour of the engine. The engine will evolve in future versions towards full SQL standard support, so it is best not to rely on any non-standard feature of the current version.
Author: Fred Toussi - 14 July 2002 - updated 29 March 2004
Copyright 2002-2003 Fred Toussi. Permission is granted to distribute this document without any alteration under the terms of the HSQLDB license. Additional permission is granted to the HSQLDB Development Group to distribute this document with or without alterations under the terms of the HSQLDB license.