InnoDB Transaction Model and Locking

In the InnoDB transaction model, the goal has been to combine the best properties of a multi-versioning database with traditional two-phase locking. InnoDB does locking on the row level and runs queries as non-locking consistent reads by default, in the style of Oracle. The lock table in InnoDB is stored so space-efficiently that lock escalation is not needed: Typically several users are allowed to lock every row in the database, or any random subset of the rows, without InnoDB running out of memory.

InnoDB and AUTOCOMMIT

In InnoDB, all user activity occurs inside a transaction. If the autocommit mode is enabled, each SQL statement forms a single transaction on its own. MySQL always starts a new connection with autocommit enabled.

If the autocommit mode is switched off with SET AUTOCOMMIT = 0, then we can consider that a user always has a transaction open. An SQL COMMIT or ROLLBACK statement ends the current transaction and a new one starts. Both statements will release all InnoDB locks that were set during the current transaction. A COMMIT means that the changes made in the current transaction are made permanent and become visible to other users. A ROLLBACK statement, on the other hand, cancels all modifications made by the current transaction.

If the connection has autocommit enabled, the user can still perform a multiple-statement transaction by starting it with an explicit START TRANSACTION or BEGIN statement and ending it with COMMIT or ROLLBACK.

InnoDB and TRANSACTION ISOLATION LEVEL

In terms of the SQL:1992 transaction isolation levels, the InnoDB default is REPEATABLE READ. Starting from MySQL 4.0.5, InnoDB offers all four different transaction isolation levels described by the SQL standard. You can set the default isolation level for all connections by using the --transaction-isolation option on the command line or in option files. For example, you can set the option in the [mysqld] section of my.cnf like this:

[mysqld]
transaction-isolation = {READ-UNCOMMITTED | READ-COMMITTED
                         | REPEATABLE-READ | SERIALIZABLE}

A user can change the isolation level of a single session or all new incoming connections with the SET TRANSACTION statement. Its syntax is as follows:

SET [SESSION | GLOBAL] TRANSACTION ISOLATION LEVEL
                       {READ UNCOMMITTED | READ COMMITTED
                        | REPEATABLE READ | SERIALIZABLE}

Note that there are hyphens in the level names for the --transaction-isolation option, but not for the SET TRANSACTION statement.

The default behavior is to set the isolation level for the next (not started) transaction. If you use the GLOBAL keyword, the statement sets the default transaction level globally for all new connections created from that point on (but not existing connections). You need the SUPER privilege to do this. Using the SESSION keyword sets the default transaction level for all future transactions performed on the current connection.

Any client is free to change the session isolation level (even in the middle of a transaction), or the isolation level for the next transaction.

Before MySQL 3.23.50, SET TRANSACTION had no effect on InnoDB tables. Before 4.0.5, only REPEATABLE READ and SERIALIZABLE were available.

You can query the global and session transaction isolation levels with these statements:

SELECT @@global.tx_isolation;
SELECT @@tx_isolation;

In row-level locking, InnoDB uses so-called “next-key locking.” That means that besides index records, InnoDB can also lock the “gap” before an index record to block insertions by other users immediately before the index record. A next-key lock refers to a lock that locks an index record and the gap before it. A gap lock refers to a lock that only locks a gap before some index record.

A detailed description of each isolation level in InnoDB:

  • READ UNCOMMITTED

    SELECT statements are performed in a non-locking fashion, but a possible earlier version of a record might be used. Thus, using this isolation level, such reads are not “consistent.” This is also called “dirty read.” Other than that, this isolation level works like READ COMMITTED.

  • READ COMMITTED

    A somewhat Oracle-like isolation level. All SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE statements lock only the index records, not the gaps before them, and thus allow free inserting of new records next to locked records. UPDATE and DELETE statements that use a unique index with a unique search condition lock only the index record found, not the gap before it. In range-type UPDATE and DELETE statements, InnoDB must set next-key or gap locks and block insertions by other users to the gaps covered by the range. This is necessary because “phantom rows” must be blocked for MySQL replication and recovery to work.

    Consistent reads behave as in Oracle: Each consistent read, even within the same transaction, sets and reads its own fresh snapshot. See the section called “Consistent Non-Locking Read”.

  • REPEATABLE READ

    This is the default isolation level of InnoDB. SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE, UPDATE, and DELETE statements that use a unique index with a unique search condition lock only the index record found, not the gap before it. With other search conditions, these operations employ next-key locking, locking the index range scanned with next-key or gap locks, and block new insertions by other users.

    In consistent reads, there is an important difference from the previous isolation level: In this level, all consistent reads within the same transaction read the same snapshot established by the first read. This convention means that if you issue several plain SELECT statements within the same transaction, these SELECT statements are consistent also with respect to each other. See the section called “Consistent Non-Locking Read”.

  • SERIALIZABLE

    This level is like REPEATABLE READ, but all plain SELECT statements are implicitly converted to SELECT ... LOCK IN SHARE MODE.

Consistent Non-Locking Read

A consistent read means that InnoDB uses its multi-versioning to present to a query a snapshot of the database at a point in time. The query will see the changes made by exactly those transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query will see the changes made by the transaction itself that issues the query.

If you are running with the default REPEATABLE READ isolation level, then all consistent reads within the same transaction read the snapshot established by the first such read in that transaction. You can get a fresher snapshot for your queries by committing the current transaction and after that issuing new queries.

Consistent read is the default mode in which InnoDB processes SELECT statements in READ COMMITTED and REPEATABLE READ isolation levels. A consistent read does not set any locks on the tables it accesses, and therefore other users are free to modify those tables at the same time a consistent read is being performed on the table.

Locking Reads SELECT ... FOR UPDATE and SELECT ... LOCK IN SHARE MODE

In some circumstances, a consistent read is not convenient. For example, you might want to add a new row into your table child, and make sure that the child already has a parent in table parent. The following example shows how to implement referential integrity in your application code.

Suppose that you use a consistent read to read the table parent and indeed see the parent of the child in the table. Can you now safely add the child row to table child? No, because it may happen that meanwhile some other user deletes the parent row from the table parent, without you being aware of it.

The solution is to perform the SELECT in a locking mode using LOCK IN SHARE MODE:

SELECT * FROM parent WHERE NAME = 'Jones' LOCK IN SHARE MODE;

Performing a read in share mode means that we read the latest available data, and set a shared mode lock on the rows we read. A shared mode lock prevents others from updating or deleting the row we have read. Also, if the latest data belongs to a yet uncommitted transaction of another client connection, we will wait until that transaction commits. After we see that the preceding query returns the parent 'Jones', we can safely add the child record to the child table and commit our transaction.

Let us look at another example: We have an integer counter field in a table child_codes that we use to assign a unique identifier to each child added to table child. Obviously, using a consistent read or a shared mode read to read the present value of the counter is not a good idea, since two users of the database may then see the same value for the counter, and a duplicate-key error will occur if two users attempt to add children with the same identifier to the table.

Here, LOCK IN SHARE MODE is not a good solution because if two users read the counter at the same time, at least one of them will end up in deadlock when attempting to update the counter.

In this case, there are two good ways to implement the reading and incrementing of the counter: (1) update the counter first by incrementing it by 1 and only after that read it, or (2) read the counter first with a lock mode FOR UPDATE, and increment after that. The latter approach can be implemented as follows:

SELECT counter_field FROM child_codes FOR UPDATE;
UPDATE child_codes SET counter_field = counter_field + 1;

A SELECT ... FOR UPDATE reads the latest available data, setting exclusive locks on each row it reads. Thus it sets the same locks a searched SQL UPDATE would set on the rows.

Please note that the above is merely an example of how SELECT ... FOR UPDATE works. In MySQL, the specific task of generating a unique identifier actually can be accomplished using only a single access to the table:

UPDATE child_codes SET counter_field = LAST_INSERT_ID(counter_field + 1);
SELECT LAST_INSERT_ID();

The SELECT statement merely retrieves the identifier information (specific to the current connection). It does not access any table.

Next-Key Locking: Avoiding the Phantom Problem

In row-level locking, InnoDB uses an algorithm called “next-key locking.” InnoDB does the row-level locking in such a way that when it searches or scans an index of a table, it sets shared or exclusive locks on the index records it encounters. Thus the row-level locks are actually index record locks.

The locks InnoDB sets on index records also affect the “gap” before that index record. If a user has a shared or exclusive lock on record R in an index, another user cannot insert a new index record immediately before R in the index order. This locking of gaps is done to prevent the so-called “phantom problem.” Suppose that you want to read and lock all children from the child table with an identifier value larger than 100, with the intent of updating some column in the selected rows later:

SELECT * FROM child WHERE id > 100 FOR UPDATE;

Suppose that there is an index on the id column. The query will scan that index starting from the first record where id is bigger than 100. Now, if the locks set on the index records would not lock out inserts made in the gaps, a new row might meanwhile be inserted to the table. If you now execute the same SELECT within the same transaction, you would see a new row in the result set returned by the query. This is contrary the isolation principle of transactions: A transaction should be able to run so that the data it has read does not change during the transaction. If we regard a set of rows as a data item, the new “phantom” child would violate this isolation principle.

When InnoDB scans an index, it can also lock the gap after the last record in the index. Just that happens in the previous example: The locks set by InnoDB prevent any insert to the table where id would be bigger than 100.

You can use next-key locking to implement a uniqueness check in your application: If you read your data in share mode and do not see a duplicate for a row you are going to insert, then you can safely insert your row and know that the next-key lock set on the successor of your row during the read will prevent anyone meanwhile inserting a duplicate for your row. Thus the next-key locking allows you to “lock” the non-existence of something in your table.

An Example of How the Consistent Read Works in InnoDB

Suppose that you are running in the default REPEATABLE READ isolation level. When you issue a consistent read, that is, an ordinary SELECT statement, InnoDB will give your transaction a timepoint according to which your query sees the database. If another transaction deletes a row and commits after your timepoint was assigned, you will not see the row as having been deleted. Inserts and updates are treated similarly.

You can advance your timepoint by committing your transaction and then doing another SELECT.

This is called “multi-versioned concurrency control.”

               User A                 User B

           SET AUTOCOMMIT=0;      SET AUTOCOMMIT=0;
time
|          SELECT * FROM t;
|          empty set
|                                 INSERT INTO t VALUES (1, 2);
|
v          SELECT * FROM t;
           empty set
                                  COMMIT;

           SELECT * FROM t;
           empty set

           COMMIT;

           SELECT * FROM t;
           ---------------------
           |    1    |    2    |
           ---------------------
           1 row in set

In this example, user A sees the row inserted by B only when B has committed the insert and A has committed as well, so that the timepoint is advanced past the commit of B.

If you want to see the “freshest” state of the database, you should use either the READ COMMITTED isolation level or a locking read:

SELECT * FROM t LOCK IN SHARE MODE;

Locks Set by Different SQL Statements in InnoDB

A locking read, an UPDATE, or a DELETE generally set record locks on every index record that is scanned in the processing of the SQL query. It does not matter if there are WHERE conditions in the query that would exclude the row from the result set of the query. InnoDB does not remember the exact WHERE condition, but only knows which index ranges were scanned. The record locks are normally next-key locks that also block inserts to the “gap” immediately before the record.

If the locks to be set are exclusive, then InnoDB always retrieves also the clustered index record and sets a lock on it.

If you do not have indexes suitable for your query and MySQL has to scan the whole table to process the query, every row of the table will become locked, which in turn blocks all inserts by other users to the table. It is important to create good indexes so that your queries do not unnecessarily need to scan many rows.

  • SELECT ... FROM is a consistent read, reading a snapshot of the database and setting no locks unless the transaction isolation level is set to SERIALIZABLE. For SERIALIZABLE level, this sets shared next-key locks on the index records it encounters.

  • SELECT ... FROM ... LOCK IN SHARE MODE sets shared next-key locks on all index records the read encounters.

  • SELECT ... FROM ... FOR UPDATE sets exclusive next-key locks on all index records the read encounters.

  • INSERT INTO ... VALUES (...) sets an exclusive lock on the inserted row. Note that this lock is not a next-key lock and does not prevent other users from inserting to the gap before the inserted row. If a duplicate-key error occurs, a shared lock on the duplicate index record is set.

  • While initializing a previously specified AUTO_INCREMENT column on a table, InnoDB sets an exclusive lock on the end of the index associated with the AUTO_INCREMENT column. In accessing the auto-increment counter, InnoDB uses a specific table lock mode AUTO-INC where the lock lasts only to the end of the current SQL statement, instead of to the end of the whole transaction. See InnoDB and AUTOCOMMIT.

    Before MySQL 3.23.50, SHOW TABLE STATUS applied to a table with an AUTO_INCREMENT column sets an exclusive row-level lock to the high end of the AUTO_INCREMENT index. This means also that SHOW TABLE STATUS could cause a deadlock of transactions, something that may surprise users. Starting from MySQL 3.23.50, InnoDB fetches the value of a previously initialized AUTO_INCREMENT column without setting any locks.

  • INSERT INTO T SELECT ... FROM S WHERE ... sets an exclusive (non-next-key) lock on each row inserted into T. It does the search on S as a consistent read, but sets shared next-key locks on S if MySQL binary logging is turned on. InnoDB has to set locks in the latter case: In roll-forward recovery from a backup, every SQL statement has to be executed in exactly the same way it was done originally.

  • CREATE TABLE ... SELECT ... performs the SELECT as a consistent read or with shared locks, as in the previous item.

  • REPLACE is done like an insert if there is no collision on a unique key. Otherwise, an exclusive next-key lock is placed on the row that has to be updated.

  • UPDATE ... WHERE ... sets an exclusive next-key lock on every record the search encounters.

  • DELETE FROM ... WHERE ... sets an exclusive next-key lock on every record the search encounters.

  • If a FOREIGN KEY constraint is defined on a table, any insert, update, or delete that requires checking of the constraint condition sets shared record-level locks on the records it looks at to check the constraint. InnoDB also sets these locks in the case where the constraint fails.

  • LOCK TABLES sets table locks, but it is the higher MySQL layer above the InnoDB layer that sets these locks. Beginning with MySQL 4.0.20 and 4.1.2, InnoDB is aware of table locks if innodb_table_locks=1, and the MySQL layer above InnoDB knows about row-level locks. Before that, the automatic deadlock detection of InnoDB cannot detect deadlocks where such table locks are involved. Also, since the higher MySQL layer does not know about row-level locks, it is possible to get a table lock on a table where another user currently has row-level locks. But that does not put transaction integrity in danger. See the section called “Deadlock Detection and Rollback”. See the section called “Restrictions on InnoDB Tables”.

When Does MySQL Implicitly Commit or Roll Back a Transaction?

MySQL begins each client connection with autocommit mode enabled by default. When autocommit is enabled, MySQL does a commit after each SQL statement if that statement did not return an error.

If you have the autocommit mode off and close a connection without calling an explicit commit of your transaction, then MySQL will roll back your transaction.

If a SQL statement returns an error, the commit/rollback behavior depends on the error. See the section called “Error Handling”.

The following SQL statements (and any synonyms for them) cause an implicit commit of the current transaction in MySQL:

  • ALTER TABLE, BEGIN, CREATE INDEX, DROP DATABASE, DROP INDEX, DROP TABLE, LOAD MASTER DATA, LOCK TABLES, RENAME TABLE, SET AUTOCOMMIT=1, START TRANSACTION, TRUNCATE, UNLOCK TABLES.

  • CREATE TABLE (this commits only if before MySQL 4.0.13 and MySQL binary logging is used).

  • The CREATE TABLE statement in InnoDB is processed as a single transaction. This means that a ROLLBACK from the user does not undo CREATE TABLE statements the user made during that transaction.

Deadlock Detection and Rollback

InnoDB automatically detects a deadlock of transactions and rolls back a transaction or transactions to prevent the deadlock. Starting from MySQL 4.0.5, InnoDB tries to pick small transactions to roll back. The size of a transaction is determined by the number of rows it has inserted, updated, or deleted. Prior to 4.0.5, InnoDB always rolled back the transaction whose lock request was the last one to build a deadlock, that is, a cycle in the “waits-for” graph of transactions.

Beginning with MySQL 4.0.20 and 4.1.2, InnoDB is aware of table locks if innodb_table_locks=1, and the MySQL layer above InnoDB knows about row-level locks. Before that, InnoDB cannot detect deadlocks where a table lock set by a MySQL LOCK TABLES statement is involved, or if a lock set by another storage engine than InnoDB is involved. You have to resolve these situations by setting the value of the innodb_lock_wait_timeout system variable.

When InnoDB performs a complete rollback of a transaction, all the locks of the transaction are released. However, if just a single SQL statement is rolled back as a result of an error, some of the locks set by the SQL statement may be preserved. This is because InnoDB stores row locks in a format such it cannot know afterward which lock was set by which SQL statement.

How to Cope with Deadlocks

Deadlocks are a classic problem in transactional databases, but they are not dangerous unless they are so frequent that you cannot run certain transactions at all. Normally, you must write your applications so that they are always prepared to re-issue a transaction if it gets rolled back because of a deadlock.

InnoDB uses automatic row-level locking. You can get deadlocks even in the case of transactions that just insert or delete a single row. That is because these operations are not really “atomic”; they automatically set locks on the (possibly several) index records of the row inserted or deleted.

You can cope with deadlocks and reduce the likelihood of their occurrence with the following techniques:

  • Use SHOW INNODB STATUS to determine the cause of the latest deadlock. That can help you to tune your application to avoid deadlocks. This strategy can be used as of MySQL 3.23.52 and 4.0.3, depending on your MySQL series.

  • Always be prepared to re-issue a transaction if it fails due to deadlock. Deadlocks are not dangerous. Just try again.

  • Commit your transactions often. Small transactions are less prone to collide.

  • If you are using locking reads (SELECT ... FOR UPDATE or ... LOCK IN SHARE MODE), try using a lower isolation level such as READ COMMITTED.

  • Access your tables and rows in a fixed order. Then transactions form nice queues and do not deadlock.

  • Add well-chosen indexes to your tables. Then your queries need to scan fewer index records and consequently set fewer locks. Use EXPLAIN SELECT to determine which indexes the MySQL server regards as the most appropriate for your queries.

  • Use less locking. If you can afford to allow a SELECT to return data from an old snapshot, do not add the clause FOR UPDATE or LOCK IN SHARE MODE to it. Using READ COMMITTED isolation level is good here, because each consistent read within the same transaction reads from its own fresh snapshot.

  • If nothing helps, serialize your transactions with table-level locks. For example, if you need to write table t1 and read table t2, you can do this:

    LOCK TABLES t1 WRITE, t2 READ, ...;
    [do something with tables t1 and t2 here];
    UNLOCK TABLES;
    

    Table-level locks make your transactions queue nicely, and deadlocks are avoided. Note that LOCK TABLES implicitly starts a transaction, just like the statement BEGIN, and UNLOCK TABLES implicitly ends the transaction in a COMMIT.

  • Another way to serialize transactions is to create an auxiliary “semaphore” table that contains just a single row. Have each transaction update that row before accessing other tables. In that way, all transactions happen in a serial fashion. Note that the InnoDB instant deadlock detection algorithm also works in this case, because the serializing lock is a row-level lock. With MySQL table-level locks, the timeout method must be used to resolve deadlocks.