The BDB (BerkeleyDB) Storage Engine

Sleepycat Software has provided MySQL with the Berkeley DB transactional storage engine. This storage engine typically is called BDB for short. Support for the BDB storage engine is included in MySQL source distributions starting from version 3.23.34a and is activated in MySQL-Max binary distributions.

BDB tables may have a greater chance of surviving crashes and are also capable of COMMIT and ROLLBACK operations on transactions. The MySQL source distribution comes with a BDB distribution that has a couple of small patches to make it work more smoothly with MySQL. You can't use a non-patched BDB version with MySQL.

We at MySQL AB are working in close cooperation with Sleepycat to keep the quality of the MySQL/BDB interface high. (Even though Berkeley DB is in itself very tested and reliable, the MySQL interface is still considered gamma quality. We are improving and optimizing it.)

When it comes to support for any problems involving BDB tables, we are committed to helping our users locate the problem and create a reproducible test case. Any such test case will be forwarded to Sleepycat, which in turn will help us find and fix the problem. As this is a two-stage operation, any problems with BDB tables may take a little longer for us to fix than for other storage engines. However, we anticipate no significant difficulties with this procedure because the Berkeley DB code itself is used in many applications other than MySQL. See the section called “Support Offered by MySQL AB”.

For general information about Berkeley DB, please visit the Sleepycat Web site, http://www.sleepycat.com/.

Operating Systems Supported by BDB

Currently, we know that the BDB storage engine works with the following operating systems:

  • Linux 2.x Intel

  • Sun Solaris (SPARC and x86)

  • FreeBSD 4.x/5.x (x86, sparc64)

  • IBM AIX 4.3.x

  • SCO OpenServer

  • SCO UnixWare 7.1.x

BDB does not work with the following operating systems:

  • Linux 2.x Alpha

  • Linux 2.x AMD64

  • Linux 2.x IA-64

  • Linux 2.x s390

  • Mac OS X

Note: The preceding lists are not complete. We will update them as we receive more information.

If you build MySQL from source with support for BDB tables, but the following error occurs when you start mysqld, it means BDB is not supported for your architecture:

bdb: architecture lacks fast mutexes: applications cannot be threaded
Can't init databases

In this case, you must rebuild MySQL without BDB table support or start the server with the --skip-bdb option.

Installing BDB

If you have downloaded a binary version of MySQL that includes support for Berkeley DB, simply follow the usual binary distribution installation instructions. (MySQL-Max distributions include BDB support.)

If you build MySQL from source, you can enable BDB support by running configure with the --with-berkeley-db option in addition to any other options that you normally use. Download a distribution for MySQL 3.23.34 or newer, change location into its top-level directory, and run this command:

shell> ./configure --with-berkeley-db [other-options]

For more information, see the section called “Installing MySQL on Other Unix-Like Systems”, mysqld-max, and See the section called “MySQL Installation Using a Source Distribution”.

BDB Startup Options

The following options to mysqld can be used to change the behavior of the BDB storage engine:

--bdb-home=path

The base directory for BDB tables. This should be the same directory you use for --datadir.

--bdb-lock-detect=method

The BDB lock detection method. The option value should be DEFAULT, OLDEST, RANDOM, or YOUNGEST.

--bdb-logdir=path

The BDB log file directory.

--bdb-no-recover

Don't start Berkeley DB in recover mode.

--bdb-no-sync

Don't synchronously flush the BDB logs.

--bdb-shared-data

Start Berkeley DB in multi-process mode. (Don't use DB_PRIVATE when initializing Berkeley DB.)

--bdb-tmpdir=path

The BDB temporary file directory.

--skip-bdb

Disable the BDB storage engine.

See the section called “mysqld Command-Line Options”.

The following system variable affects the behavior of BDB tables:

bdb_max_lock

The maximum number of locks you can have active on a BDB table.

See the section called “Server System Variables”.

If you use the --skip-bdb option, MySQL will not initialize the Berkeley DB library and this will save a lot of memory. However, if you use this option, you cannot use BDB tables. If you try to create a BDB table, MySQL will create a MyISAM table instead.

Normally, you should start mysqld without the --bdb-no-recover option if you intend to use BDB tables. However, this may give you problems when you try to start mysqld if the BDB log files are corrupted. See the section called “Starting and Troubleshooting the MySQL Server”.

With the bdb_max_lock variable, you can specify the maximum number of locks that can be active on a BDB table. The default is 10,000. You should increase this if errors such as the following occur when you perform long transactions or when mysqld has to examine many rows to execute a query:

bdb: Lock table is out of available locks
Got error 12 from ...

You may also want to change the binlog_cache_size and max_binlog_cache_size variables if you are using large multiple-statement transactions. See the section called “The Binary Log”.

Characteristics of BDB Tables

Each BDB table is stored on disk in two files. The files have names that begin with the table name and have an extension to indicate the file type. An .frm file stores the table definition, and a .db file contains the table data and indexes.

To specify explicitly that you want a BDB table, indicate that with an ENGINE or TYPE table option:

CREATE TABLE t (i INT) ENGINE = BDB;
CREATE TABLE t (i INT) TYPE = BDB;

BerkeleyDB is a synonym for BDB in the ENGINE or TYPE option.

The BDB storage engine provides transactional tables. The way you use these tables depends on the autocommit mode:

  • If you are running with autocommit enabled (which is the default), changes to BDB tables are committed immediately and cannot be rolled back.

  • If you are running with autocommit disabled, changes do not become permanent until you execute a COMMIT statement. Instead of committing, you can execute ROLLBACK to forget the changes.

    You can start a transaction with the BEGIN WORK statement to suspend autocommit, or with SET AUTOCOMMIT=0 to disable autocommit explicitly.

See COMMIT.

The BDB storage engine has the following characteristics:

  • BDB tables can have up to 31 indexes per table, 16 columns per index, and a maximum key size of 1024 bytes (500 bytes before MySQL 4.0).

  • MySQL requires a PRIMARY KEY in each BDB table so that each row can be uniquely identified. If you don't create one explicitly, MySQL creates and maintains a hidden PRIMARY KEY for you. The hidden key has a length of five bytes and is incremented for each insert attempt.

  • The PRIMARY KEY will be faster than any other index, because the PRIMARY KEY is stored together with the row data. The other indexes are stored as the key data + the PRIMARY KEY, so it's important to keep the PRIMARY KEY as short as possible to save disk space and get better speed.

    This behavior is similar to that of InnoDB, where shorter primary keys save space not only in the primary index but in secondary indexes as well.

  • If all columns you access in a BDB table are part of the same index or part of the primary key, MySQL can execute the query without having to access the actual row. In a MyISAM table, this can be done only if the columns are part of the same index.

  • Sequential scanning is slower than for MyISAM tables because the data in BDB tables is stored in B-trees and not in a separate data file.

  • Key values are not prefix- or suffix-compressed like key values in MyISAM tables. In other words, key information takes a little more space in BDB tables compared to MyISAM tables.

  • There are often holes in the BDB table to allow you to insert new rows in the middle of the index tree. This makes BDB tables somewhat larger than MyISAM tables.

  • SELECT COUNT(*) FROM tbl_name is slow for BDB tables, because no row count is maintained in the table.

  • The optimizer needs to know the approximate number of rows in the table. MySQL solves this by counting inserts and maintaining this in a separate segment in each BDB table. If you don't issue a lot of DELETE or ROLLBACK statements, this number should be accurate enough for the MySQL optimizer. However, MySQL stores the number only on close, so it may be incorrect if the server terminates unexpectedly. It should not be fatal even if this number is not 100% correct. You can update the row count by using ANALYZE TABLE or OPTIMIZE TABLE. See ANALYZE TABLE and OPTIMIZE TABLE.

  • Internal locking in BDB tables is done at the page level.

  • LOCK TABLES works on BDB tables as with other tables. If you don't use LOCK TABLE, MySQL issues an internal multiple-write lock on the table (a lock that doesn't block other writers) to ensure that the table will be properly locked if another thread issues a table lock.

  • To be able to roll back transactions, the BDB storage engine maintains log files. For maximum performance, you can use the --bdb-logdir option to place the BDB logs on a different disk than the one where your databases are located.

  • MySQL performs a checkpoint each time a new BDB log file is started, and removes any BDB log files that are not needed for current transactions. You can also use FLUSH LOGS at any time to checkpoint the Berkeley DB tables.

    For disaster recovery, you should use table backups plus MySQL's binary log. See the section called “Database Backups”.

    Warning: If you delete old log files that are still in use, BDB will not be able to do recovery at all and you may lose data if something goes wrong.

  • Applications must always be prepared to handle cases where any change of a BDB table may cause an automatic rollback and any read may fail with a deadlock error.

  • If you get full disk with a BDB table, you will get an error (probably error 28) and the transaction should roll back. This contrasts with MyISAM and ISAM tables, for which mysqld will wait for enough free disk before continuing.

Things We Need to Fix for BDB

  • It's very slow to open many BDB tables at the same time. If you are going to use BDB tables, you should not have a very large table cache (for example, with a size larger than 256) and you should use the --no-auto-rehash option when you use the mysql client. We plan to partly fix this in 4.0.

  • SHOW TABLE STATUS doesn't yet provide very much information for BDB tables.

  • Optimize performance.

  • Change to not use page locks at all for table scanning operations.

Restrictions on BDB Tables

The following list indicates restrictions that you must observe when using BDB tables:

  • Each BDB table stores in the .db file the path to the file as it was created. This was done to be able to detect locks in a multi-user environment that supports symlinks. However, the consequence is that BDB table files cannot be moved from one database directory to another.

  • When making backups of BDB tables, you must either use mysqldump or else make a backup that includes the files for each BDB table (the .frm and .db files) as well as the BDB log files. The BDB storage engine stores unfinished transactions in its log files and requires them to be present when mysqld starts. The BDB logs are the files in the data directory with names of the form log.XXXXXXXXXX (ten digits).

  • If a column that allows NULL values has a unique index, only a single NULL value is allowed. This differs from other storage engines.

Errors That May Occur When Using BDB Tables

  • If the following error occurs when you start mysqld, it means that the new BDB version doesn't support the old log file format:

    bdb:  Ignoring log file: .../log.XXXXXXXXXX:
    unsupported log version #
    

    In this case, you must delete all BDB logs from your data directory (the files with names that have the format log.XXXXXXXXXX) and restart mysqld. We also recommend that you then use mysqldump --opt to dump your BDB tables, drop the tables, and restore them from the dump file.

  • If autocommit mode is disabled and you drop a BDB table that is referenced in another transaction, you may get error messages of the following form in your MySQL error log:

    001119 23:43:56  bdb:  Missing log fileid entry
    001119 23:43:56  bdb:  txn_abort: Log undo failed for LSN:
                           1 3644744: Invalid
    

    This is not fatal, but until the problem is fixed, we recommend that you not drop BDB tables except while autocommit mode is enabled. (The fix is not trivial.)