InnoDB uses simulated asynchronous disk I/O: InnoDB creates a number of threads to take care of I/O operations, such as read-ahead.
There are two read-ahead heuristics in InnoDB:
In sequential read-ahead, if InnoDB notices that the access pattern to a segment in the tablespace is sequential, it posts in advance a batch of reads of database pages to the I/O system.
In random read-ahead, if InnoDB notices that some area in a tablespace seems to be in the process of being fully read into the buffer pool, it posts the remaining reads to the I/O system.
Starting from MySQL 3.23.40b, InnoDB uses a novel file flush technique called doublewrite. It adds safety to crash recovery after an operating system crash or a power outage, and improves performance on most Unix flavors by reducing the need for fsync() operations.
Doublewrite means that before writing pages to a data file, InnoDB first writes them to a contiguous tablespace area called the doublewrite buffer. Only after the write and the flush to the doublewrite buffer has completed does InnoDB write the pages to their proper positions in the data file. If the operating system crashes in the middle of a page write, InnoDB later will find a good copy of the page from the doublewrite buffer during recovery.
Starting from MySQL 3.23.41, you can use raw disk partitions as tablespace data files. By using a raw disk, you can perform non-buffered I/O on Windows and on some Unix systems without filesystem overhead, which might improve performance.
When you create a new data file, you must put the keyword newraw immediately after the data file size in innodb_data_file_path. The partition must be at least as large as the size that you specify. Note that 1MB in InnoDB is 1024 * 1024 bytes, whereas 1MB usually means 1,000,000 bytes in disk specifications.
[mysqld] innodb_data_home_dir= innodb_data_file_path=/dev/hdd1:3Gnewraw;/dev/hdd2:2Gnewraw
The next time you start the server, InnoDB notices the newraw keyword and initializes the new partition. However, do not create or change any InnoDB tables yet. Otherwise, when you next restart the server, InnoDB will reinitialize the partition and your changes will be lost. (Starting from 3.23.44, as a safety measure InnoDB prevents users from modifying data when any partition with newraw is specified.)
After InnoDB has initialized the new partition, stop the server, change newraw in the data file specification to raw:
[mysqld] innodb_data_home_dir= innodb_data_file_path=/dev/hdd1:5Graw;/dev/hdd2:2Graw
Then restart the server and InnoDB will allow changes to be made.
On Windows, starting from 4.1.1, you can allocate a disk partition as a data file like this:
[mysqld] innodb_data_home_dir= innodb_data_file_path=//./D::10Gnewraw
The //./ corresponds to the Windows syntax of \\.\ for accessing physical drives.
When you use raw disk partitions, be sure that they have permissions that allow read and write access by the account used for running the MySQL server.
The data files you define in the configuration file form the tablespace of InnoDB. The files are simply concatenated to form the tablespace. There is no striping in use. Currently you cannot define where in the tablespace your tables will be allocated. However, in a newly created tablespace, InnoDB allocates space starting from the first data file.
The tablespace consists of database pages with a default size of 16KB. The pages are grouped into extents of 64 consecutive pages. The “files” inside a tablespace are called segments in InnoDB. The name of the “rollback segment” is somewhat confusing because it actually contains many segments in the tablespace.
Two segments are allocated for each index in InnoDB. One is for non-leaf nodes of the B-tree, the other is for the leaf nodes. The idea here is to achieve better sequentiality for the leaf nodes, which contain the data.
When a segment grows inside the tablespace, InnoDB allocates the first 32 pages to it individually. After that InnoDB starts to allocate whole extents to the segment. InnoDB can add to a large segment up to 4 extents at a time to ensure good sequentiality of data.
Some pages in the tablespace contain bitmaps of other pages, and therefore a few extents in an InnoDB tablespace cannot be allocated to segments as a whole, but only as individual pages.
When you ask for available free space in the tablespace by issuing a SHOW TABLE STATUS, InnoDB reports the extents that are definitely free in the tablespace. InnoDB always reserves some extents for clean-up and other internal purposes; these reserved extents are not included in the free space.
When you delete data from a table, InnoDB will contract the corresponding B-tree indexes. It depends on the pattern of deletes whether that frees individual pages or extents to the tablespace, so that the freed space becomes available for other users. Dropping a table or deleting all rows from it is guaranteed to release the space to other users, but remember that deleted rows will be physically removed only in an (automatic) purge operation after they are no longer needed in transaction rollback or consistent read.
If there are random insertions into or deletions from the indexes of a table, the indexes may become fragmented. Fragmentation means that the physical ordering of the index pages on the disk is not close to the index ordering of the records on the pages, or that there are many unused pages in the 64-page blocks that were allocated to the index.
A symptom of fragmentation is that a table takes more space than it 'should take'. How much exactly is that, is difficult to determine. All InnoDB data and indexes are stored in B-trees, and their fillfactor may vary 50 % - 100 %. Another symptom of fragmentation is that a table scan:
SELECT COUNT(*) FROM t WHERE a_non_indexed_column <> 12345;
takes more time than 'it should take'. (Above we are fooling the SQL optimizer to scan the clustered index, not a secondary index.) Most disks can read 10 - 50 MB/s. That can be used to estimate how fast a table scan should run.
It can speed up index scans if you periodically perform a “null” ALTER TABLE operation:
ALTER TABLE tbl_name TYPE=InnoDB
That causes MySQL to rebuild the table. Another way to perform a defragmention operation is to use mysqldump to dump the table to a text file, drop the table, and reload it from the dump file.
If the insertions to an index are always ascending and records are deleted only from the end, the InnoDB file space management algorithm guarantees that fragmentation in the index will not occur.