Managing a MySQL Cluster involves a number of activities. The first activity is to configure and startup MySQL Cluster. This is covered by the sections the section called “MySQL Cluster Configuration” and the section called “Process Management in MySQL Cluster”. This section covers how to manage a running MySQL Cluster.
There are essentially two ways of actively managing a running MySQL Cluster. The first is by commands entered into the management client where status of cluster can be checked, log levels changed, backups started and stopped and nodes can be stopped and started. The second method is to study the output in the cluster log. The cluster log is directed towards the ndb_2_cluster.log in the DataDir directory of the management server. The cluster log contains event reports generated from the ndbd processes in the cluster. It is also possible to send the cluster log entries to a Unix system log.
In addition to the central configuration file, the cluster may also be controlled through a command line interface. The command line interface is available through a separate management client process. This is the main administrative interface to a running cluster.
The management client has the following basic commands. Below, <id> denotes either a database node id (e.g. 21) or the keyword ALL that indicates that the command should be applied to all database nodes in the cluster.
Prints information on all available commands.
Prints information on the status of the cluster.
Start a database node identified with <id> or all database nodes.
Stop a database node identified with <id> or all database nodes.
Restart a database node identified with <id> or all database nodes.
Displays status information for database node identified with <id> (or ALL database nodes).
Enters single user mode where only the API with node <id> is allowed to access the database system.
Exists single user mode allowing all APIs to access the database system.
Terminates the management client.
Shuts down all cluster nodes (except mysql servers) and exits.
Commands for the event logs are given in the next section and commands for backup and restore are given in a separate section on these topics.
MySQL Cluster has two event logs, the cluster log and the node log.
The cluster log is a log of the whole cluster and this log can have multiple destinations (file, management server console window or syslog).
The node log is a log that is local to each database node and is written to the console window of the database node. The two logs can be set to log different subsets of the list of events.
Note: The cluster log is the recommended log. The node log is only intended to be used during application development or for debugging application code.
Each reportable event has the following properties:
Category (STARTUP, SHUTDOWN, STATISTICS, CHECKPOINT, NODERESTART, CONNECTION, ERROR, INFO)
Priority (1-15 where 1 - Most important, 15 - Least important)
Severity (ALERT, CRITICAL, ERROR, WARNING, INFO, DEBUG)
The two logs (the cluster log and the node log) can be filtered on these properties.
The following management commands are related to the cluster log:
Turn cluster log on.
Turn cluster log off.
Information about cluster log settings.
Log category events with priority less than or equal to threshold in the cluster log.
Toggles cluster logging of the specified severity type on/off.
The following table describes the default setting (for all database nodes) of the cluster log category threshold. If an event has a priority with a value lower than or equal to the priority threshold, then it is reported in the cluster log.
Note that the events are reported per database node and that the thresholds can be set differently on different nodes.
Category | Default threshold (All database nodes) |
STARTUP | 7 |
SHUTDOWN | 7 |
STATISTICS | 7 |
CHECKPOINT | 7 |
NODERESTART | 7 |
CONNECTION | 7 |
ERROR | 15 |
INFO | 7 |
The threshold is used to filter events within each category. For example: a STARTUP event with a priority of 3 is never sent unless the threshold for STARTUP is changed to 3 or lower. Only events with priority 3 or lower are sent if the threshold is 3. The event severities are (corresponds to UNIX syslog levels):
1 | ALERT | A condition that should be corrected immediately, such as a corrupted system database |
2 | CRITICAL | Critical conditions, such as device errors or out of resources |
3 | ERROR | Conditions that should be corrected, such as configuration errors |
4 | WARNING | Conditions that are not error conditions but might require handling |
5 | INFO | Informational messages |
6 | DEBUG | Messages used during development of NDB Cluster |
Syslog's LOG_EMERG and LOG_NOTICE are not used/mapped.
The event severities can be turned on or off. If the severity is on then all events with priority less than or equal to the category thresholds are logged. If the severity is off then no events belonging to the severity are logged.
The following commands are related to the node log:
<id> LOGLEVEL <levelnumber> Set logging level for database process with id to the value of <levelnumber>.
All reportable events are listed below.
Event | Category | Priority | Severity | Description |
DB nodes connected | CONNECTION | 8 | INFO | |
DB nodes disconnected | CONNECTION | 8 | INFO | |
Communication closed | CONNECTION | 8 | INFO | API & DB nodes connection closed |
Communication opened | CONNECTION | 8 | INFO | API & DB nodes connection opened |
Global checkpoint started | CHECKPOINT | 9 | INFO | Start of a GCP, i.e., REDO log is written to disk |
Global checkpoint completed | CHECKPOINT | 10 | INFO | GCP finished |
Local checkpoint started | CHECKPOINT | 7 | INFO | Start of local check pointing, i.e., data is written to disk. LCP Id and GCI Ids (keep and oldest restorable) |
Local checkpoint completed | CHECKPOINT | 8 | INFO | LCP finished |
LCP stopped in calc keep GCI | CHECKPOINT | 0 | ALERT | LCP stopped! |
Local checkpoint fragment completed | CHECKPOINT | 11 | INFO | A LCP on a fragment has been completed |
Report undo log blocked | CHECKPOINT | 7 | INFO | Reports undo logging blocked due buffer near to overflow |
DB node start phases initiated | STARTUP | 1 | INFO | NDB Cluster starting |
DB node all start phases completed | STARTUP | 1 | INFO | NDB Cluster started |
Internal start signal received STTORRY | STARTUP | 15 | INFO | Internal start signal to blocks received after restart finished |
DB node start phase X completed | STARTUP | 4 | INFO | A start phase has completed |
Node has been successfully included into the cluster | STARTUP | 3 | INFO | President node, own node and dynamic id is shown |
Node has been refused to be included into the cluster | STARTUP | 8 | INFO | |
DB node neighbours | STARTUP | 8 | INFO | Show left and right DB nodes neighbours |
DB node shutdown initiated | STARTUP | 1 | INFO | |
DB node shutdown aborted | STARTUP | 1 | INFO | |
New REDO log started | STARTUP | 10 | INFO | GCI keep X, newest restorable GCI Y |
New log started | STARTUP | 10 | INFO | Log part X, start MB Y, stop MB Z |
Undo records executed | STARTUP | 15 | INFO | |
Completed copying of dictionary information | NODERESTART | 8 | INFO | |
Completed copying distribution information | NODERESTART | 8 | INFO | |
Starting to copy fragments | NODERESTART | 8 | INFO | |
Completed copying a fragment | NODERESTART | 10 | INFO | |
Completed copying all fragments | NODERESTART | 8 | INFO | |
Node failure phase completed | NODERESTART | 8 | ALERT | Reports node failure phases |
Node has failed, node state was X | NODERESTART | 8 | ALERT | Reports that a node has failed |
Report whether an arbitrator is found or not | NODERESTART | 6 | INFO | 7 different cases |
- President restarts arbitration thread [state=X] | ||||
- Prepare arbitrator node X [ticket=Y] | ||||
- Receive arbitrator node X [ticket=Y] | ||||
- Started arbitrator node X [ticket=Y] | ||||
- Lost arbitrator node X - process failure [state=Y] | ||||
- Lost arbitrator node X - process exit [state=Y] | ||||
- Lost arbitrator node X <error msg>[state=Y] | ||||
Report arbitrator results | NODERESTART | 2 | ALERT | 8 different results |
- Arbitration check lost - less than 1/2 nodes left | ||||
- Arbitration check won - node group majority | ||||
- Arbitration check lost - missing node group | ||||
- Network partitioning - arbitration required | ||||
- Arbitration won - positive reply from node X | ||||
- Arbitration lost - negative reply from node X | ||||
- Network partitioning - no arbitrator available | ||||
- Network partitioning - no arbitrator configured | ||||
GCP take over started | NODERESTART | 7 | INFO | |
GCP take over completed | NODERESTART | 7 | INFO | |
LCP take over started | NODERESTART | 7 | INFO | |
LCP take completed (state = X) | NODERESTART | 7 | INFO | |
Report transaction statistics | STATISTICS | 8 | INFO | # of: transactions, commits, reads, simple reads, writes, concurrent operations, attribute info, aborts |
Report operations | STATISTICS | 8 | INFO | # of operations |
Report table create | STATISTICS | 7 | INFO | |
Report job scheduling statistics | STATISTICS | 9 | INFO | Mean Internal job scheduling statistics |
Sent # of bytes | STATISTICS | 9 | INFO | Mean # of bytes sent to node X |
Received # of bytes | STATISTICS | 9 | INFO | Mean # of bytes received from node X |
Memory usage | STATISTICS | 5 | INFO | Data and Index memory usage (80%, 90% and 100%) |
Transporter errors | ERROR | 2 | ERROR | |
Transporter warnings | ERROR | 8 | WARNING | |
Missed heartbeats | ERROR | 8 | WARNING | Node X missed heartbeat # Y |
Dead due to missed heartbeat | ERROR | 8 | ALERT | Node X declared dead due to missed heartbeat |
General warning events | ERROR | 2 | WARNING | |
Sent heartbeat | INFO | 12 | INFO | Heartbeat sent to node X |
Create log bytes | INFO | 11 | INFO | Log part, log file, MB |
General info events | INFO | 2 | INFO |
An event report has the following format in the logs:
<date & time in GMT> [<any string>] <event severity> -- <log message> 09:19:30 2003-04-24 [NDB] INFO -- Node 4 Start phase 4 completed
Single user mode allows the database administrator to restrict access to the database system to only one application (API node). When entering single user mode all connections to all APIs will be gracefully closed and no transactions are allowed to be started. All running transactions are aborted.
When the cluster has entered single user mode (use the all status command to see when the state has entered the single user mode), only the allowed API node is granted access to the database.
Example:
ENTER SINGLE USER MODE 5
After executing this command and after cluster has entered the single user mode, the API node with node id 5 becomes the single user of the cluster.
The node specified in the command above must be a MySQL Server node. Any attempt to specify any other type of node will be rejected.
Note: if the node with id 5 is running when executing ENTER SINGLE USER MODE 5, all transactions running on node 5 will be aborted, the connection is closed, and the server must be restarted.
The command EXIT SINGLE USER MODE alters the state of the cluster DB nodes from “single user mode” to “started”. MySQL Servers waiting for a connection, i.e. for the cluster to become ready, are now allowed to connect. The MySQL Server denoted as the single user continues to run, if it is connected, during and after the state transition.
Example:
EXIT SINGLE USER MODE
Best practice in case of node failures when running in single user mode is to:
Finish all single user mode transactions
Issue the command exit single user mode
Restart database nodes
Or restart database nodes prior to entering single user mode.
This section describes how to create a backup and later restore the backup to a database.
A backup is a snapshot of the database at a given time. The backup contains three main parts:
Meta data (what tables exists etc)
Table records (data in tables)
A log of committed transactions
Each of these parts is saved on all nodes participating in a backup.
During backup each node saves these three parts to disk into three files:
The control file which contain control information and meta data.
The data file that contain the table records.
The log file that contain the committed transactions.
Above <BackupId> is an identifier for the backup and <NodeId> is the node id of the node creating the file.
The meta data consists of table definitions. All nodes have the exact same table definitions saved on disk.
The table records are saved per fragment. Each fragment contains a header that describes which table the records belong to. After the list of records there is a footer that contains a checksum for the records. Different nodes save different fragments during the backup.
The committed log contains committed transaction made during the backup. Only transactions on tables stored in the backup are stored in the log. The different nodes in the backup saves different log records as they host different database fragments.
Before starting make sure that the cluster is properly configured for backups.
Start management server.
Execute the command START BACKUP.
The management server will reply with a message “Start of backup ordered”. This means that the management server has submitted the request to the cluster, but has not yet received any response.
The management server will reply “Backup <BackupId> started” where <BackupId> is the unique identifier for this particular backup. This will also be saved in the cluster log (if not configured otherwise). This means that the cluster has received and processed the backup request. It does not mean that the backup has completed.
The management server will when the backup is finished reply “Backup <BackupId> completed”.
Using the management server to abort a backup:
Start management server.
Execute the command ABORT BACKUP <BACKUPID>. The number <BackupId> is the identifier of the backup that is included in the response of the management server when the backup is started, i.e. in the message “Backup <BackupId> started”. The identifier is also saved in the cluster log (cluster.log).
The management server will reply “Abort of backup <BackupId> ordered” This means that it has submitted the request to the cluster, but has not received any response.
The management server will reply “Backup <BackupId> has been aborted reason XYZ”. This means that the cluster has aborted the backup and removed everything belonging to it, including the files in the file system.
Note that if there is not any backup with id <BackupId> running when it is aborted, the management server will not reply anything. However there will be a line in the cluster.log mentioning that an “invalid” abort command has been filed.
The restore program is implemented as seperate command line utility. It reads the files created from the backup and inserts the stored information into the database. The restore program has to be executed once for each set of backup files, i.e. as many times as there were database nodes running when the backup we created.
The first time you run the restore program you also need to restore the meta data, i.e. create tables. The restore program acts as an API to the cluster and therefore needs a free connection to connect to. This can be verified with the ndb_mgm command SHOW. In the same way as with other API nodes, e.g. the mysqld, the Ndb.cfg, the NDB_CONNECTSTRING or the switch -c <connectstring> may be used to locate the MGM node. The backup files must be present in the directory given as an argument to the restore program. The backup can be restored to a database with a different configuration than it was created from. For example, consider if a backup (with id 12) created in a cluster with two database nodes (with node id 2 and node id 3) that should be restored to a cluster with four nodes. The restore program then has to be executed two times (one for each database node in the cluster where the backup was taken) as described in the box below.
Note: for rapid restore, the data may be restored in parallell (provided that there are enough free API connections available). Note however that the data files must always be applied before the logs.
Note: the cluster should have an empty database when starting to restore a backup.
There are four configuration parameters for backup:
Amount of memory (out of the total memory) used to buffer data before it is written to disk.
Amount of memory (out of the total memory) used to buffer log records before these are written to disk.
Total memory allocated in a database node for backups. This should be the sum of the memory allocated for the two buffers.
Size of blocks written to disk. This applies for both the data buffer and the log buffer.