Management of MySQL Cluster

Management of MySQL Cluster
Prev	Chapter 17. MySQL Cluster	Next

Managing a MySQL Cluster involves a number of activities. The first activity is to configure and startup MySQL Cluster. This is covered by the sections the section called “MySQL Cluster Configuration” and the section called “Process Management in MySQL Cluster”. This section covers how to manage a running MySQL Cluster.

There are essentially two ways of actively managing a running MySQL Cluster. The first is by commands entered into the management client where status of cluster can be checked, log levels changed, backups started and stopped and nodes can be stopped and started. The second method is to study the output in the cluster log. The cluster log is directed towards the ndb_2_cluster.log in the DataDir directory of the management server. The cluster log contains event reports generated from the ndbd processes in the cluster. It is also possible to send the cluster log entries to a Unix system log.

Commands in the Management Client

In addition to the central configuration file, the cluster may also be controlled through a command line interface. The command line interface is available through a separate management client process. This is the main administrative interface to a running cluster.

The management client has the following basic commands. Below, <id> denotes either a database node id (e.g. 21) or the keyword ALL that indicates that the command should be applied to all database nodes in the cluster.

HELP: Prints information on all available commands.
SHOW: Prints information on the status of the cluster.
<id> START: Start a database node identified with <id> or all database nodes.
<id> STOP: Stop a database node identified with <id> or all database nodes.
<id> RESTART [-N] [-I]: Restart a database node identified with <id> or all database nodes.
<id> STATUS: Displays status information for database node identified with <id> (or ALL database nodes).
ENTER SINGLE USER MODE <id>: Enters single user mode where only the API with node <id> is allowed to access the database system.
EXIT SINGLE USER MODE: Exists single user mode allowing all APIs to access the database system.
QUIT: Terminates the management client.
SHUTDOWN: Shuts down all cluster nodes (except mysql servers) and exits.

Commands for the event logs are given in the next section and commands for backup and restore are given in a separate section on these topics.

Event Reports Generated in MySQL Cluster

MySQL Cluster has two event logs, the cluster log and the node log.

The cluster log is a log of the whole cluster and this log can have multiple destinations (file, management server console window or syslog).
The node log is a log that is local to each database node and is written to the console window of the database node. The two logs can be set to log different subsets of the list of events.

Note: The cluster log is the recommended log. The node log is only intended to be used during application development or for debugging application code.

Each reportable event has the following properties:

Category (STARTUP, SHUTDOWN, STATISTICS, CHECKPOINT, NODERESTART, CONNECTION, ERROR, INFO)
Priority (1-15 where 1 - Most important, 15 - Least important)
Severity (ALERT, CRITICAL, ERROR, WARNING, INFO, DEBUG)

The two logs (the cluster log and the node log) can be filtered on these properties.

Logging Management Commands

The following management commands are related to the cluster log:

CLUSTERLOG ON: Turn cluster log on.
CLUSTERLOG OFF: Turn cluster log off.
CLUSTERLOG INFO: Information about cluster log settings.
<id> CLUSTERLOG <category>=<threshold>: Log category events with priority less than or equal to threshold in the cluster log.
CLUSTERLOG FILTER <severity>: Toggles cluster logging of the specified severity type on/off.

The following table describes the default setting (for all database nodes) of the cluster log category threshold. If an event has a priority with a value lower than or equal to the priority threshold, then it is reported in the cluster log.

Note that the events are reported per database node and that the thresholds can be set differently on different nodes.

Category	Default threshold (All database nodes)
STARTUP	7
SHUTDOWN	7
STATISTICS	7
CHECKPOINT	7
NODERESTART	7
CONNECTION	7
ERROR	15
INFO	7

The threshold is used to filter events within each category. For example: a STARTUP event with a priority of 3 is never sent unless the threshold for STARTUP is changed to 3 or lower. Only events with priority 3 or lower are sent if the threshold is 3. The event severities are (corresponds to UNIX syslog levels):

1	ALERT	A condition that should be corrected immediately, such as a corrupted system database
2	CRITICAL	Critical conditions, such as device errors or out of resources
3	ERROR	Conditions that should be corrected, such as configuration errors
4	WARNING	Conditions that are not error conditions but might require handling
5	INFO	Informational messages
6	DEBUG	Messages used during development of NDB Cluster

Syslog's LOG_EMERG and LOG_NOTICE are not used/mapped.

The event severities can be turned on or off. If the severity is on then all events with priority less than or equal to the category thresholds are logged. If the severity is off then no events belonging to the severity are logged.

The following commands are related to the node log:

<id> LOGLEVEL <levelnumber> Set logging level for database process with id to the value of <levelnumber>.

Log Events

All reportable events are listed below.

Event	Category	Priority	Severity	Description
DB nodes connected	CONNECTION	8	INFO
DB nodes disconnected	CONNECTION	8	INFO
Communication closed	CONNECTION	8	INFO	API & DB nodes connection closed
Communication opened	CONNECTION	8	INFO	API & DB nodes connection opened
Global checkpoint started	CHECKPOINT	9	INFO	Start of a GCP, i.e., REDO log is written to disk
Global checkpoint completed	CHECKPOINT	10	INFO	GCP finished
Local checkpoint started	CHECKPOINT	7	INFO	Start of local check pointing, i.e., data is written to disk. LCP Id and GCI Ids (keep and oldest restorable)
Local checkpoint completed	CHECKPOINT	8	INFO	LCP finished
LCP stopped in calc keep GCI	CHECKPOINT	0	ALERT	LCP stopped!
Local checkpoint fragment completed	CHECKPOINT	11	INFO	A LCP on a fragment has been completed
Report undo log blocked	CHECKPOINT	7	INFO	Reports undo logging blocked due buffer near to overflow
DB node start phases initiated	STARTUP	1	INFO	NDB Cluster starting
DB node all start phases completed	STARTUP	1	INFO	NDB Cluster started
Internal start signal received STTORRY	STARTUP	15	INFO	Internal start signal to blocks received after restart finished
DB node start phase X completed	STARTUP	4	INFO	A start phase has completed
Node has been successfully included into the cluster	STARTUP	3	INFO	President node, own node and dynamic id is shown
Node has been refused to be included into the cluster	STARTUP	8	INFO
DB node neighbours	STARTUP	8	INFO	Show left and right DB nodes neighbours
DB node shutdown initiated	STARTUP	1	INFO
DB node shutdown aborted	STARTUP	1	INFO
New REDO log started	STARTUP	10	INFO	GCI keep X, newest restorable GCI Y
New log started	STARTUP	10	INFO	Log part X, start MB Y, stop MB Z
Undo records executed	STARTUP	15	INFO
Completed copying of dictionary information	NODERESTART	8	INFO
Completed copying distribution information	NODERESTART	8	INFO
Starting to copy fragments	NODERESTART	8	INFO
Completed copying a fragment	NODERESTART	10	INFO
Completed copying all fragments	NODERESTART	8	INFO
Node failure phase completed	NODERESTART	8	ALERT	Reports node failure phases
Node has failed, node state was X	NODERESTART	8	ALERT	Reports that a node has failed
Report whether an arbitrator is found or not	NODERESTART	6	INFO	7 different cases
				- President restarts arbitration thread [state=X]
				- Prepare arbitrator node X [ticket=Y]
				- Receive arbitrator node X [ticket=Y]
				- Started arbitrator node X [ticket=Y]
				- Lost arbitrator node X - process failure [state=Y]
				- Lost arbitrator node X - process exit [state=Y]
				- Lost arbitrator node X <error msg>[state=Y]
Report arbitrator results	NODERESTART	2	ALERT	8 different results
				- Arbitration check lost - less than 1/2 nodes left
				- Arbitration check won - node group majority
				- Arbitration check lost - missing node group
				- Network partitioning - arbitration required
				- Arbitration won - positive reply from node X
				- Arbitration lost - negative reply from node X
				- Network partitioning - no arbitrator available
				- Network partitioning - no arbitrator configured
GCP take over started	NODERESTART	7	INFO
GCP take over completed	NODERESTART	7	INFO
LCP take over started	NODERESTART	7	INFO
LCP take completed (state = X)	NODERESTART	7	INFO
Report transaction statistics	STATISTICS	8	INFO	# of: transactions, commits, reads, simple reads, writes, concurrent operations, attribute info, aborts
Report operations	STATISTICS	8	INFO	# of operations
Report table create	STATISTICS	7	INFO
Report job scheduling statistics	STATISTICS	9	INFO	Mean Internal job scheduling statistics
Sent # of bytes	STATISTICS	9	INFO	Mean # of bytes sent to node X
Received # of bytes	STATISTICS	9	INFO	Mean # of bytes received from node X
Memory usage	STATISTICS	5	INFO	Data and Index memory usage (80%, 90% and 100%)
Transporter errors	ERROR	2	ERROR
Transporter warnings	ERROR	8	WARNING
Missed heartbeats	ERROR	8	WARNING	Node X missed heartbeat # Y
Dead due to missed heartbeat	ERROR	8	ALERT	Node X declared dead due to missed heartbeat
General warning events	ERROR	2	WARNING
Sent heartbeat	INFO	12	INFO	Heartbeat sent to node X
Create log bytes	INFO	11	INFO	Log part, log file, MB
General info events	INFO	2	INFO

An event report has the following format in the logs:

<date & time in GMT> [<any string>] <event severity> -- <log message>

09:19:30 2003-04-24 [NDB] INFO -- Node 4 Start phase 4 completed

Single User Mode

Single user mode allows the database administrator to restrict access to the database system to only one application (API node). When entering single user mode all connections to all APIs will be gracefully closed and no transactions are allowed to be started. All running transactions are aborted.

When the cluster has entered single user mode (use the all status command to see when the state has entered the single user mode), only the allowed API node is granted access to the database.

Example:

ENTER SINGLE USER MODE 5

After executing this command and after cluster has entered the single user mode, the API node with node id 5 becomes the single user of the cluster.

The node specified in the command above must be a MySQL Server node. Any attempt to specify any other type of node will be rejected.

Note: if the node with id 5 is running when executing ENTER SINGLE USER MODE 5, all transactions running on node 5 will be aborted, the connection is closed, and the server must be restarted.

The command EXIT SINGLE USER MODE alters the state of the cluster DB nodes from “single user mode” to “started”. MySQL Servers waiting for a connection, i.e. for the cluster to become ready, are now allowed to connect. The MySQL Server denoted as the single user continues to run, if it is connected, during and after the state transition.

Example:

EXIT SINGLE USER MODE

Best practice in case of node failures when running in single user mode is to:

Finish all single user mode transactions
Issue the command exit single user mode
Restart database nodes

Or restart database nodes prior to entering single user mode.

On-line Backup of MySQL Cluster

This section describes how to create a backup and later restore the backup to a database.

Cluster Backup Concepts

A backup is a snapshot of the database at a given time. The backup contains three main parts:

Meta data (what tables exists etc)
Table records (data in tables)
A log of committed transactions

Each of these parts is saved on all nodes participating in a backup.

During backup each node saves these three parts to disk into three files:

BACKUP-<BackupId>.<NodeId>.ctl: The control file which contain control information and meta data.
BACKUP-<BackupId>-0.<NodeId>.data: The data file that contain the table records.
BACKUP-<BackupId>.<NodeId>.log: The log file that contain the committed transactions.

Above <BackupId> is an identifier for the backup and <NodeId> is the node id of the node creating the file.

Meta data: The meta data consists of table definitions. All nodes have the exact same table definitions saved on disk.
Table records: The table records are saved per fragment. Each fragment contains a header that describes which table the records belong to. After the list of records there is a footer that contains a checksum for the records. Different nodes save different fragments during the backup.
Committed log: The committed log contains committed transaction made during the backup. Only transactions on tables stored in the backup are stored in the log. The different nodes in the backup saves different log records as they host different database fragments.

Using The Management Server to Create a Backup

Before starting make sure that the cluster is properly configured for backups.

Start management server.
Execute the command START BACKUP.
The management server will reply with a message “Start of backup ordered”. This means that the management server has submitted the request to the cluster, but has not yet received any response.
The management server will reply “Backup <BackupId> started” where <BackupId> is the unique identifier for this particular backup. This will also be saved in the cluster log (if not configured otherwise). This means that the cluster has received and processed the backup request. It does not mean that the backup has completed.
The management server will when the backup is finished reply “Backup <BackupId> completed”.

Using the management server to abort a backup:

Start management server.
Execute the command ABORT BACKUP <BACKUPID>. The number <BackupId> is the identifier of the backup that is included in the response of the management server when the backup is started, i.e. in the message “Backup <BackupId> started”. The identifier is also saved in the cluster log (cluster.log).
The management server will reply “Abort of backup <BackupId> ordered” This means that it has submitted the request to the cluster, but has not received any response.
The management server will reply “Backup <BackupId> has been aborted reason XYZ”. This means that the cluster has aborted the backup and removed everything belonging to it, including the files in the file system.

Note that if there is not any backup with id <BackupId> running when it is aborted, the management server will not reply anything. However there will be a line in the cluster.log mentioning that an “invalid” abort command has been filed.

How to Restore a Cluster Backup

The restore program is implemented as seperate command line utility. It reads the files created from the backup and inserts the stored information into the database. The restore program has to be executed once for each set of backup files, i.e. as many times as there were database nodes running when the backup we created.

The first time you run the restore program you also need to restore the meta data, i.e. create tables. The restore program acts as an API to the cluster and therefore needs a free connection to connect to. This can be verified with the ndb_mgm command SHOW. In the same way as with other API nodes, e.g. the mysqld, the Ndb.cfg, the NDB_CONNECTSTRING or the switch -c <connectstring> may be used to locate the MGM node. The backup files must be present in the directory given as an argument to the restore program. The backup can be restored to a database with a different configuration than it was created from. For example, consider if a backup (with id 12) created in a cluster with two database nodes (with node id 2 and node id 3) that should be restored to a cluster with four nodes. The restore program then has to be executed two times (one for each database node in the cluster where the backup was taken) as described in the box below.

Note: for rapid restore, the data may be restored in parallell (provided that there are enough free API connections available). Note however that the data files must always be applied before the logs.

Note: the cluster should have an empty database when starting to restore a backup.

Configuration for Cluster Backup

There are four configuration parameters for backup:

BackupDataBufferSize: Amount of memory (out of the total memory) used to buffer data before it is written to disk.
BackupLogBufferSize: Amount of memory (out of the total memory) used to buffer log records before these are written to disk.
BackupMemory: Total memory allocated in a database node for backups. This should be the sum of the memory allocated for the two buffers.
BackupWriteSize: Size of blocks written to disk. This applies for both the data buffer and the log buffer.

Backup Troubleshooting

If an error code is returned when issuing a backup request, then check that there is enough memory allocated for the backup (i.e. the configuration parameters). Also check that there is enough space on the hard drive partition of the backup target.

Prev	Up	Next
Process Management in MySQL Cluster	Home	Using High-Speed Interconnects with MySQL Cluster