Chapter 13. Cluster Database Management

This chapter describes how to back up and restore the cluster database and how to validate the cluster configuration with cxfs-config.

Performing Cluster Database Backup and Restoration

You should perform a database backup whenever you want to save the database and be able to restore it to the current state at a later point.

You can use the following methods to restore the database:

If the database is accidentally deleted from a server-capable administration node, use the fs2d daemon to replicate the database from another server-capable administration node. See “Restoring a Deleted Database from Another Node”.
If you want to be able to recreate the current configuration, use the config command in cxfs_admin. You can then recreate this configuration by using the output file and the cxfs_admin -f option or running the script generated as described in “Saving and Recreating the Current Configuration with cxfs_admin” in Chapter 11.
If you want to retain a copy of the database and all node-specific information such as local logging, use the cdbBackup and cdbRestore commands. You should periodically back up the cluster database on all server-capable administration nodes using the cdbBackup command, either manually or by adding an entry to the root crontab file. See “Using cdbBackup and cdbRestore for the Cluster Database and Logging Information”.
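
As one way to automate the periodic backup described above, the following is a minimal sketch of a wrapper that a root crontab entry could call. The wrapper's name, install path, schedule, and log file are assumptions made for illustration; only the /usr/cluster/bin/cdbBackup path comes from this chapter.

```shell
#!/bin/sh
# Sketch of a cron wrapper for cdbBackup. The CDB_BACKUP override exists only
# so that the function can be exercised without a cluster; it is not a CXFS
# feature.
CDB_BACKUP=${CDB_BACKUP:-/usr/cluster/bin/cdbBackup}

run_cdb_backup() {
    # Run the backup command and report the result on stdout or stderr.
    if "$CDB_BACKUP"; then
        echo "cdbBackup completed"
    else
        echo "cdbBackup failed" >&2
        return 1
    fi
}

# Hypothetical root crontab entry (daily at 02:15), assuming this file is
# installed as /usr/local/sbin/cdb_backup.sh with a final line that calls
# run_cdb_backup:
# 15 2 * * * /usr/local/sbin/cdb_backup.sh >> /var/log/cdb_backup.log 2>&1
```

Choose a schedule that does not overlap with planned configuration changes, because the database should not be modified while cdbBackup is running.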

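This section discusses restoring a deleted database from another node and using cdbBackup and cdbRestore to save and restore the cluster database and logging information.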

Restoring a Deleted Database from Another Node

If the database has been accidentally deleted from an individual server-capable administration node, you can restore it by synchronizing with the database on another server-capable administration node.

Note: Do not use this method if the cluster database has been corrupted, because the database on another node will also be corrupted. In the case of corruption, you must reinstate a backup copy of the database. See “Saving and Recreating the Current Configuration with cxfs_admin” in Chapter 11.

Do the following:

1. Stop the CXFS service on the server-capable administration node with the deleted database by running the following command in cxfs_admin:
   Caution: If you omit this step, the target node might be reset by another server-capable administration node.
   cxfs_admin:cluster> disable nodename
2. (If running GRIOv2) Stop the GRIOv2 daemon (ggd2) by running the following command on the node with the deleted database:
   server-admin# service grio2 stop
3. Stop the CXFS control daemon (clconfd) by running the following command on the node where step 2 was executed:
   Caution: Running this command will completely shut down all CXFS filesystem access on the local node.
   server-admin# service cxfs stop
4. Stop the CXFS cluster administration daemons (cad, cmond, crsd, and fs2d) by running the following command on the node where step 2 was executed:
   server-admin# service cxfs_cluster stop
5. Verify that all of the filesystems have been unmounted and the cluster membership has been terminated. To do this, run clconf_info on the active metadata server.
6. Run cdbreinit on the node where step 2 was executed:
   server-admin# /usr/cluster/bin/cdbreinit

   Wait for the following message to be logged to the syslog:

   fs2d[PID]: Finished receiving CDB sync series from machine nodename

7. Start the CXFS control daemon by running the following command on the node where step 2 was executed:
   server-admin# service cxfs start
8. (If running GRIOv2) Start the GRIOv2 daemon (ggd2) by running the following command on the node where step 2 was executed:
   server-admin# service grio2 start

You can choose to have the cdbreinit command restart cluster daemons. The fs2d daemon will then replicate the cluster database to the node from which it is missing.
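
The wait for the fs2d synchronization message in the procedure above can be scripted. The following is a sketch only: the function names, the polling interval, and the default retry count are hypothetical, and the location of the syslog file varies by system, so it is passed as an argument.

```shell
#!/bin/sh
# Return success if the log already contains the fs2d "sync finished" message.
# $1: path to the syslog file (an assumption; for example /var/log/messages)
cdb_sync_finished() {
    grep -q 'fs2d\[[0-9]*]: Finished receiving CDB sync series from machine' "$1"
}

# Poll the log once per second until the message appears or the tries run out.
# $1: path to the syslog file
# $2: maximum number of polls (hypothetical default: 300)
wait_for_cdb_sync() {
    sync_log=$1
    sync_tries=${2:-300}
    while [ "$sync_tries" -gt 0 ]; do
        cdb_sync_finished "$sync_log" && return 0
        sync_tries=$((sync_tries - 1))
        sleep 1
    done
    return 1
}

# Example use after running cdbreinit:
# wait_for_cdb_sync /var/log/messages 600 || echo "no CDB sync message seen" >&2
```

Note that this simple check also matches a message left in the log by an earlier synchronization, so compare the timestamp of the matching line with the time at which cdbreinit was run.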

Using `cdbBackup` and `cdbRestore` for the Cluster Database and Logging Information

The cdbBackup and cdbRestore commands back up and restore the cluster database and node-specific information, such as local logging information. You must run these commands individually for each server-capable administration node.

To perform a backup of the cluster, use the cdbBackup command on each server-capable administration node:

# cdbBackup

Caution: Do not make configuration changes while you are using the cdbBackup command.

To perform a restore, run the cdbRestore command on each server-capable administration node. You can use this method for either a missing or a corrupted cluster database. Do the following:

Stop CXFS services on all nodes in the cluster.
Stop the cluster administration daemons on each server-capable administration node.
Remove the old database by using the cdbreinit command on each server-capable administration node.
Stop the cluster administration daemons again (these were restarted automatically by cdbreinit in the previous step) on each node.
Delete the existing database files by running the cdbdelete command on each server-capable administration node:
# cdbdelete /var/cluster/cdb/cdb.db

Use the cdbRestore command on each server-capable administration node:

# cdbRestore -f backup_filename

Note: The name of the backup file (backup_filename) created by cdbBackup is displayed in the cdbBackup output.

For example, the following shows that cdb_cc-xe.Oct.27.2008.08:58:17.tar.Z is the value for backup_filename (line breaks shown here for readability):

# cdbBackup 
Saving cdb header file /var/cluster/cdb/cdb.db and 
  cdb data directory /var/cluster/cdb/cdb.db# as cdb 
  backup file cdb_cc-xe.Oct.27.2008.08:58:17.tar.Z 
  under directory /var/cluster/cdb-backup
...done.

Start the cluster administration daemons on each server-capable administration node.
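
Because the restore step needs the file name that cdbBackup prints, a script can capture that name from the command output. The following sketch assumes that the output contains the phrase "backup file" followed by the name, as in the example output above; the function name is hypothetical.

```shell
#!/bin/sh
# Read cdbBackup output on stdin and print the backup file name.
# The output may be a single line or wrapped across several lines, so the
# lines are joined and runs of spaces squeezed before matching.
extract_backup_filename() {
    tr '\n' ' ' | tr -s ' ' | sed -n 's/.*backup file \([^ ]*\).*/\1/p'
}

# Example use on a server-capable administration node:
# backup_filename=$(/usr/cluster/bin/cdbBackup | extract_backup_filename)
# echo "saved as $backup_filename"
```

If the phrase is not found, the function prints nothing, so a calling script should treat an empty result as a failed backup rather than pass it to cdbRestore.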

For example, to back up the current database, clear the database, and then restore the database to all server-capable administration nodes, do the following as directed:

On each server-capable administration node:
# /usr/cluster/bin/cdbBackup

On one server-capable administration node: 
cmgr> stop cx_services for cluster clusterA

On each server-capable administration node (if running GRIOv2): 
# service grio2 stop

On each server-capable administration node: 
# service cxfs stop

On each server-capable administration node:
# service cxfs_cluster stop

On each server-capable administration node:
# /usr/cluster/bin/cdbreinit

On each server-capable administration node (again):
# service cxfs_cluster stop

On each server-capable administration node: 
# /usr/cluster/bin/cdbdelete /var/cluster/cdb/cdb.db

On each server-capable administration node: 
# /usr/cluster/bin/cdbRestore -f backup_filename 

On each server-capable administration node:
# service cxfs_cluster start

On each server-capable administration node:
# service cxfs start

On each server-capable administration node (if running GRIOv2): 
# service grio2 start

For more information, see the cdbBackup(1M), cdbdelete(1M), and cdbRestore(1M) man pages.

Validating the Cluster Configuration with `cxfs-config`

The cxfs-config command validates configuration information in the cluster database. You can run it on any server-capable administration node in the cluster.

By default, cxfs-config displays the following:

Cluster name and cluster ID
Tiebreaker node
Networks configured for CXFS failover
Nodes in the pool:
- Node ID
- Cell ID (as assigned by the kernel when added to the cluster and stored in the cluster database)
- Status of CXFS services (configured to be enabled or disabled)
- Operating system
- Node function
CXFS filesystems:
- Name, mount point, and status (enabled means that the filesystem is configured to be mounted; if it is not mounted, there is an error)
- Device name
- Mount options
- Potential metadata servers
- Nodes that should have the filesystem mounted (if there are no errors)
Switches:
- Switch name, user name to use when sending a telnet message, mask (a hexadecimal string representing a 64-bit port bitmap that indicates the list of ports in the switch that will not be fenced)
- Ports on the switch that have a client configured for fencing at the other end
Warnings or errors

For example:

thump# /usr/cluster/bin/cxfs-config
Global:
    cluster: topiary (id 1)
    cluster state: enabled
    tiebreaker: <none>

Networks:
    net 0: type tcpip  192.168.0.0      255.255.255.0   
    net 1: type tcpip  134.14.54.0      255.255.255.0   
 
Machines:
    node leesa: node 6     cell 1  enabled  Linux32 client_only 
        hostname: leesa.example.com
        fail policy: Fence
        nic 0: address: 192.168.0.164 priority: 1 network: 0
        nic 1: address: 134.14.54.164 priority: 2 network: 1

    node thump: node 1     cell 0  enabled  Linux64    server_admin
        hostname: thump.example.com
        fail policy: Fence
        nic 0: address: 192.168.0.186 priority: 1 network: 0
        nic 1: address: 134.14.54.186 priority: 2 network: 1

Filesystems:
    fs dxm: /mnt/dxm             enabled 
        device = /dev/cxvm/tp9500a4s0
        options = []
        servers = thump (0)
        clients = leesa, thump

Switches:
    switch 0: admin@asg-fcsw1      mask 0000000000000000
        port 8: 210000e08b0ead8c thump
        
    switch 1: admin@asg-fcsw0      mask 0000000000000000

Warnings/errors:
    enabled machine leesa has fencing enabled but is not present in switch database

The following options are of particular interest:

-all lists all available information

-ping contacts each NIC in the machine list and displays whether the packets are transmitted and received. For example:

node leesa: node 6     cell 1  enabled  Linux32 client_only 
   fail policy: Fence
   nic 0: address: 192.168.0.164 priority: 1 
       ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
       ping: round-trip min/avg/max = 0.477/0.666/1.375 ms
   nic 1: address: 134.14.54.164 priority: 2 
       ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
       ping: round-trip min/avg/max = 0.469/0.645/1.313 ms

-xfs lists XFS information for each CXFS filesystem, such as size. For example:

Filesystems:
    fs dxm: /mnt/dxm             enabled 
        device = /dev/cxvm/tp9500a4s0
        options = []
        servers = thump (0)
        clients = leesa, thump
        xfs:
            magic: 0x58465342
            blocksize: 4096
            uuid: 3459ee2e-76c9-1027-8068-0800690dac3c
            data size 17.00 Gb

-xvm lists XVM information for each CXFS filesystem, such as volume size and topology. For example:

Filesystems:
    fs dxm: /mnt/dxm             enabled 
        device = /dev/cxvm/tp9500a4s0
        options = []
        servers = thump (0)
        clients = leesa,  thump
        xvm:
            vol/tp9500a4s0                    0 online,open
                subvol/tp9500a4s0/data     35650048 online,open
                    slice/tp9500a4s0           35650048 online,open
            
            data size: 17.00 Gb

-check performs extra verification, such as comparing the XFS filesystem size with the XVM volume size for each CXFS filesystem. This option may take a few moments to execute.
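
To relate the block counts in the -xvm output to the reported data size, the following sketch converts a block count to binary gigabytes. It assumes that XVM reports sizes in 512-byte blocks, which is consistent with the sample output above (35650048 blocks shown as 17.00 Gb) but is not stated in this chapter. The function name is hypothetical, and this is not the logic that cxfs-config itself uses.

```shell
#!/bin/sh
# Convert an XVM block count to binary gigabytes, assuming 512-byte blocks.
# $1: block count as shown in the -xvm output
xvm_blocks_to_gib() {
    LC_ALL=C awk -v blocks="$1" \
        'BEGIN { printf "%.2f\n", blocks * 512 / (1024 * 1024 * 1024) }'
}

# Example use:
# xvm_blocks_to_gib 35650048     # prints 17.00
# xvm_blocks_to_gib 2836134016   # prints 1352.37
```

The second value, 1352.37 binary gigabytes, is about 1.32 binary terabytes.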

For a complete list of options, see the cxfs-config(1M) man page.

The following example shows errors reported by cxfs-config:

aiden # /usr/cluster/bin/cxfs-config -check -all
Global:
    cluster: BP (id 555)
    cluster state: enabled
    tiebreaker: 
Networks:
    net 0: type tcpip  10.11.0.0        255.255.255.0
    net 1: type tcpip  128.162.242.0    255.255.255.0

Machines:
    node aiden: node 27560 cell 0  enabled  Linux64    server_admin
        hostname: aiden.example.com
        fail policy: Fence, Shutdown
        nic 0: address: 10.11.0.241 priority: 1 network: 0
            ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
            ping: round-trip min/avg/max = 0.136/0.171/0.299 ms
        nic 1: address: 128.162.242.12 priority: 2 network: 1
            ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
            ping: round-trip min/avg/max = 0.130/0.171/0.303 ms

    node brigid: node 31867 cell 2  enabled  Linux64    server_admin
        hostname: brigid.example.com
        fail policy: Fence, Shutdown
        nic 0: address: 10.11.0.240 priority: 1 network: 0
            ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
            ping: round-trip min/avg/max = 0.303/0.339/0.446 ms
        nic 1: address: 128.162.242.11 priority: 2 network: 1
            ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
            ping: round-trip min/avg/max = 0.336/0.430/0.799 ms

    node flynn: node 1     cell 1  enabled  Linux64 client_only
        hostname: flynn.example.com
        fail policy: Fence, Shutdown
        nic 0: address: 10.11.0.234 priority: 1 network: 0
            ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
            ping: round-trip min/avg/max = 0.323/0.370/0.539 ms
        nic 1: address: 128.162.242.189 priority: 2 network: 1
            ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
            ping: round-trip min/avg/max = 0.283/0.312/0.424 ms

Filesystems:
    fs concatfs: /concatfs            enabled
        device = /dev/cxvm/concatfs
        force = true
        options = [rw,quota]
        servers = aiden (0), brigid (2)
        clients = aiden, brigid, flynn
        xvm:
            vol/concatfs                      0 online,open
                subvol/concatfs/data     2836134016 online,open
                    concat/concat0           2836134016 online,tempname,open
                        slice/lun2s0             1418067008 online,open
                        slice/lun3s0             1418067008 online,open

            data size: 1.32 TB
        xfs:
            magic: 0x58465342
            blocksize: 4096
            uuid: 9616ae39-3a50-1029-8896-080069056bf5
            data size 1.32 TB

    fs stripefs: /stripefs            enabled
        device = /dev/cxvm/stripefs
        force = true
        options = [rw,quota]
        servers = aiden (0), brigid (2)
        clients = aiden, brigid, flynn
        xvm:
            vol/stripefs                      0 online,open
                subvol/stripefs/data     2836133888 online,open
                    stripe/stripe0           2836133888 online,tempname,open
                        slice/lun0s0             1418067008 online,open
                        slice/lun1s0             1418067008 online,open

            data size: 1.32 TB
        xfs:
            magic: 0x58465342
            blocksize: 4096
            uuid: 9616ae38-3a50-1029-8896-080069056bf5
            data size 1.32 TB

Switches:
    switch 0: 32 port brocade admin@fcswitch12
        port 12: 210000e08b00e6eb brigid
        port 28: 210000e08b041a3a aiden

    switch 1: 32 port brocade admin@fcswitch13
        port 7: 210000e08b08793f flynn
        port 12: 210100e08b28793f flynn

cxfs-config warnings/errors:
    server aiden fail policy must not contain "Shutdown" for cluster with
even number of enabled servers and no tiebreaker
    server brigid fail policy must not contain "Shutdown" for cluster with
even number of enabled servers and no tiebreaker
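
In long output such as the above, the ping results are easy to overlook. The following sketch scans cxfs-config -ping output and prints the node name, NIC address, and loss percentage for every NIC that reports packet loss. It assumes the line layout shown in the examples in this chapter; the function name is hypothetical.

```shell
#!/bin/sh
# Read cxfs-config -ping output on stdin and print "node address loss%" for
# each NIC whose ping summary reports more than 0% packet loss.
cxfs_ping_failures() {
    awk '
        $1 == "node"  { node = $2; sub(/:$/, "", node) }
        $1 == "nic"   { addr = $4 }
        $1 == "ping:" && /packet loss/ {
            loss = $(NF - 2)          # for example "0.0%"
            sub(/%$/, "", loss)
            if (loss + 0 > 0) print node, addr, loss "%"
        }
    '
}

# Example use:
# /usr/cluster/bin/cxfs-config -ping | cxfs_ping_failures
```

No output means that every ping summary line reported 0% loss. A NIC that produces no summary line at all is not detected by this sketch.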
