This chapter contains the following:
You should perform a database backup whenever you want to save the database and be able to restore it to the current state at a later point.
You can use the following methods to restore the database:
If the database is accidentally deleted from a server-capable administration node, use the fs2d daemon to replicate the database from another server-capable administration node. See “Restoring a Deleted Database from Another Node”.
If you want to be able to recreate the current configuration, use the config command in cxfs_admin. You can then recreate this configuration by using the output file and the cxfs_admin -f option or running the script generated as described in “Saving and Recreating the Current Configuration with cxfs_admin” in Chapter 11.
If you want to retain a copy of the database and all node-specific information such as local logging, use the cdbBackup and cdbRestore commands. You should periodically back up the cluster database on all server-capable administration nodes using the cdbBackup command, either manually or by adding an entry to the root crontab file. See “Using cdbBackup and cdbRestore for the Cluster Database and Logging Information”.
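As a minimal sketch of the crontab approach, the following helper prints an entry that runs cdbBackup once a day. The 02:30 schedule and the function name are assumptions chosen for illustration; the /usr/cluster/bin path is the one used for cdbBackup elsewhere in this chapter. The entry would need to be installed on each server-capable administration node, at a time when no configuration changes are being made.

```shell
# Print a root crontab entry that runs cdbBackup once a day.
# The 02:30 schedule is illustrative only, not a recommendation from this guide.
cdb_backup_cron_entry() {
    printf '%s\n' '30 2 * * * /usr/cluster/bin/cdbBackup'
}

# To install (as root), append the entry to the existing crontab:
#   ( crontab -l 2>/dev/null; cdb_backup_cron_entry ) | crontab -
```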
This section discusses the following:
If the database has been accidentally deleted from an individual server-capable administration node, you can restore it by synchronizing with the database on another server-capable administration node.
| Note: Do not use this method if the cluster database has been corrupted, because the database on another node will also be corrupted. In the case of corruption, you must reinstate a backup copy of the database. See “Saving and Recreating the Current Configuration with cxfs_admin” in Chapter 11. |
Do the following:
1. Stop the CXFS service on the server-capable administration node with the deleted database by running the following command in cxfs_admin:
| Caution: If you omit this step, the target node might be reset by another server-capable administration node. |
cxfs_admin:cluster> disable nodename
2. (If running GRIOv2) Stop the GRIOv2 daemon (ggd2) by running the following command on the node with the deleted database:
server-admin# service grio2 stop
3. Stop the CXFS control daemon (clconfd) by running the following command on the node where step 2 was executed:
| Caution: Running this command will completely shut down all CXFS filesystem access on the local node. |
server-admin# service cxfs stop
4. Stop the CXFS cluster administration daemons (cad, cmond, crsd, and fs2d) by running the following command on the node where step 2 was executed:
server-admin# service cxfs_cluster stop
5. Verify that all of the filesystems have been unmounted and the cluster membership has been terminated. To do this, run clconf_info on the active metadata server.
6. Run cdbreinit on the node where step 2 was executed:
server-admin# /usr/cluster/bin/cdbreinit
7. Wait for the following message to be logged to the syslog:
fs2d[PID]: Finished receiving CDB sync series from machine nodename
8. Start the CXFS control daemon by running the following command on the node where step 2 was executed:
server-admin# service cxfs start
9. (If running GRIOv2) Start the GRIOv2 daemon (ggd2) by running the following command on the node where step 2 was executed:
server-admin# service grio2 start
You can choose to have the cdbreinit command restart cluster daemons. The fs2d daemon will then replicate the cluster database to the node from which it is missing.
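Waiting for the fs2d message in the procedure above can be scripted. The following is a sketch only: the function name is hypothetical, and the log file path in the usage comment is an assumption that depends on how syslog is configured on the node.

```shell
# Return 0 if the given log file contains the fs2d sync-complete message.
cdb_sync_finished() {
    grep -q 'fs2d\[[0-9]*]: Finished receiving CDB sync series from machine' "$1"
}

# Example polling loop (the log path is an assumption; adjust for your syslog setup):
#   until cdb_sync_finished /var/log/messages; do sleep 5; done
```

The check matches any occurrence in the file, including messages left over from an earlier synchronization, so in practice you would restrict it to lines logged after cdbreinit was started.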
The cdbBackup and cdbRestore commands back up and restore the cluster database and node-specific information, such as local logging information. You must run these commands individually for each server-capable administration node.
To perform a backup of the cluster, use the cdbBackup command on each server-capable administration node:
# cdbBackup
| Caution: Do not make configuration changes while you are using the cdbBackup command. |
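Because cdbBackup must be run separately on each server-capable administration node, a small wrapper can help ensure that no node is missed. The following sketch assumes passwordless root ssh access between nodes; the function name, the REMOTE_SHELL variable, and the node names in the usage comment are all hypothetical.

```shell
# Run cdbBackup on every node named in the arguments, one node at a time.
# REMOTE_SHELL defaults to ssh; passwordless root ssh is an assumption.
backup_all_admin_nodes() {
    for node in "$@"; do
        # Stop at the first node on which the backup fails.
        ${REMOTE_SHELL:-ssh} "root@${node}" /usr/cluster/bin/cdbBackup || return 1
    done
}

# Usage with hypothetical node names:
#   backup_all_admin_nodes aiden brigid
```

Setting REMOTE_SHELL=echo prints the commands instead of running them, which is a convenient way to review what would be executed.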
To perform a restore, run the cdbRestore command on each server-capable administration node. You can use this method for either a missing or a corrupted cluster database. Do the following:
1. Stop CXFS services on all nodes in the cluster.
2. Stop the cluster administration daemons on each server-capable administration node.
3. Remove the old database by using the cdbreinit command on each server-capable administration node.
4. Stop the cluster administration daemons again (these were restarted automatically by cdbreinit in the previous step) on each node.
5. Delete the existing database files by running the cdbdelete command on each server-capable administration node:
# cdbdelete /var/cluster/cdb/cdb.db
6. Use the cdbRestore command on each server-capable administration node:
# cdbRestore -f backup_filename
| Note: The name of the backup file (backup_filename) created by cdbBackup is displayed in the cdbBackup output. For example, cdb_cc-xe.Oct.27.2008.08:58:17.tar.Z is one possible value for backup_filename. |
7. Start the cluster administration daemons on each server-capable administration node.
For example, to back up the current database, clear the database, and then restore the database to all server-capable administration nodes, do the following as directed:
On each server-capable administration node:
# /usr/cluster/bin/cdbBackup
On one server-capable administration node:
cmgr> stop cx_services for cluster clusterA
On each server-capable administration node (if running GRIOv2):
# service grio2 stop
On each server-capable administration node:
# service cxfs stop
On each server-capable administration node:
# service cxfs_cluster stop
On each server-capable administration node:
# /usr/cluster/bin/cdbreinit
On each server-capable administration node (again):
# service cxfs_cluster stop
On each server-capable administration node:
# /usr/cluster/bin/cdbdelete /var/cluster/cdb/cdb.db
On each server-capable administration node:
# service cxfs_cluster start
On each server-capable administration node:
# /usr/cluster/bin/cdbRestore -f backup_filename
On each server-capable administration node:
# service cxfs_cluster start
On each server-capable administration node:
# service cxfs start
On each server-capable administration node (if running GRIOv2):
# service grio2 start
For more information, see the cdbBackup(1M), cdbdelete(1M), and cdbRestore(1M) man pages.
The cxfs-config command validates configuration information in the cluster database. You can run it on any server-capable administration node in the cluster.
By default, cxfs-config displays the following:
Cluster name and cluster ID
Tiebreaker node
Networks for CXFS failover networks
Nodes in the pool:
CXFS filesystems:
Name, mount point (enabled means that the filesystem is configured to be mounted; if it is not mounted, there is an error)
Device name
Mount options
Potential metadata servers
Nodes that should have the filesystem mounted (if there are no errors)
Switches:
Switch name, user name to use when sending a telnet message, mask (a hexadecimal string representing a 64-bit port bitmap that indicates the list of ports in the switch that will not be fenced)
Ports on the switch that have a client configured for fencing at the other end
Warnings or errors
For example:
thump# /usr/cluster/bin/cxfs-config
Global:
cluster: topiary (id 1)
cluster state: enabled
tiebreaker: <none>
Networks:
net 0: type tcpip 192.168.0.0 255.255.255.0
net 1: type tcpip 134.14.54.0 255.255.255.0
Machines:
node leesa: node 6 cell 1 enabled Linux32 client_only
hostname: leesa.example.com
fail policy: Fence
nic 0: address: 192.168.0.164 priority: 1 network: 0
nic 1: address: 134.14.54.164 priority: 2 network: 1
node thump: node 1 cell 0 enabled Linux64 server_admin
hostname: thump.example.com
fail policy: Fence
nic 0: address: 192.168.0.186 priority: 1 network: 0
nic 1: address: 134.14.54.186 priority: 2 network: 1
Filesystems:
fs dxm: /mnt/dxm enabled
device = /dev/cxvm/tp9500a4s0
options = []
servers = thump (0)
clients = leesa, thump
Switches:
switch 0: admin@asg-fcsw1 mask 0000000000000000
port 8: 210000e08b0ead8c thump
switch 1: admin@asg-fcsw0 mask 0000000000000000
Warnings/errors:
enabled machine leesa has fencing enabled but is not present in switch database
The following options are of particular interest:
-all lists all available information
-ping contacts each NIC in the machine list and displays whether the packets are transmitted and received. For example:
node leesa: node 6 cell 1 enabled Linux32 client_only
fail policy: Fence
nic 0: address: 192.168.0.164 priority: 1
ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
ping: round-trip min/avg/max = 0.477/0.666/1.375 ms
nic 1: address: 134.14.54.164 priority: 2
ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
ping: round-trip min/avg/max = 0.469/0.645/1.313 ms
-xfs lists XFS information for each CXFS filesystem, such as size. For example:
Filesystems:
fs dxm: /mnt/dxm enabled
device = /dev/cxvm/tp9500a4s0
options = []
servers = thump (0)
clients = leesa, thump
xfs:
magic: 0x58465342
blocksize: 4096
uuid: 3459ee2e-76c9-1027-8068-0800690dac3c
data size 17.00 Gb
-xvm lists XVM information for each CXFS filesystem, such as volume size and topology. For example:
Filesystems:
fs dxm: /mnt/dxm enabled
device = /dev/cxvm/tp9500a4s0
options = []
servers = thump (0)
clients = leesa, thump
xvm:
vol/tp9500a4s0 0 online,open
subvol/tp9500a4s0/data 35650048 online,open
slice/tp9500a4s0 35650048 online,open
data size: 17.00 Gb
-check performs extra verification, such as XFS filesystem size with XVM volume size for each CXFS filesystem. This option may take a few moments to execute.
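The size figures in the -xfs and -xvm examples can be cross-checked by hand. The following sketch assumes that XVM reports lengths in 512-byte blocks; that unit is inferred from the numbers in the examples and is not stated in this chapter. Under that assumption, the 35650048-block slice works out to the 17.00 Gb data size reported for the filesystem. This only illustrates the kind of comparison involved, not how -check is implemented, and the function name is hypothetical.

```shell
# Convert an XVM length to gigabytes (2^30 bytes), assuming 512-byte blocks.
# The block size is an assumption inferred from the example output.
xvm_blocks_to_gib() {
    awk -v blocks="$1" 'BEGIN { printf "%.2f\n", blocks * 512 / (1024 * 1024 * 1024) }'
}

xvm_blocks_to_gib 35650048    # slice/tp9500a4s0 from the -xvm example; prints 17.00
```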
For a complete list of options, see the cxfs-config(1M) man page.
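When a cluster has many nodes, the -ping output can be long. The following sketch filters it down to the NICs that lost packets. It assumes that a lossy NIC is reported in the same line layout as the 0.0% lines in the -ping example; the function name is hypothetical.

```shell
# Read `cxfs-config -ping` output on stdin and list NICs with packet loss.
nics_with_packet_loss() {
    awk '
        # Remember the address of the NIC whose ping results follow.
        $1 == "nic" && $3 == "address:" { addr = $4 }
        /packet loss/ {
            for (i = 1; i <= NF; i++)
                # A field such as "20.0%" converts to the number 20.
                if (($i ~ /%$/) && ($i + 0 > 0))
                    print addr, $i
        }
    '
}

# Usage:
#   /usr/cluster/bin/cxfs-config -ping | nics_with_packet_loss
```

Empty output means that every NIC reported 0.0% packet loss.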
The following example shows errors reported by cxfs-config:
aiden # /usr/cluster/bin/cxfs-config -check -all
Global:
cluster: BP (id 555)
cluster state: enabled
tiebreaker:
Networks:
net 0: type tcpip 10.11.0.0 255.255.255.0
net 1: type tcpip 128.162.242.0 255.255.255.0
Machines:
node aiden: node 27560 cell 0 enabled Linux64 server_admin
hostname: aiden.example.com
fail policy: Fence, Shutdown
nic 0: address: 10.11.0.241 priority: 1 network: 0
ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
ping: round-trip min/avg/max = 0.136/0.171/0.299 ms
nic 1: address: 128.162.242.12 priority: 2 network: 1
ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
ping: round-trip min/avg/max = 0.130/0.171/0.303 ms
node brigid: node 31867 cell 2 enabled Linux64 server_admin
hostname: brigid.example.com
fail policy: Fence, Shutdown
nic 0: address: 10.11.0.240 priority: 1 network: 0
ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
ping: round-trip min/avg/max = 0.303/0.339/0.446 ms
nic 1: address: 128.162.242.11 priority: 2 network: 1
ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
ping: round-trip min/avg/max = 0.336/0.430/0.799 ms
node flynn: node 1 cell 1 enabled Linux64 client_only
hostname: flynn.example.com
fail policy: Fence, Shutdown
nic 0: address: 10.11.0.234 priority: 1 network: 0
ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
ping: round-trip min/avg/max = 0.323/0.370/0.539 ms
nic 1: address: 128.162.242.189 priority: 2 network: 1
ping: 5 packets transmitted, 5 packets received, 0.0% packet loss
ping: round-trip min/avg/max = 0.283/0.312/0.424 ms
Filesystems:
fs concatfs: /concatfs enabled
device = /dev/cxvm/concatfs
force = true
options = [rw,quota]
servers = aiden (0), brigid (2)
clients = aiden, brigid, flynn
xvm:
vol/concatfs 0 online,open
subvol/concatfs/data 2836134016 online,open
subvol/concatfs/data 2836134016 online,open
concat/concat0 2836134016 online,tempname,open
slice/lun2s0 1418067008 online,open
slice/lun3s0 1418067008 online,open
data size: 1.32 TB
xfs:
magic: 0x58465342
blocksize: 4096
uuid: 9616ae39-3a50-1029-8896-080069056bf5
data size 1.32 TB
fs stripefs: /stripefs enabled
device = /dev/cxvm/stripefs
force = true
options = [rw,quota]
servers = aiden (0), brigid (2)
clients = aiden, brigid, flynn
xvm:
vol/stripefs 0 online,open
subvol/stripefs/data 2836133888 online,open
stripe/stripe0 2836133888 online,tempname,open
slice/lun0s0 1418067008 online,open
slice/lun1s0 1418067008 online,open
data size: 1.32 TB
xfs:
magic: 0x58465342
blocksize: 4096
uuid: 9616ae38-3a50-1029-8896-080069056bf5
data size 1.32 TB
Switches:
switch 0: 32 port brocade admin@fcswitch12
port 12: 210000e08b00e6eb brigid
port 28: 210000e08b041a3a aiden
switch 1: 32 port brocade admin@fcswitch13
port 7: 210000e08b08793f flynn
port 12: 210100e08b28793f flynn
cxfs-config warnings/errors:
server aiden fail policy must not contain "Shutdown" for cluster with
even number of enabled servers and no tiebreaker
server brigid fail policy must not contain "Shutdown" for cluster with
even number of enabled servers and no tiebreaker
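Because the warnings and errors appear at the very end of a potentially long report, it can be useful to extract just that section. The following sketch prints every line after the warnings/errors header. The function name is hypothetical, and the sketch assumes that the header spelling matches one of the two forms shown in the examples in this section.

```shell
# Read cxfs-config output on stdin and print the lines that follow the
# "Warnings/errors:" (or "cxfs-config warnings/errors:") header.
cxfs_config_warnings() {
    awk '
        found { print; next }                             # past the header: echo every line
        tolower($0) ~ /warnings\/errors:/ { found = 1 }   # header line itself is not printed
    '
}

# Usage:
#   /usr/cluster/bin/cxfs-config -check -all | cxfs_config_warnings
```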