Chapter 12. Administration and Maintenance


This chapter discusses the following topics:

Administrative Tools

You will use one of the following tools for CXFS administration:

CXFS GUI connected to any server-capable administration node. See Chapter 10, “CXFS GUI”.
cxfs_admin: for most commands, a cxfs_admin instance running on any node in the cluster network that has admin access permission (see “Setting cxfs_admin Access Permissions” in Chapter 11). See Chapter 11, “cxfs_admin Command”.

Note: A few cxfs_admin commands, such as stop_cxfs, must be run from a cxfs_admin instance running on a server-capable administration node.

Precedence of Configuration Options

Figure 12-1 shows the order in which CXFS programs take their configuration options.

Figure 12-1. Precedence of Configuration Options

CXFS Release Versions and Rolling Upgrades

This section discusses the following:

Definition of a Rolling Upgrade

SGI lets you upgrade a subset of nodes from CXFS X.anything to CXFS X.anything within the same major-release thread (X). This policy lets you keep your cluster running and filesystems available during the temporary upgrade process.

Note: A rolling upgrade is not supported between major CXFS releases (CXFS X.anything to CXFS Y.anything). For a major release, you must upgrade all nodes in the cluster at the same time.

To identify compatible CXFS releases, see the CXFS MultiOS Software Compatibility Matrix:

https://support.sgi.com/content_request/139840/index.html

After the upgrade process is complete, all nodes should be running the same release.
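The major-release rule above can be checked mechanically before you plan an upgrade. The following is a minimal sketch, assuming release strings of the form X.anything (such as 6.0 or 6.6); the function name same_major_release is hypothetical and is not part of CXFS. It only compares the major number; the compatibility matrix remains the authoritative source for supported combinations.

```shell
# Hypothetical helper, not part of CXFS: succeeds (exit 0) only if both
# release strings share the same major number X in "X.anything".
same_major_release() {
    # ${1%%.*} strips everything from the first dot onward, leaving X.
    [ "${1%%.*}" = "${2%%.*}" ]
}

# Example use: decide whether a rolling upgrade is even a candidate.
if same_major_release 6.0 6.1; then
    echo "6.0 -> 6.1: same major release, rolling upgrade may apply"
fi
if ! same_major_release 6.6 7.0; then
    echo "6.6 -> 7.0: major release change, upgrade all nodes together"
fi
```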

Importance of Upgrading All Servers First

You must upgrade all server-capable administration nodes to a given CXFS release before upgrading any client-only nodes (including DMF parallel data-mover nodes, which are CXFS client-only nodes); server-capable administration nodes must run the same or later release as client-only nodes. Operating a cluster with client-only nodes running a mixture of older and newer CXFS versions may result in a performance loss. Relocation to a server-capable administration node that is running an older CXFS version is not supported.

Although clients that are not upgraded might continue to function in the CXFS cluster without problems, new CXFS functionality may not be enabled until all clients are upgraded; SGI does not provide support for any CXFS problems encountered on the clients that are not upgraded.

Caution: In some cases, improper upgrading can also result in a loss of functionality. For example, if CXFS client-only nodes, CXFS edge-serving nodes, or DMF parallel data-mover nodes are updated first (before the active metadata server), those nodes might not be able to mount the CXFS filesystems.

Importance of An Accurate Cluster Database Across the Cluster Before Upgrading

It is important that every server-capable administration node in the cluster has an accurate cluster database, in order to avoid inadvertently implementing an old database during the rolling upgrade process.

Upgrading Licenses

CXFS 7.0 and later releases require 7.0 licenses. For details about licenses, see Chapter 5, “CXFS Licensing”.

To upgrade to a 7.0 or later release, do the following:

Obtain 7.0 licenses from SGI. See “Obtaining the Keys from SGI” in Chapter 5.

Edit the /etc/lk/keys.dat file on all server-capable administration nodes:

Add the new 7.0 license keys.

(Optional) Delete the prior licenses.

Note: Prior versions of CXFS will not function with the CXFS 7.0 license keys. If you are upgrading, you should keep your prior licenses available in the event that you must downgrade. If /etc/lk/keys.dat contains licenses from a prior release, they will be ignored after you upgrade to CXFS 7.0.

To verify the licenses, see “License Key Verification” in Chapter 5.

Follow the steps in “General Upgrade Procedure”.

General Upgrade Procedure

To upgrade a CXFS cluster, do the following:

1. Ensure that all server-capable administration nodes are running the same previous software release.

2. Ensure that the configuration database is accurate on every server-capable administration node.

   Note: If a given node has an inaccurate database, delete it by using cdbreinit(8) and then follow the instructions in “Restoring a Database from Another Node” in Chapter 13.

3. Upgrade the potential metadata server (say admin2) for a given filesystem. (See the release notes and Chapter 7, “Server-Capable Administration Node Installation”.) Then reboot the potential metadata server.

4. For the first server-capable administration node that is an active metadata server (say admin1), move all CXFS filesystems running on it to the potential metadata server (admin2), making the potential metadata server now the active metadata server for those filesystems. Do the following:

   admin1# /sbin/chkconfig grio2 off (if using GRIO)
   admin1# /sbin/chkconfig cxfs off
   admin1# /sbin/chkconfig cxfs_cluster off
   admin1# reboot

   Note: When performing upgrades, you should not make any other configuration changes to the cluster (such as adding new nodes or filesystems) until the upgrade of all nodes is complete and the cluster is running normally.

5. Upgrade the server-capable administration node (admin1). See the release notes and Chapter 7, “Server-Capable Administration Node Installation”.

6. Return the upgraded server-capable administration node (admin1) to the cluster. Do the following:

   admin1# /sbin/chkconfig cxfs_cluster on
   admin1# /sbin/chkconfig cxfs on
   admin1# /sbin/chkconfig grio2 on (if using GRIO)
   admin1# reboot

   Note: Skip steps 7, 8, and 9 if your cluster has only two server-capable administration nodes.

7. For the next server-capable administration node that is an active metadata server (say admin3), move all CXFS filesystems running on it to the potential metadata server (making it now the active metadata server for those filesystems). Do the following to force recovery:

   admin3# /sbin/chkconfig grio2 off (if using GRIO)
   admin3# /sbin/chkconfig cxfs off
   admin3# /sbin/chkconfig cxfs_cluster off
   admin3# reboot

8. Upgrade the server-capable administration node (admin3). See the release notes and Chapter 7, “Server-Capable Administration Node Installation”.

9. Return the upgraded server-capable administration node (admin3) to the cluster. Do the following:

   admin3# /sbin/chkconfig cxfs_cluster on
   admin3# /sbin/chkconfig cxfs on
   admin3# /sbin/chkconfig grio2 on (if using GRIO)
   admin3# reboot

10. If your cluster has additional server-capable administration nodes, repeat steps 7 through 9 for each remaining server-capable administration node.

11. Return the first CXFS filesystem to the server-capable administration node that you want to be its metadata server (making it the active metadata server, say admin1). Do the following:

   1. Enable relocation on the current active metadata server (admin2) by using the cxfs_relocation_ok system tunable parameter. See “Relocation” in Chapter 1.
   2. For each filesystem for which admin2 is now the active metadata server, manually relocate the metadata services back to the original metadata server (admin1) by using the CXFS GUI or cxfs_admin. For example:
      cxfs_admin:mycluster> relocate fs1 server=admin1
   3. Disable relocation. See “Relocation” in Chapter 1.

12. Return the next CXFS filesystem to the server-capable administration node that you want to be its metadata server (making it the active metadata server, say admin3). Repeat this step as needed for each CXFS filesystem.

13. Upgrade the client-only nodes. See the release notes for each platform and the CXFS 7 Client-Only Guide for SGI InfiniteStorage.
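
The procedure above disables the services in one order when a node leaves the cluster (grio2, cxfs, cxfs_cluster) and enables them in the reverse order when it returns. The following sketch prints those command sequences for review rather than running them; the function names and the yes/no GRIO argument are hypothetical conveniences, and the chkconfig lines are the ones shown in the procedure.

```shell
# Hypothetical dry-run helpers: they only print the commands from the
# upgrade procedure so the order can be reviewed before typing them.

# leave_cluster_cmds NODE [yes|no]   (second argument: is GRIO in use?)
leave_cluster_cmds() {
    node=$1
    use_grio=${2:-no}
    if [ "$use_grio" = yes ]; then
        echo "$node# /sbin/chkconfig grio2 off"
    fi
    echo "$node# /sbin/chkconfig cxfs off"
    echo "$node# /sbin/chkconfig cxfs_cluster off"
    echo "$node# reboot"
}

# rejoin_cluster_cmds NODE [yes|no]  (services come back in reverse order)
rejoin_cluster_cmds() {
    node=$1
    use_grio=${2:-no}
    echo "$node# /sbin/chkconfig cxfs_cluster on"
    echo "$node# /sbin/chkconfig cxfs on"
    if [ "$use_grio" = yes ]; then
        echo "$node# /sbin/chkconfig grio2 on"
    fi
    echo "$node# reboot"
}

leave_cluster_cmds admin1 yes
rejoin_cluster_cmds admin1 yes
```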

Example Upgrade Process

The following figures show an example upgrade procedure for a cluster with three server-capable administration nodes and two filesystems (/fs1 and /fs2), in which all nodes are running CXFS 6.0 at the beginning, Node2 is the potential metadata server, and the cluster is upgrading to CXFS 6.1.

Figure 12-2. Example Rolling Upgrade Procedure (part 1)

Figure 12-3. Example Rolling Upgrade Procedure (part 2)

Figure 12-4. Example Rolling Upgrade Procedure (part 3)

CXFS Relocation Capability

Note: Relocation is disabled by default.

Relocating the metadata server for a given filesystem requires that the relocation feature is enabled on the filesystem's active metadata server. If you want to use relocation, SGI recommends that you enable the feature on all of the filesystem's potential metadata servers by resetting the cxfs_relocation_ok parameter as follows:

Enable at run time:

potentialMDS# sysctl -w fs.cxfs.cxfs_relocation_ok=1

Enable at reboot by adding the following line to /etc/modprobe.d/sgi-cxfs-xvm.conf:
options sgi-cxfs cxfs_relocation_ok=1

To disable relocation, do the following:

Disable at run time:

potentialMDS# sysctl -w fs.cxfs.cxfs_relocation_ok=0

Disable at reboot by adding the following line to /etc/modprobe.d/sgi-cxfs-xvm.conf:
options sgi-cxfs cxfs_relocation_ok=0
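
If you switch the boot-time setting more than once, appending a new line each time can leave both an enable line and a disable line in the file. The following is a minimal sketch that keeps exactly one cxfs_relocation_ok line; the function name set_relocation_at_boot is hypothetical, and the sketch assumes the setting sits on its own options line exactly as shown above. Try it on a copy of the file first.

```shell
# Hypothetical helper: write a single cxfs_relocation_ok line to a
# modprobe configuration file, dropping any earlier line for it.
# Usage: set_relocation_at_boot 0|1 /path/to/conf
set_relocation_at_boot() {
    value=$1
    conf=$2
    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 2 ;;
    esac
    tmp="$conf.tmp.$$"
    if [ -f "$conf" ]; then
        # Keep every line except an existing cxfs_relocation_ok setting.
        # grep exits nonzero when nothing is left, which is not an error here.
        grep -v '^options sgi-cxfs cxfs_relocation_ok=[01]$' "$conf" > "$tmp" || true
    else
        : > "$tmp"
    fi
    printf 'options sgi-cxfs cxfs_relocation_ok=%s\n' "$value" >> "$tmp"
    # Copy the content back so the original file keeps its owner and mode.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

For example, set_relocation_at_boot 1 /etc/modprobe.d/sgi-cxfs-xvm.conf would enable relocation at the next reboot; the run-time sysctl setting is separate and is not changed by this sketch.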

CXFS and Cluster Administration `service` Commands

Table 12-1 summarizes the service(8) services used for CXFS.

The commands to start, stop, restart (stop and then start), and give status (running or stopped) are as follows:

service servicename start
service servicename restart
service servicename stop
service servicename status

Table 12-1. CXFS and Cluster Administration service Services

Service	Description
`cxfs`	Controls the `clconfd` daemon (the CXFS server-capable administration node control daemon) on the local node, which in turn controls CXFS cluster services in the kernel
`cxfs_client`	Controls the `cxfs_client` daemon (the client daemon) on the local node, which in turn controls CXFS cluster services in the kernel
`cxfs_cluster`	Controls `fs2d`, `cmond`, `cad`, and `crsd` (the cluster administration daemons) on the local node
`grio2`	Controls `ggd2` (the GRIOv2 daemon) on the local node

Note: CXFS cluster services may also be stopped by other events, such as loss of membership.

For more information, see the service(8) man page and “CXFS Tools” in Chapter 1.

Using `hafence`

To query the fencing status of a node, or raise or lower the fence for a node, you will normally use commands within cxfs_admin. See “Fencing Tasks with cxfs_admin” in Chapter 11.

You can also run the following command on a server-capable administration node:

server-admin# /usr/cluster/bin/hafence -a -s switchname -u username -p password -m mask [-L vendorname]

To raise the fence for a node:

server-admin# /usr/cluster/bin/hafence -r nodename

To lower the fence for a node:

server-admin# /usr/cluster/bin/hafence -l nodename

To query switch status:

server-admin# /usr/cluster/bin/hafence -q -s switchname

Usage notes:

-a adds or changes a switch in the cluster database.
-l lowers the fence for the specified node.
-L specifies the vendor name, which loads the appropriate plug-in library for the switch. If you do not specify the vendor name, the default is brocade.
-m specifies one of the following:
- A list of ports in the switch that will never be fenced. The list has the following form, beginning with the # symbol, separating each port number with a comma, and enclosed within quotation marks:
  "#port,port,port..."
  Each port is a decimal integer in the range 0 through 1023. For example, the following indicates that port numbers 2, 4, 5, 6, 7, and 23 will never be fenced:
  -m "#2,4,5,6,7,23"
- A hexadecimal string that represents ports in the switch that will never be fenced. Ports are numbered from 0. If a given bit has a binary value of 0, the port that corresponds to that bit is eligible for fencing operations; if 1, then the port that corresponds to that bit will always be excluded from any fencing operations. For an example, see Figure 10-5.
Server-capable administration nodes automatically discover the available HBAs and, when fencing is triggered, fence off all of the SAN HBAs when the Fence or FenceReset fail policy is selected. However, masked HBAs will not be fenced. Masking allows you to prevent the fencing of devices that are attached to the SAN but are not shared with the cluster, to ensure that they remain available regardless of CXFS status. You would want to mask HBAs used for access to tape storage, or HBAs that are only ever used to access local (nonclustered) devices.
-p specifies the password for the specified username.
-q queries switch status.
-r raises the fence for the specified node.
-s specifies the hostname of the Fibre Channel, SAS, or InfiniBand switch; this is used to determine the IP address of the switch.
-u specifies the user name to use when sending a message to the switch.
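
The hexadecimal form of the -m mask can be derived from a list of port numbers. The following is a minimal sketch, assuming that port 0 corresponds to the least-significant bit of the mask (the figure referenced above is the authoritative description of the bit layout); the function name ports_to_hexmask is hypothetical and is not part of hafence. It builds the mask one hexadecimal digit at a time so that ports up to 1023 do not overflow shell arithmetic.

```shell
# Hypothetical helper: print a hexadecimal mask in which the bit for each
# listed port is 1 (never fenced). Assumes port 0 is the lowest bit.
# Usage: ports_to_hexmask PORT [PORT...]
ports_to_hexmask() {
    max=0
    for p in "$@"; do
        case "$p" in
            ''|*[!0-9]*) echo "bad port: $p" >&2; return 2 ;;
        esac
        if [ "$p" -gt 1023 ]; then
            echo "port out of range (0-1023): $p" >&2
            return 2
        fi
        if [ "$p" -gt "$max" ]; then max=$p; fi
    done
    # Each hex digit covers four ports; walk from the highest digit down.
    i=$(( max / 4 ))
    out=""
    while [ "$i" -ge 0 ]; do
        nib=0
        for p in "$@"; do
            if [ $(( p / 4 )) -eq "$i" ]; then
                nib=$(( nib | (1 << (p % 4)) ))
            fi
        done
        out="$out$(printf '%x' "$nib")"
        i=$(( i - 1 ))
    done
    printf '%s\n' "$out"
}

ports_to_hexmask 2 3
```

Under the stated assumption, masking ports 2 and 3 sets bits 2 and 3, so the call above prints c, which would describe the same ports as the list form "#2,3".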

For example, the following defines a QLogic switch named myqlswitch and uses no masking:

server-admin# /usr/cluster/bin/hafence -a -s myqlswitch -u admin -p *** -L qlogic

Note: Vendor plugin libraries should be installed in a directory that is in the platform-specific search path of the dynamic linker, typically the same location as the fencing library, libcrf.so. The above command line will attempt to load the libcrf_qlogic.so library.

The following masks port numbers 2 and 3:

server-admin# /usr/cluster/bin/hafence -a -s myqlswitch -u admin -p *** -m "#2,3" -L qlogic

The following lowers the fence for client1:

server-admin# /usr/cluster/bin/hafence -l client1

The following raises the fence for client1:

server-admin# /usr/cluster/bin/hafence -r client1

The following queries port status for all switches defined in the cluster database:

server-admin# /usr/cluster/bin/hafence -q

For more information, see the hafence(8) man page. See the release notes for supported switches.

Firewalls and CXFS Port Usage

The CXFS private network should be restricted to CXFS use. If there are daemons other than CXFS daemons that are running on systems attached to the private network, you should configure them so that they listen for connections on the public networks only. A firewall on the CXFS private network is not required. However, if you use a firewall, be aware that CXFS uses the ports listed in Table 12-2 and Table 12-3.

Table 12-2. Ports Used by a Client-Only Node

Port/Protocol	Description
`5449/tcp`	`cxfs_client` connects to this port on a server-capable administration node
`5449/udp`	`cxfs_client` listens to this port for `fs2d` heartbeat traffic from server-capable administration nodes
`5450/tcp`	A client-only node connects to this port on the server-capable administration nodes for kernel messages (channel 0)
`5451/tcp`	A client-only node connects to this port on the server-capable administration nodes for kernel messages (channel 1)
`5452/udp`	Previously used for kernel discovery (prior to CXFS 6.6)
`5453/udp`	A client-only node listens for and sends multicast kernel heartbeat/discovery messages using this port

Table 12-3. Ports Used by a Server-Capable Administration Node

Port/Protocol	Description
`22/tcp`	`ssh` connections to the SAN switch for fencing
`23/tcp`	`telnet` connections to the SAN switch for fencing
`111/tcp`	TCP port, for more information see the Linux `rpcbind(8)` man page
`111/udp`	UDP port, for more information see the Linux `rpcbind(8)` man page
`600-1023/tcp` (arbitrary assignment, typically in this range)	`fs2d` registers its RPC service with the `rpcbind` utility with program number `391060` and version number `1`. The `rpcbind` utility then assigns an arbitrary port, typically in the range `600` through `1023`, for TCP RPC traffic. Various daemons running on the server-capable administration nodes (such as `cad`, `crsd`, `clconfd`, and `cmond`) will connect to the assigned port.
`5435/tcp`	CXFS GUI `cad` daemon, specified as `sgi-cad` in `/etc/services` (changeable by the site)
`5449/tcp`	`cxfs_admin` connects to this port on other server-capable administration nodes and `fs2d` accepts connections to this port from client-only nodes
`5449/udp`	`cxfs_admin` listens to this port for `fs2d` heartbeat traffic from other server-capable administration nodes and `fs2d` listens to this port for `fs2d` heartbeat traffic from other server-capable administration nodes
`5450/tcp`	A server-capable administration node accepts connections from all nodes and will itself connect to other server-capable administration nodes for kernel messages (channel 0)
`5451/tcp`	A server-capable administration node accepts connections from all cluster nodes and will itself connect to other server-capable administration nodes for kernel messages (channel 1)
`5452/udp`	Previously used for kernel discovery (prior to CXFS 6.6)
`5453/udp`	A server-capable administration node listens for and sends multicast kernel heartbeat/discovery messages using this port
`7500/udp`	`crsd` daemon that handles node resets, specified as `sgi-crsd` in `/etc/services` (changeable by the site)

For example, suppose you have a cluster with two private networks and you want to move your Linux client-only nodes outside a firewall. To ensure that the fs2d and cxfs_client daemons only see port 5449 traffic on those networks, you could use the following iptables(8) commands on the Linux client-only nodes:

client-only# iptables -A INPUT --source 192.168.13.0/24 -p udp --dport 5449 -j RETURN
client-only# iptables -A INPUT --source 192.168.14.0/24 -p udp --dport 5449 -j RETURN
client-only# iptables -A INPUT -p udp --dport 5449 -j DROP
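
The three rules above follow a pattern: one RETURN rule per private network, then a final DROP for all other UDP traffic to port 5449. The following sketch prints that rule set for any list of networks so that it can be reviewed before being applied; the function name heartbeat_rules is hypothetical, and the iptables options are the ones used in the example above.

```shell
# Hypothetical helper: print (not run) the iptables rules that accept
# UDP port 5449 traffic only from the given private networks.
# Usage: heartbeat_rules NETWORK/PREFIX [NETWORK/PREFIX...]
heartbeat_rules() {
    for net in "$@"; do
        echo "iptables -A INPUT --source $net -p udp --dport 5449 -j RETURN"
    done
    # The DROP rule must come last, after every allowed network.
    echo "iptables -A INPUT -p udp --dport 5449 -j DROP"
}

heartbeat_rules 192.168.13.0/24 192.168.14.0/24
```

Because the rules are appended in order, the DROP rule takes effect only for packets that did not match one of the earlier RETURN rules.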

`chkconfig` Arguments

Table 12-4 summarizes the CXFS chkconfig arguments for server-capable administration nodes.

Table 12-4. chkconfig Arguments for Server-Capable Administration Nodes

Argument	Description
`cxfs_cluster`	Controls the cluster administration daemons (`fs2d`, `crsd`, `cad`, and `cmond`). If this argument is off, the database daemons will not be started at the next reboot and the local copy of the database will not be updated if you make changes to the cluster configuration on the other nodes. This could cause problems later, especially if a majority of nodes are not running the database daemons. If the database daemons are not running, the cluster database will not be accessible locally and the node will not be configured to join the cluster.
`cxfs`	Controls the `clconfd` daemon and whether or not the `cxfs_shutdown` command is used during a system shutdown. The `cxfs_shutdown` command attempts to withdraw from the cluster gracefully before rebooting. Otherwise, the reboot is seen as a failure and the other nodes must recover from it. Note: `clconfd` cannot start unless `fs2d` is already running.
`fam`	Starts the file alteration monitoring (`fam`) service, which is required to use the CXFS GUI on Linux nodes.

These settings can be modified by the CXFS GUI or by the administrator. These settings only control the processes, not the cluster. Stopping the processes that control the cluster will not stop the cluster (that is, will not drop the cluster membership or lose access to CXFS filesystems and cluster volumes), and starting the processes will start the cluster only if the CXFS services are marked as activated in the database.

The following shows the settings of the arguments on server-capable administration nodes:

server-admin# chkconfig --list | grep -E 'cxfs|fam'
cxfs_cluster    0:off   1:off   2:on    3:on    4:on    5:on    6:off
cxfs            0:off   1:off   2:on    3:on    4:on    5:on    6:off
fam             0:off   1:off   2:on    3:on    4:on    5:on    6:off
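
Output in this format can be checked by a script, for example after an upgrade. The following is a minimal sketch that reads chkconfig --list output on standard input and reports any named service that is not on at runlevels 3 and 5; the function name check_runlevels and the choice of runlevels 3 and 5 are assumptions made for illustration, so adjust them to the runlevels your site uses.

```shell
# Hypothetical helper: read "chkconfig --list" output on stdin and verify
# that each named service is on at runlevels 3 and 5 (an assumed policy).
# Usage: chkconfig --list | check_runlevels cxfs_cluster cxfs fam
check_runlevels() {
    input=$(cat)
    rc=0
    for svc in "$@"; do
        # Select the line whose first field is exactly the service name.
        line=$(printf '%s\n' "$input" | awk -v s="$svc" '$1 == s')
        case "$line" in
            *3:on*5:on*) ;;
            *) echo "$svc is not enabled at runlevels 3 and 5" >&2; rc=1 ;;
        esac
    done
    return $rc
}
```

A service that does not appear in the input at all is reported the same way as one that is off.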

Granting Task Execution Privileges for Users

The CXFS GUI lets you grant or revoke access to a specific GUI task for one or more specific users. By default, only root may execute tasks in the GUI. For instructions, see “Privileges Tasks with the GUI” in Chapter 10.

The cxfs_admin command lets you grant access permission to specific nodes using the access allow|deny subcommands. See “Setting cxfs_admin Access Permissions” in Chapter 11.

Transforming a Server-Capable Administration Node into a Client-Only Node

You should install a node as a server-capable administration node only if you intend to use it as a potential metadata server. All other nodes should be installed as client-only nodes. See “Make Most Nodes Client-Only” in Chapter 2.

To transform a server-capable administration node into a client-only node, do the following:

Ensure that the node is not listed in the cluster database as a potential metadata server for any filesystem.
Stop the CXFS services on the node.
Modify the cluster so that it no longer contains the node.
Delete the node definition.
Remove the packages listed in “CXFS Software Products Installed on Server-Capable Administration Nodes” in Chapter 1 from the node.
Reboot the node to ensure that all previous node configuration information is removed.
Install client-only software as documented in the CXFS 7 Client-Only Guide for SGI InfiniteStorage.
Define the node as a client-only node.
Modify the cluster so that it contains the node if you are using the GUI. (This step is handled by cxfs_admin automatically.)
Start CXFS services on the node.

For more information about these tasks, see:

CXFS Mount Scripts

On server-capable administration nodes, the following scripts are provided for execution by the clconfd daemon prior to and after a CXFS filesystem is mounted or unmounted:

/var/cluster/clconfd-scripts/cxfs-pre-mount

/var/cluster/clconfd-scripts/cxfs-post-mount

/var/cluster/clconfd-scripts/cxfs-pre-umount

/var/cluster/clconfd-scripts/cxfs-post-umount

The following script is run when needed to reprobe the storage controllers on server-capable administration nodes:

/var/cluster/clconfd-scripts/cxfs-reprobe

You can customize these scripts to suit a particular environment. For example, an application could be started when a CXFS filesystem is mounted by extending the cxfs-post-mount script. The application could be terminated by changing the cxfs-pre-umount script.

On server-capable administration nodes, these scripts also allow you to use NFS to export the CXFS filesystems listed in /etc/exports if they are successfully mounted.

The appropriate daemon executes these scripts before and after mounting or unmounting CXFS filesystems specified in the /etc/exports file. The files must be named exactly as above and must have root execute permission.

Note: The /etc/exports file describes the filesystems that are being exported to NFS clients. If a CXFS mount point is included in the exports file, the empty mount point is exported unless the filesystem is re-exported after the CXFS mount using the cxfs-post-mount script.

The following arguments are passed to the files:

cxfs-pre-mount: filesystem device name and CXFS mounting point
cxfs-post-mount: filesystem device name, CXFS mounting point, and exit code
cxfs-pre-umount: filesystem device name and CXFS mounting point
cxfs-post-umount: filesystem device name, CXFS mounting point, and exit code

Because the filesystem device name is passed to the scripts, you can write the scripts so that they take different actions for different filesystems; because the exit codes are passed to the -post- files, you can write the scripts to take different actions based on success or failure of the operation.

The clconfd or cxfs_client daemon checks the exit code for these scripts. In the case of failure (nonzero), the following occurs:

For cxfs-pre-mount and cxfs-pre-umount, the corresponding mount or unmount is not performed
For cxfs-post-mount and cxfs-post-umount, clconfd will retry the entire operation (including the -pre- script) for that operation

This implies that if you do not want a filesystem to be mounted on a host, the cxfs-pre-mount script should return a failure for that filesystem while the cxfs-post-mount script returns success.
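
As an illustration of that behavior, the following is a minimal sketch of the logic a customized cxfs-pre-mount script might contain. It relies only on the two arguments described above (filesystem device name and mount point); the device name /dev/cxvm/scratch, the BLOCKED_DEVICES variable, and the function name pre_mount_check are hypothetical.

```shell
# Hypothetical list of devices that must never be mounted on this host,
# separated by spaces.
BLOCKED_DEVICES="/dev/cxvm/scratch"

# pre_mount_check DEVICE MOUNTPOINT
# Returns nonzero for a blocked device, which (per the text above) causes
# the corresponding mount to be skipped; returns 0 otherwise.
pre_mount_check() {
    device=$1
    mountpoint=$2
    for blocked in $BLOCKED_DEVICES; do
        if [ "$device" = "$blocked" ]; then
            echo "cxfs-pre-mount: refusing to mount $device on $mountpoint" >&2
            return 1
        fi
    done
    return 0
}

# In the real script, the last line would pass the daemon's arguments on
# and let the result become the script's exit code:
#   pre_mount_check "$1" "$2"
```

The matching cxfs-post-mount script would need to return success for the same filesystem so that the operation is not retried.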

Note: After the filesystem is unmounted, the mount point is removed.

For information about the mount scripts on client-only nodes, see the CXFS 7 Client-Only Guide for SGI InfiniteStorage.

Using DMF

DMF must make all of its DMAPI interface calls through the CXFS active metadata server. The CXFS client nodes do not provide a DMAPI interface to CXFS mounted filesystems. A CXFS client routes all of its communication to DMF through the metadata server. This generally requires that DMF run on the CXFS metadata server. If DMF is managing a CXFS filesystem, DMF will ensure that the filesystem's CXFS metadata server is the DMF server and will use metadata server relocation if necessary to achieve that configuration.

Note: DMF data-mover processes must only run on the active CXFS metadata server (the DMF server node) and any DMF parallel data-mover nodes. Do not run data-mover processes on potential metadata server nodes.

DMF requires independent paths to tape drives so that they are not fenced by CXFS. The ports for the tape drive paths on the switch should be masked from fencing in a CXFS configuration.

The SAN must be zoned so that XVM does not fail over CXFS filesystem I/O to the paths visible through the tape HBA ports when port fencing occurs. Therefore, either independent switches or independent switch zones should be used for CXFS/XVM volume paths and DMF tape drive paths.

To use DMF with CXFS, do the following:

For server-capable administration nodes, install the sgi-dmapi and sgi-xfsprogs packages from the SGI InfiniteStorage Software Platform (ISSP) release. These are part of the software for the DMF server and the DMF parallel data-mover node. The DMF software will automatically enable DMAPI, which is required to use the dmi mount option.

For CXFS client-only nodes, no additional software is required other than SLES 10 and SLES 11; see CXFS 7 Client-Only Guide for SGI InfiniteStorage.
When using the Parallel Data-Mover Option, install the DMF Parallel Data Mover software package, which includes the required underlying CXFS client-only software. (From the CXFS cluster point of view, the DMF parallel data-mover node is a CXFS client-only node but one that is dedicated to DMF data-mover activities.)
Use the dmi option when mounting a filesystem to be managed.
Start DMF on the CXFS active metadata server for each filesystem to be managed.

For more information about DMF, see the DMF 6 Administrator Guide.

Discovering the Active Metadata Server

This section discusses how to discover the active metadata server using various tools:

CXFS GUI and the Active Metadata Server

To use the GUI to discover the active metadata server for a filesystem, do the following:

Select View: Filesystems
In the view area, click the name of the filesystem you wish to view. The name of the active metadata server is displayed in the details area to the right.

Figure 12-5 shows an example.

Figure 12-5. GUI Window Showing the Metadata Server

`cxfs_admin` and the Active Metadata Server

To use cxfs_admin to discover the active metadata server for a filesystem, do the following:

To show information for all filesystems, including their active metadata servers:

show server

For example:

cxfs_admin:clusterOne> show server
Event at [ Oct 22 12:33:43 ]
filesystem:zj01s0:status:server=bert
filesystem:zj01s1:status:server=cxfsxe5

To show the active metadata server for a specific filesystem:

show [filesystem:]filesystem:status:server

In the above, you could abbreviate status to *. For example, if filesystem zj01s1 is a unique name in the cluster database:

cxfs_admin:clusterOne> show zj01s1:*:server
Event at [ Oct 22 12:34:43 ]
filesystem:zj01s1:status:server=cxfsxe5
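
If you capture this output in a script, the filesystem and server names can be pulled out of the filesystem:name:status:server=host lines. The following is a minimal sketch that reads the show server output on standard input; it assumes the line format shown in the examples above, and the function name list_active_servers is hypothetical. How the output is captured from cxfs_admin is left to your environment.

```shell
# Hypothetical helper: read "show server" output on stdin and print one
# "filesystem server" pair per line. Other lines (such as the
# "Event at" header) do not match the pattern and are ignored.
list_active_servers() {
    awk -F'[:=]' '$1 == "filesystem" && $3 == "status" && $4 == "server" {
        print $2, $5
    }'
}
```

For the two-filesystem example shown earlier, this would print zj01s0 bert and zj01s1 cxfsxe5 on separate lines.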

`clconf_info` and the Active Metadata Server

You can use the clconf_info command to discover the active metadata server for a given filesystem. For example, the following shows that bert is the metadata server for zj01s0:

# /usr/cluster/bin/clconf_info

Event at [2009-10-22 12:37:33]

Membership since Wed Oct 21 13:34:43 2009
________________ ______ ________ ______ ______
Node             NodeID Status   Age    CellID
________________ ______ ________ ______ ______
cxfsxe5               1 up           44      1
bert                  3 up           33      0
pg-27                 4 up            1      5
penguin17             5 up           54      4
cxfsxe10              6 Disabled      -      3
________________ ______ ________ ______ ______
2 CXFS FileSystems
/dev/cxvm/zj01s1 on /mnt/zj01s1  enabled  server=(cxfsxe5)  3 client(s)=(bert,penguin17,pg-27)  status=UP
/dev/cxvm/zj01s0 on /mnt/zj01s0  enabled  server=(bert)  3 client(s)=(cxfsxe5,penguin17,pg-27)  status=UP
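
The server=(...) field in the filesystem lines can also be extracted by a script. The following is a minimal sketch that reads clconf_info output on standard input and prints the active metadata server for one filesystem; it assumes the /dev/cxvm/name line format shown above, and the function name active_mds is hypothetical.

```shell
# Hypothetical helper: read clconf_info output on stdin and print the
# name inside server=(...) for the given filesystem.
# Usage: /usr/cluster/bin/clconf_info | active_mds zj01s0
active_mds() {
    fs=$1
    # Match only the line for /dev/cxvm/<fs> and keep the text between
    # the parentheses that follow "server=".
    sed -n "s|^/dev/cxvm/$fs on .*server=(\([^)]*\)).*|\1|p"
}
```

With the example output above, active_mds zj01s0 would print bert. The sketch prints nothing if the filesystem is not listed.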

Shutdown of the Database and CXFS

This section tells you how to perform the following:

For more information about states, see Chapter 14, “Monitoring Status”. If there are problems, see Chapter 15, “Troubleshooting”.

Cluster Database Shutdown

A cluster database shutdown terminates the following user-space daemons that manage the cluster database:

cad

clconfd

cmond

crsd

fs2d

After shutting down the database on a node, access to the shared filesystems remains available and the node is still a member of the cluster, but the node is not available for database updates. Rebooting the node results in a restart of all services (restarting the daemons, joining cluster membership, enabling cluster volumes, and mounting CXFS filesystems).

To perform a cluster database shutdown, enter the following on a server-capable administration node:

server-admin# killall -TERM clconfd
server-admin# service cxfs_cluster stop

If you also want to disable the daemons from restarting at boot time, enter the following:

server-admin# /sbin/chkconfig grio2 off (If running GRIOv2)
server-admin# /sbin/chkconfig cxfs off
server-admin# /sbin/chkconfig cxfs_cluster off

For more information, see “chkconfig Arguments”.

Node Status and Cluster Database Shutdown

A cluster database shutdown is appropriate when you want to perform a maintenance operation on the node and then reboot it, returning it to ACTIVE status (as displayed by the GUI) or stable status (as displayed by cxfs_admin).

If you perform a cluster database shutdown, the node status will be DOWN in the GUI or inactive in cxfs_admin, which has the following impacts:

The node is still considered part of the cluster, but is unavailable.
The node does not get cluster database updates; however, it will be notified of all updates after it is rebooted.

Missing cluster database updates can cause problems if the kernel portion of CXFS is active. That is, if the node continues to have access to CXFS, the node's kernel level will not see the updates and will not respond to attempts by the remaining nodes to propagate these updates at the kernel level. This in turn will prevent the cluster from acting upon the configuration updates.

Note: If the cluster database is shut down on more than half of the server-capable administration nodes, changes cannot be made to the cluster database.

Restart the Cluster Database

To restart the cluster database, enter the following:

server-admin# service cxfs_cluster start
server-admin# service cxfs start

Normal CXFS Shutdown: Stop CXFS Services or Disable the Node

You should perform a normal CXFS shutdown in the GUI or disable a node in cxfs_admin when you want to stop CXFS services on a node and remove it from the CXFS kernel membership quorum.

A normal CXFS shutdown in the GUI does the following:

Unmounts all the filesystems except those for which it is the active metadata server; those filesystems for which the node is the active metadata server will become inaccessible from the node after it is shut down.
Terminates the CXFS kernel membership of this node.
Marks the node as INACTIVE in the GUI and disabled in cxfs_admin.

The effect of this is that cluster disks are unavailable and no cluster database updates will be propagated to this node. Rebooting the node leaves it in the shutdown state.

If the node on which you shut down CXFS services is an active metadata server for a filesystem, then that filesystem will be recovered by another node that is listed as one of its potential metadata servers. The server that is chosen must be a filesystem client; other filesystem clients will experience a delay during the recovery process.

If the node on which the CXFS shutdown is performed is the sole potential metadata server (that is, there are no other nodes listed as potential metadata servers for the filesystem), then you should unmount the filesystem from all nodes before performing the shutdown.

The GUI task can operate on all nodes in the cluster or on the specified node; the cxfs_admin disable command operates on just a single specified node.

To perform a normal CXFS shutdown, see

When You Should Not Perform Stop CXFS Services

You should not stop CXFS services under the following circumstances:

If CXFS services are running on the local node (the server-capable administration node on which cxfs_admin is running or the node to which the CXFS GUI is connected)
If stopping CXFS services on the node will result in loss of CXFS kernel membership quorum
If the node is the only available potential metadata server for one or more active CXFS filesystems

To achieve a CXFS shutdown under these conditions, you must perform a forced CXFS shutdown. See “Forced CXFS Shutdown: Revoke Membership of Local Node”.
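
As a rough sketch, the three conditions above can be expressed as a pre-check that an administrator's own tooling might run before choosing a shutdown method. The class, field, and function names are hypothetical, and the three boolean inputs are assumed to be determined by the administrator (this sketch does not query CXFS).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StopRequest:
    node: str
    is_local_node: bool          # node running cxfs_admin or connected to the GUI
    would_lose_quorum: bool      # stop would drop CXFS kernel membership quorum
    is_only_available_mds: bool  # sole available potential metadata server


def blocking_conditions(req: StopRequest) -> List[str]:
    """List the documented reasons not to stop CXFS services on this node."""
    reasons = []
    if req.is_local_node:
        reasons.append("node is the local node")
    if req.would_lose_quorum:
        reasons.append("stop would lose CXFS kernel membership quorum")
    if req.is_only_available_mds:
        reasons.append("node is the only available potential metadata server")
    return reasons


def shutdown_method(req: StopRequest) -> str:
    """Return 'normal' if no condition applies, otherwise 'forced'."""
    return "forced" if blocking_conditions(req) else "normal"


if __name__ == "__main__":
    req = StopRequest("admin1", False, True, False)
    print(shutdown_method(req), blocking_conditions(req))
```

Returning the list of reasons rather than a bare boolean makes it easier to report why a normal shutdown was ruled out.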

Rejoining the Cluster after Stopping CXFS Services

The node will not rejoin the cluster after a reboot. The node will rejoin the cluster only after CXFS services are explicitly reactivated with the CXFS GUI or after the node is enabled using cxfs_admin.

Forced CXFS Shutdown: Revoke Membership of Local Node

A forced CXFS shutdown (or administrative CXFS stop) is appropriate when you want to shut down the local node even though it may drop the cluster below its CXFS kernel membership quorum requirement.

CXFS does the following:

Shuts down all CXFS filesystems on the local node. Any attempts to access the CXFS filesystems will result in an I/O error (you may need to manually unmount the filesystems).
Removes this node from the CXFS kernel membership.
Marks the node as DOWN in the GUI or inactive in cxfs_admin.
Disables access from the local node to cluster-owned XVM volumes.
Treats the stopped node as a failed node and executes the fail policy defined for the node in the cluster database. See “Fail Policies” in Chapter 2.

Caution: A forced CXFS shutdown may cause the cluster to fail if the cluster drops below CXFS kernel membership quorum.

If you do a forced CXFS shutdown on an active metadata server, it loses membership immediately. At this point, another potential metadata server must take over (and recover the filesystems) or quorum is lost and a forced CXFS shutdown follows on all nodes.

If you do a forced CXFS shutdown that forces a loss of quorum, the remaining part of the cluster (which now must also do an administrative stop) will not reset the departing node.

To perform an administrative stop, see:

Node Status and Forced CXFS Shutdown

After a forced CXFS shutdown, the node is still considered part of the configured cluster and is taken into account when propagating the cluster database and when computing the cluster database (fs2d) membership quorum. (This could cause a loss of quorum for the rest of the cluster, causing the other nodes to do a forced CXFS shutdown.) The state is INACTIVE in the GUI or inactive in cxfs_admin.

It is important that this node stays accessible and keeps running the cluster infrastructure daemons to ensure database consistency. In particular, if more than half the nodes in the pool are down or not running the infrastructure daemons, cluster database updates will stop being propagated, which will result in inconsistencies. To be safe, you should remove from the cluster and pool those nodes that will remain unavailable.

Rejoining the Cluster after a Forced CXFS Shutdown

After a forced CXFS shutdown, the local node will not resume CXFS kernel membership until the node is rebooted or until you explicitly allow CXFS kernel membership for the local node. See:

If you perform a forced CXFS shutdown on a server-capable administration node, you must restart CXFS on that node before it can return to the cluster. If you do this while the cluster database still shows that the node is in the cluster and is activated, the node will restart the CXFS kernel membership daemon. Therefore, you may want to do this after resetting the database or after stopping CXFS services.

Reset Capability and a Forced CXFS Shutdown

Caution: If you perform an administrative CXFS stop on a server-capable administration node with system reset capability and the stop will not cause loss of cluster quorum, the node will be reset (rebooted) by the appropriate node.

For more information about resets, see “System Reset” in Chapter 2.

Avoiding a CXFS Restart at Reboot

If the following chkconfig arguments are turned off, the clconfd and cxfs_client daemons on server-capable administration nodes and client-only nodes, respectively, will not be started at the next reboot and the kernel will not be configured to join the cluster:

Server-capable administration nodes: cxfs
Client-only nodes: cxfs_client

It is useful to turn these arguments off before rebooting if you want to temporarily remove the nodes from the cluster for system or hardware upgrades or for other maintenance work.

For example, do the following on a server-capable administration node:

server-admin# /sbin/chkconfig grio2 off (if running GRIOv2)
server-admin# /sbin/chkconfig cxfs off
server-admin# /sbin/chkconfig cxfs_cluster off
server-admin# reboot

For more information, see “chkconfig Arguments”.

Log File Management

CXFS log files should be rotated at least weekly so that your disk will not become full.

A package that provides CXFS daemons also supplies scripts to rotate the log files for those daemons via logrotate. SGI installs the following scripts on server-capable administration nodes:

/etc/logrotate.d/cluster_admin
/etc/logrotate.d/cluster_control
/etc/logrotate.d/cxfs_cluster
/etc/logrotate.d/grio2

To customize log rotation, edit these scripts.

For information about log levels, see “Configure Log Groups with the GUI” in Chapter 10.

Filesystem Maintenance

Although filesystem information is traditionally stored in /etc/fstab, CXFS filesystem information is relevant to the entire cluster and is therefore stored in the replicated cluster database instead.

As the administrator, you will supply the CXFS filesystem configuration by using the CXFS GUI or cxfs_admin. The information is then automatically propagated consistently throughout the entire cluster. The cluster configuration daemon mounts the filesystems on each node according to this information, as soon as it becomes available.

A CXFS filesystem will be automatically mounted on all the nodes in the cluster. You can add a new CXFS filesystem to the configuration when the cluster is active.

Whenever the cluster configuration daemon detects a change in the cluster configuration, it does the equivalent of a mount -a command on all of the filesystems that are configured.

Caution: You must not modify or remove a CXFS filesystem definition while the filesystem is mounted. You must unmount it first and then mount it again after making the modifications.

This section discusses the following:

Mounting Filesystems

You supply mounting information with the CXFS GUI or cxfs_admin.

Caution: Do not attempt to use the mount command to mount a CXFS filesystem. Doing so can result in data loss and/or corruption due to inconsistent use of the filesystem from different nodes.

When properly defined and mounted, the CXFS filesystems are automatically mounted on each node by the local cluster configuration daemon, clconfd, according to the information collected in the replicated database. After the filesystem configuration has been entered in the database, no user intervention is necessary.

Mount points cannot be nested when using CXFS. That is, you cannot have a filesystem within a filesystem, such as /usr and /usr/home.

Unmounting Filesystems

To unmount CXFS filesystems, use the CXFS GUI or cxfs_admin. These tools unmount a filesystem from all nodes in the cluster. Although this action triggers an unmount on all the nodes, some might fail if the filesystem is busy. On active metadata servers, the unmount cannot succeed before all of the CXFS clients have successfully unmounted the filesystem. All nodes will retry the unmount until it succeeds, but there is no centralized report that the filesystem has been unmounted on all nodes.

To verify that the filesystem has been unmounted from all nodes, do one of the following:

Check the SYSLOG files on the metadata servers for a message indicating that the filesystem has been unmounted.
Run the CXFS GUI or cxfs_admin on the metadata server, disable the filesystem from the server, and wait until the GUI shows that the filesystem has been fully disabled. (It will be an error if it is still mounted on some CXFS clients; the GUI will show which clients are left.)

Growing Filesystems

To grow a CXFS filesystem, do the following:

Unmount the CXFS filesystem using the CXFS GUI or cxfs_admin.
Change the domain of the XVM volume from a cluster volume to a local volume using the XVM give command. See the XVM Volume Manager Administrator Guide.
Mount the filesystem as an XFS filesystem using the mount command. For more information, see the mount(8) man page.
Use the xfs_growfs command or the CXFS GUI task; see “Grow a Filesystem with the GUI” in Chapter 10.
Unmount the XFS filesystem. For more information, see the umount(8) man page.
Change the domain of the XVM volume back to a cluster volume using the give command.
Mount the filesystem as a CXFS filesystem by using the GUI or cxfs_admin.

Dump and Restore

You must perform the backup of a CXFS filesystem from the active metadata server for that filesystem. The xfsdump and xfsrestore commands make use of special system calls that will only function on the active metadata server. The filesystem can have active clients during a dump process.

In a clustered environment, a CXFS filesystem may be directly accessed simultaneously by many CXFS clients and the active metadata server. A filesystem may, over time, have a number of metadata servers. Therefore, in order for xfsdump to maintain a consistent inventory, it must access the inventory for past dumps, even if this information is located on another node. SGI recommends that the inventory be made accessible by potential metadata server nodes in the cluster using one of the following methods:

Relocate the inventory to a shared filesystem. For example, where shared_filesystem is replaced with the actual name of the filesystem to be shared:
- On the server-capable administration node currently containing the inventory, enter the following:
  inventoryadmin# cd /var/lib
  inventoryadmin# cp -r xfsdump /shared_filesystem
  inventoryadmin# mv xfsdump xfsdump.bak
  inventoryadmin# ln -s /shared_filesystem/xfsdump xfsdump
- On all other server-capable administration nodes in the cluster, enter the following:
  otheradmin# cd /var/lib
  otheradmin# mv xfsdump xfsdump.bak
  otheradmin# ln -s /shared_filesystem/xfsdump xfsdump
Export the directory using an NFS shared filesystem. For example:
- On the server-capable administration node currently containing the inventory, add /var/lib/xfsdump to /etc/exports and then enter the following:
  inventoryadmin# exportfs -a
- On all other server-capable administration nodes in the cluster, enter the following:
  otheradmin# cd /var/lib
  otheradmin# mv xfsdump xfsdump.bak
  otheradmin# ln -s /net/hostname/var/lib/xfsdump xfsdump

Note: It is the /var/lib/xfsdump directory that should be shared, rather than the /var/lib/xfsdump/inventory directory. If there are inventories stored on various nodes, you can use xfsinvutil to merge them into a single common inventory, prior to sharing the inventory among the nodes in the cluster.
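
The shared-filesystem method above (copy, rename to xfsdump.bak, then symlink) could be scripted along the following lines. This is a minimal sketch, not an SGI-supplied tool: the function name and its parameters are hypothetical, and the demonstration runs against a temporary directory rather than the real /var/lib.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_inventory(var_lib: Path, shared_root: Path, copy_first: bool) -> Path:
    """Replace var_lib/xfsdump with a symlink to shared_root/xfsdump.

    copy_first=True mirrors the steps for the node that holds the inventory;
    copy_first=False mirrors the steps for the other nodes.
    """
    local = var_lib / "xfsdump"
    backup = var_lib / "xfsdump.bak"
    shared = shared_root / "xfsdump"
    if local.is_symlink():
        raise RuntimeError(f"{local} is already a symbolic link")
    if backup.exists():
        raise FileExistsError(f"{backup} already exists")
    if copy_first:
        shutil.copytree(local, shared)   # cp -r xfsdump /shared_filesystem
    local.rename(backup)                 # mv xfsdump xfsdump.bak
    os.symlink(shared, local)            # ln -s /shared_filesystem/xfsdump xfsdump
    return local


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "var_lib" / "xfsdump" / "inventory").mkdir(parents=True)
        (root / "shared").mkdir()
        link = link_inventory(root / "var_lib", root / "shared", copy_first=True)
        print(link, "->", os.readlink(link))
```

The sketch refuses to run if xfsdump.bak already exists, so that a second invocation cannot overwrite the saved copy of the original inventory.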

Hardware Changes and I/O Fencing

If you use I/O fencing and then make changes to your hardware configuration, you must verify that switch ports are properly enabled so that they can discover the WWPN of the HBA for I/O fencing purposes.

You must check the status of the switch ports involved whenever any of the following occur:

An HBA is replaced on a node
A new node is plugged into the switch for the first time
A SAN cable rearrangement occurs
Note: Shut down the affected nodes before rearranging cables.

To check the status, use the following command on a server-capable administration node:

server-admin# /usr/cluster/bin/hafence -v

If any of the affected ports are disabled, you must manually enable them before starting CXFS on the affected nodes. For example, for a Brocade switch:

Connect to the switch using ssh or telnet.
Use the portenable command to enable the port.
Close the ssh or telnet session.

After the port is enabled, the metadata server will be able to discover the new (or changed) WWPN of the HBA connected to that port and thus correctly update the switch configuration entries in the cluster database.

Private Network Failover

This section provides an example of modifying a cluster to provide private network failover. Do the following:

Create the failover network subnets. For example:

cxfs_admin:mycluster> create failover_net network=192.168.0.0 mask=255.255.255.0
cxfs_admin:mycluster> create failover_net network=192.168.1.0 mask=255.255.255.0

Disable all nodes (which shuts down the cluster):
cxfs_admin:mycluster> disable node:*

Update each node to include a private network. For example:

cxfs_admin:mycluster> modify red private_net=192.168.0.1,192.168.1.1
cxfs_admin:mycluster> modify yellow private_net=192.168.0.2,192.168.1.2

Enable all nodes:
cxfs_admin:mycluster> enable node:*

For more information, see Chapter 11, “cxfs_admin Command”.
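
For clusters with many nodes, the command sequence shown above could be generated rather than typed. The following sketch only builds the command text, using the cxfs_admin syntax from the example; it does not run cxfs_admin. The function name is hypothetical, and the check that each node lists one address per failover subnet, in order, is an assumption drawn from the example rather than a documented requirement.

```python
import ipaddress
from typing import Dict, List


def failover_commands(subnets: List[str], nodes: Dict[str, List[str]]) -> List[str]:
    """Build the create/disable/modify/enable sequence for private network failover."""
    networks = [ipaddress.ip_network(s) for s in subnets]
    commands = [
        f"create failover_net network={net.network_address} mask={net.netmask}"
        for net in networks
    ]
    commands.append("disable node:*")
    for name, addresses in nodes.items():
        if len(addresses) != len(networks):
            raise ValueError(f"{name}: expected {len(networks)} addresses")
        for address, net in zip(addresses, networks):
            # Assumption: the nth address belongs to the nth failover subnet.
            if ipaddress.ip_address(address) not in net:
                raise ValueError(f"{name}: {address} is not in {net}")
        commands.append(f"modify {name} private_net={','.join(addresses)}")
    commands.append("enable node:*")
    return commands


if __name__ == "__main__":
    plan = failover_commands(
        ["192.168.0.0/255.255.255.0", "192.168.1.0/255.255.255.0"],
        {
            "red": ["192.168.0.1", "192.168.1.1"],
            "yellow": ["192.168.0.2", "192.168.1.2"],
        },
    )
    for command in plan:
        print(f"cxfs_admin:mycluster> {command}")
```

Validating the addresses before printing anything catches a mistyped subnet before all nodes are disabled.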

Cluster Member Removal and Restoration

This section discusses removing and restoring cluster members for maintenance:

These procedures are the safest way to perform these tasks but in some cases are not the most efficient. You should follow them if you have been having problems using standard operating procedures (performing a stop/start of CXFS services or a simple host shutdown or reboot).

Manually Starting/Stopping CXFS

Note: If you are going to perform maintenance on a potential metadata server, you should first shut down CXFS services on it. Disabled nodes are not used in CXFS kernel membership calculations, so this action may prevent a loss of quorum.

On server-capable administration nodes, the service cxfs_cluster script is invoked automatically during normal system startup and shutdown procedures. (On client-only nodes, the path to the cxfs_client script varies by platform; see CXFS 7 Client-Only Guide for SGI InfiniteStorage.) This script starts and stops the processes required to run CXFS.

To start up CXFS processes manually on a server-capable administration node, enter the following commands:

server-admin# service grio2 start  (if running GRIOv2)
server-admin# service cxfs_cluster start
server-admin# service cxfs start

To stop CXFS processes manually on a server-capable administration node, enter the following commands:

server-admin# service grio2 stop  (stops GRIOv2 daemons)
server-admin# service cxfs stop (stops the CXFS server-capable administration node control daemon)
server-admin# service cxfs_cluster stop (stops the cluster administration daemons)

Note: There is also a restart option that performs a stop and then a start.
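
The start and stop sequences above differ in the order of the cxfs and cxfs_cluster services. A small helper, sketched here with hypothetical names, can keep the two orders straight when building maintenance scripts. It only produces the command strings listed in this section and does not execute them.

```python
from typing import List

# Orders taken from the manual start and stop examples in this section.
START_ORDER = ["grio2", "cxfs_cluster", "cxfs"]
STOP_ORDER = ["grio2", "cxfs", "cxfs_cluster"]


def service_commands(action: str, use_grio2: bool) -> List[str]:
    """Return the service commands for 'start' or 'stop' in documented order."""
    if action == "start":
        order = START_ORDER
    elif action == "stop":
        order = STOP_ORDER
    else:
        raise ValueError("action must be 'start' or 'stop'")
    # grio2 is included only if the node runs GRIOv2.
    return [f"service {name} {action}" for name in order
            if use_grio2 or name != "grio2"]


if __name__ == "__main__":
    for command in service_commands("stop", use_grio2=True):
        print(f"server-admin# {command}")
```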

Removing a Metadata Server from the Cluster Membership

If you have a cluster with multiple active metadata servers and you must perform maintenance on one of them, you must stop CXFS services on it.

To remove an active metadata server (admin1, for example) from the cluster membership, do the following:

Enable relocation by using the cxfs_relocation_ok system tunable parameter. See “Relocation” in Chapter 1.
For each filesystem for which admin1 is the active metadata server, manually relocate the metadata services from admin1 to one of the other potential metadata servers (such as admin2) by using the CXFS GUI or cxfs_admin. For example:
cxfs_admin:mycluster> relocate fs1 server=admin2

Disable relocation. See “Relocation” in Chapter 1.

Note: If you do not perform steps 1-3 in a system reset configuration, admin1 will be reset shortly after losing its membership. The machine will also be configured to reboot automatically instead of stopping in the PROM. This means that you must watch the console and intervene manually to prevent a full reboot.

In a fencing configuration, admin1 will lose access to the SAN when it is removed from the cluster membership.

Stop CXFS services for admin1 by using the CXFS GUI or cxfs_admin running on another metadata server. For example:
cxfs_admin:mycluster> disable admin1
Shut down admin1.

If you do not want the cluster administration daemons and the CXFS control daemon to run during maintenance, execute the following commands:

admin1# /sbin/chkconfig grio2 off (if running GRIOv2)
admin1# /sbin/chkconfig cxfs off 
admin1# /sbin/chkconfig cxfs_cluster off

If you do an upgrade of the cluster software, these arguments will be automatically reset to on and the cluster administration daemons and the CXFS control daemon will be started.

For more information, see “chkconfig Arguments”.

Restoring a Server-Capable Administration Node to the Cluster Membership

To restore a server-capable administration node to the cluster, do the following:

Allow the cluster administration daemons, CXFS control daemon, and GRIOv2 daemon (if using GRIOv2) to be started upon reboot:

admin1# /sbin/chkconfig cxfs on 
admin1# /sbin/chkconfig cxfs_cluster on
admin1# /sbin/chkconfig grio2 on (if using GRIOv2)

Immediately start cluster administration daemons on the node:
admin1# service cxfs_cluster start
Immediately start the CXFS control daemon on the node:
admin1# service cxfs start
Immediately start the GRIOv2 daemon on the node (if using GRIOv2):
admin1# service grio2 start

Adding a New Server-Capable Administration Node to an Existing Cluster

To avoid problems, SGI recommends that the server-capable administration nodes all have lower cell ID numbers than any client-only nodes; see “Create an Initial Cluster of All Server-Capable Administration Nodes” in Chapter 2. The cell ID number is established by CXFS when a node is added to the cluster definition, which happens automatically with cxfs_admin when you define a node.

If you want to add a new server-capable administration node to an existing cluster, you must determine if there is an available cell ID number for it that will be lower than the cell ID of any client-only node, or reconfigure the cluster definition so that such a number becomes available.

Following is an overview of the steps required:

Use the cxfs_admin status command to determine the current cell IDs. For more information, see “cxfs_admin and Status” in Chapter 14.
Note: The GUI does not display cell ID numbers.
If a low cell ID number is available, define the new server-capable administration node with cxfs_admin (or use the corresponding GUI tasks to define the node and add it to the cluster).
If no low cell ID number is available, free the lowest cell ID that applies to a client-only node by using the cxfs_admin disable and detach commands to remove the associated client-only node from the cluster definition (or use the corresponding GUI task to remove the node from the cluster definition). For more information, see:
- “Disable a Node with cxfs_admin” in Chapter 11
- “Move a Node Between the Cluster and the Pool with cxfs_admin” in Chapter 11
Note: Disabling a node will unmount CXFS filesystems and stop CXFS services on that node.
Define the new server-capable administration node with cxfs_admin, which automatically adds it to the cluster definition and assigns a cell ID number (or use the GUI tasks to define the node and add it to the cluster).
Restore the client-only node to the cluster definition by using the cxfs_admin attach and enable commands (or use the GUI task to add it to the cluster definition). For more information, see:
- “Move a Node Between the Cluster and the Pool with cxfs_admin” in Chapter 11
- “Enable a Node with cxfs_admin” in Chapter 11
Use the cxfs_admin status command again to verify that the new server-capable administration node has a lower cell ID number than any client-only node.
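
The final verification step can be partly automated by parsing saved status output. The sketch below assumes the column layout shown in this chapter's examples (node name, an optional * for server-capable administration nodes, cell ID, age, status); the actual output may differ between releases, and all function names are hypothetical.

```python
import re
from typing import List, NamedTuple


class NodeRow(NamedTuple):
    name: str
    is_server: bool
    cell_id: int
    status: str


# name, optional " *", cell ID, age, status text
ROW = re.compile(r"^(\S+)(\s+\*)?\s+(\d+)\s+(\d+)\s+(\S.*)$")


def parse_status(text: str) -> List[NodeRow]:
    """Extract node rows; header, separator, and summary lines do not match."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append(NodeRow(match.group(1), match.group(2) is not None,
                                int(match.group(3)), match.group(5).strip()))
    return rows


def misplaced_servers(rows: List[NodeRow]) -> List[str]:
    """Return servers whose cell ID is higher than some client-only node's."""
    client_ids = [row.cell_id for row in rows if not row.is_server]
    if not client_ids:
        return []
    lowest_client = min(client_ids)
    return [row.name for row in rows
            if row.is_server and row.cell_id > lowest_client]


if __name__ == "__main__":
    sample = """\
Cluster         : clusterOne
Node                Cell ID   Age       Status
------------------  --------  --------  -------------
mds1 *              0         4         Stable
mds2 *              3         1         Stable
clientA             2         2         Stable
"""
    print(misplaced_servers(parse_status(sample)))
```

An empty result means every server-capable administration node listed has a lower cell ID than every client-only node listed.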

The following sections provide detailed examples using cxfs_admin:

Note: To use the GUI, see the following sections:

Add Server: Low Cell ID Number is Available

The following example (output truncated) shows the following:

mds1 and mds2 are server-capable administration nodes (indicated by the * character)
clientA has the lowest cell ID (3) of any client-only node
There is an available low cell ID number (2) between the current set of server-capable administration nodes (0 and 1) and the client-only nodes (3-5)

cxfs_admin:clusterOne > status
Event at [ Oct 22 13:08:07 ]
Cluster         : clusterOne
Tiebreaker      : clientA
Client Licenses : allocated 3 of 5
------------------  --------  --------  -------------
Node                Cell ID   Age       Status
------------------  --------  --------  -------------
mds1 *              0         4         Stable
mds2 *              1         3         Stable
clientA             3         1         Stable
clientB             4         1         Stable
clientC             5         1         Stable
...

To add a new server-capable administration node to the clusterOne cluster, do the following:

Define the server-capable administration node (in this case, mds3), which automatically adds it to the cluster definition and assigns a cell ID number:

cxfs_admin:clusterOne> create node
Specify the attributes for create node:
 name? mds3
 type? server_admin
 private_net? 192.168.0.178
Event at [ Oct 22 13:08:08 ]
Node "mds3" has been created, waiting for it to join the cluster...
Please restart all cxfs and cluster services on the server "mds3" to make it
join the cluster.

Use the status command to verify that the new server-capable administration node has a lower cell ID number than any client-only node. For example:

cxfs_admin:clusterOne > status
Event at [ Oct 22 13:08:09 ]
Cluster         : clusterOne
Tiebreaker      : clientA
Client Licenses : allocated 3 of 5
------------------  --------  --------  -------------
Node                Cell ID   Age       Status
------------------  --------  --------  -------------
mds1 *              0         10        Stable
mds2 *              1         9         Stable
mds3 *              2         1         Stable
clientA             3         7         Stable
clientB             4         7         Stable
clientC             5         7         Stable
...

Add Server: No Low Cell ID Number is Available

The following example shows the following:

mds1 and mds2 are server-capable administration nodes (indicated by the * character)
clientA has the lowest cell ID (2) of any client-only node
There is not an available low cell ID number between the current set of server-capable administration nodes (0 and 1) and the client-only nodes (2-4)

cxfs_admin:clusterOne > status
Event at [ Oct 22 13:08:10 ]
Cluster         : clusterOne
Tiebreaker      : clientA
Client Licenses : allocated 3 of 5 
------------------  --------  --------  ---------------
Node                Cell ID   Age       Status
------------------  --------  --------  ---------------
mds1 *              0         4         Stable
mds2 *              1         3         Stable
clientA             2         1         Stable
clientB             3         1         Stable
clientC             4         1         Stable
...

Because there is no available number, you must adjust the cell IDs. Do the following:

Free the lowest cell ID that applies to a client-only node (in this case, 2 for clientA) by using the following cxfs_admin commands to remove the associated client-only node from the cluster definition:

cxfs_admin:clusterOne > disable clientA 
Event at [ Oct 22 13:08:11 ]
Node "clientA" has been disabled, waiting for it to leave the cluster...
Waiting for node clientA, current status: Disabled and unmounting
Operation completed successfully
cxfs_admin:clusterOne> detach clientA

Define the new server-capable administration node mds3, which automatically adds it to the cluster definition and assigns a cell ID number (the now-available 2):

cxfs_admin:clusterOne> create node
Specify the attributes for create node:
 name? mds3
 type? server_admin
 private_net? 192.168.0.178
Event at [ Oct 22 13:08:12 ]
Node "mds3" has been created, waiting for it to join the cluster...
Please restart all cxfs and cluster services on the server "mds3" to make it
join the cluster.

Restore the client-only node clientA to the cluster definition by attaching it and enabling it:

cxfs_admin:clusterOne > attach clientA 
Event at [ Jan 11 15:27:58 ]
Node "clientA" has been enabled, waiting for it to join the cluster...
Waiting for node clientA, current status: Establishing membership
Waiting for node clientA, current status: Probing XVM volumes
Operation completed successfully
cxfs_admin:clusterOne> enable clientA

This automatically assigns a new cell ID number to clientA (the next available number, which is 5).

Use the status command to verify that the new server-capable administration node mds3 has a lower cell ID number than any client-only node:

cxfs_admin:clusterOne > status
Event at [ Oct 22 13:08:13 ]
Cluster         : clusterOne
Tiebreaker      : clientA
Client Licenses : allocated 3 of 5 
------------------  --------  --------  -----------------
Node                Cell ID   Age       Status
------------------  --------  --------  -----------------
mds1 *              0         28        Stable
mds2 *              1         27        Stable
mds3 *              2         15        Stable
clientA             5         1         Stable
clientB             3         25        Stable
clientC             4         25        Stable
...

Adjusting the Cell ID Numbers

If your cluster is experiencing problems, you should examine the current cell IDs and readjust as necessary. Following is an overview of the steps required:

Use the cxfs_admin status command to determine the current cell IDs. For more information, see “cxfs_admin and Status” in Chapter 14.
Note: The GUI does not display cell ID numbers.
If a low cell ID number is available, assign it to the server-capable administration node that has an inappropriate cell ID by doing the following:
1. Remove the inappropriate cell ID from the server-capable administration node by using the cxfs_admin disable and detach commands, which remove the node from the cluster definition and place it in the poolnode class (or use the corresponding GUI task to remove the node from the cluster definition, leaving it in the pool). For more information, see:
  - “Disable a Node with cxfs_admin” in Chapter 11
  - “Move a Node Between the Cluster and the Pool with cxfs_admin” in Chapter 11
2. Assign a new cell ID (which will be the free low cell ID number) to the node by using the cxfs_admin attach and enable commands, which add the node back into the cluster definition (or use the corresponding GUI task to add the node to the cluster definition).
If a low cell ID number is not available, adjust the cell IDs by doing the following:
1. Remove the inappropriate cell ID from the server-capable administration node by using the cxfs_admin disable and detach commands, which remove the node from the cluster definition and place it in the poolnode class (or use the corresponding GUI task to remove the node from the cluster definition, leaving it in the pool). For more information, see:
  - “Disable a Node with cxfs_admin” in Chapter 11
  - “Move a Node Between the Cluster and the Pool with cxfs_admin” in Chapter 11
2. Free the lowest cell ID that applies to a client-only node by using the cxfs_admin disable and detach commands to remove the associated client-only node from the cluster definition (or use the corresponding GUI task to remove the node from the cluster definition).
3. Assign a new cell ID (which will be the now-free low cell ID number) to the server-capable administration node by using the cxfs_admin attach and enable commands, which add the node back into the cluster definition (or use the corresponding GUI task to add the node to the cluster definition). For more information, see:
  - “Move a Node Between the Cluster and the Pool with cxfs_admin” in Chapter 11
  - “Enable a Node with cxfs_admin” in Chapter 11
4. Restore the client-only node to the cluster definition by using the cxfs_admin attach and enable commands (or use the GUI task to add it to the cluster definition). It will automatically be assigned a new, higher, cell ID number.
Use the cxfs_admin status command again to verify that the server-capable administration node has a lower cell ID number than any client-only node.
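
To decide which of the two cases applies before touching the cluster, the procedure above can be simulated offline. The sketch below assumes that attach assigns the lowest unused cell ID; that rule is inferred from the examples in this chapter and is not stated as a guarantee. The function names are hypothetical, the sketch handles one misplaced server at a time, and it only prints the cxfs_admin commands rather than running them.

```python
from typing import Dict, List, Set, Tuple


def lowest_free(used: Set[int]) -> int:
    """Assumed assignment rule: the smallest cell ID not currently in use."""
    cell_id = 0
    while cell_id in used:
        cell_id += 1
    return cell_id


def plan_adjustment(cell_ids: Dict[str, int], servers: Set[str],
                    server: str) -> Tuple[List[str], Dict[str, int]]:
    """Return the command sequence and the predicted cell IDs afterwards."""
    ids = dict(cell_ids)
    commands: List[str] = []

    def remove(node: str) -> None:
        commands.extend([f"disable {node}", f"detach {node}"])
        del ids[node]

    def restore(node: str) -> None:
        ids[node] = lowest_free(set(ids.values()))
        commands.extend([f"attach {node}", f"enable {node}"])

    remove(server)
    clients = {n: c for n, c in ids.items() if n not in servers}
    lowest_client = min(clients, key=clients.get) if clients else None
    if lowest_client is not None and lowest_free(set(ids.values())) > clients[lowest_client]:
        # No low cell ID is free: release the lowest client-only cell ID first.
        remove(lowest_client)
        restore(server)
        restore(lowest_client)
    else:
        restore(server)
    return commands, ids


if __name__ == "__main__":
    before = {"mds1": 0, "mds2": 3, "clientA": 1, "clientB": 2, "clientC": 4}
    commands, after = plan_adjustment(before, {"mds1", "mds2"}, "mds2")
    print("\n".join(commands))
    print(after)
```

Whatever the simulation predicts, the status command remains the authority on the cell IDs actually assigned.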

The following sections provide detailed examples using cxfs_admin:

Note: To use the GUI, see the following sections:

Adjust Cell ID: Low Number is Available

The following example (output truncated) shows the following:

mds1 and mds2 are server-capable administration nodes (indicated by the * character)
mds2 has a higher cell ID number (3) than a client-only node (clientA with number 2)
clientA has the lowest cell ID (2) of any client-only node
There is an available low number (1)

cxfs_admin:clusterOne > status
Event at [ Oct 22 13:08:14 ]
Cluster         : clusterOne
Tiebreaker      : clientA
Client Licenses : allocated 3 of 5
------------------  --------  --------  --------------
Node                Cell ID   Age       Status
------------------  --------  --------  --------------
mds1 *              0         4         Stable
mds2 *              3         1         Stable
clientA             2         2         Stable
clientB             4         1         Stable
clientC             5         1         Stable
...

To adjust the inappropriate cell ID for mds2, do the following:

Remove the inappropriate cell ID from the mds2 node by using the following commands to remove mds2 from the cluster definition:

cxfs_admin:clusterOne > disable mds2 
Event at [ Jan 11 15:29:38 ]
Node "mds2" has been disabled, waiting for it to leave the cluster...
Waiting for node mds2, current status: Disabled and unmounting
Operation completed successfully
cxfs_admin:clusterOne> detach mds2

Assign a new cell ID (which will be the free number 1) to mds2 by using the following commands to add the node back into the cluster definition:

cxfs_admin:clusterOne > attach mds2 
Event at [ Jan 11 15:27:58 ]
Node "mds2" has been enabled, waiting for it to join the cluster...
Waiting for node mds2, current status: Establishing membership
Waiting for node mds2, current status: Probing XVM volumes
Operation completed successfully
cxfs_admin:clusterOne> enable mds2

Use the status command to verify that mds2 has a lower cell ID number than any client-only node:

cxfs_admin:clusterOne > status
Event at [ Oct 22 13:08:15 ]
Cluster         : clusterOne
Tiebreaker      : clientA
Client Licenses : allocated 3 of 5
------------------  --------  --------  -------------
Node                Cell ID   Age       Status
------------------  --------  --------  -------------
mds1 *              0         22        Stable
mds2 *              1         1         Stable
clientA             2         20        Stable
clientB             4         19        Stable
clientC             5         19        Stable
...

Adjust Cell ID: No Low Number is Available

The following example (output truncated) shows the following:

mds1 and mds2 are server-capable administration nodes (indicated by the * character)
mds2 has a higher cell ID number (3) than two client-only nodes (clientA with number 1 and clientB with number 2)
clientA has the lowest cell ID (1) of any client-only node
There is no available lower number

cxfs_admin:clusterOne > status
Event at [ Oct 22 13:08:16 ]
Cluster         : clusterOne
Tiebreaker      : clientA
Client Licenses : allocated 3 of 5
------------------  --------  --------  -------------
Node                Cell ID   Age       Status
------------------  --------  --------  -------------
mds1 *              0         4         Stable
mds2 *              3         3         Stable
clientA             1         1         Stable
clientB             2         1         Stable
clientC             4         1         Stable
...
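
To pick out the client-only node that holds the lowest cell ID, you can filter saved status output. The following is a minimal sketch, assuming the table layout shown above (server-capable administration nodes marked with *) and that the output has been captured to a file or pipe. The function name lowest_client_cell is hypothetical and is not part of CXFS.

```shell
# Hypothetical helper (not part of CXFS): reads cxfs_admin status output on
# stdin and prints the client-only node that holds the lowest cell ID.
lowest_client_cell() {
    awk '
        # Rows for server-capable administration nodes carry a "*" marker.
        $2 == "*" { next }
        # Client-only rows: node name, numeric cell ID, numeric age, status.
        NF >= 4 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            if (!found || $2 + 0 < best) { best = $2 + 0; name = $1 }
            found = 1
        }
        END { if (found) print name, best }
    '
}
```

Fed the status table above, the function prints clientA 1, which identifies clientA as the node to remove temporarily.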

Because there is no available number, you must readjust the cell IDs. Do the following:

Remove the inappropriate cell ID from server-capable administration node mds2 by removing it from the cluster definition:

cxfs_admin:clusterOne > disable mds2 
Event at [ Oct 22 13:08:17 ]
Node "mds2" has been disabled, waiting for it to leave the cluster...
Waiting for node mds2, current status: Disabled and unmounting
Operation completed successfully
cxfs_admin:clusterOne> detach mds2

Free the lowest cell ID (1) by removing the associated client-only node (clientA) from the cluster definition:

cxfs_admin:clusterOne > disable clientA 
Event at [ Oct 22 13:08:18 ]
Node "clientA" has been disabled, waiting for it to leave the cluster...
Waiting for node clientA, current status: Disabled and unmounting
Operation completed successfully
cxfs_admin:clusterOne> detach clientA

Restore mds2 to the cluster definition (which automatically assigns it the now-free cell ID of 1):

cxfs_admin:clusterOne > attach mds2 
Event at [ Oct 22 13:08:19 ]
Node "mds2" has been enabled, waiting for it to join the cluster...
Waiting for node mds2, current status: Establishing membership
Waiting for node mds2, current status: Probing XVM volumes
Operation completed successfully
cxfs_admin:clusterOne> enable mds2

Restore clientA to the cluster definition:

cxfs_admin:clusterOne > attach clientA 
Event at [ Oct 22 13:08:20 ]
Node "clientA" has been enabled, waiting for it to join the cluster...
Waiting for node clientA, current status: Establishing membership
Waiting for node clientA, current status: Probing XVM volumes
Operation completed successfully
cxfs_admin:clusterOne> enable clientA

Use the status command to verify that the new server-capable administration node has a lower cell ID number than any client-only node. For example:

cxfs_admin:clusterOne > status
Event at [ Oct 22 13:08:21 ]
Cluster         : clusterOne
Tiebreaker      : clientA
Client Licenses : allocated 3 of 5
------------------  --------  --------  -------------
Node                Cell ID   Age       Status
------------------  --------  --------  -------------
mds1 *              0         37        Stable
mds2 *              1         11        Stable
clientA             3         1         Stable
clientB             2         34        Stable
clientC             4         34        Stable
...
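
The verification in the last step can also be scripted against saved status output. The following is a minimal sketch, assuming the table layout shown above. The function name check_cell_ids is hypothetical and is not part of CXFS.

```shell
# Hypothetical helper (not part of CXFS): reads cxfs_admin status output on
# stdin and fails if any server-capable administration node (marked "*")
# has a higher cell ID than any client-only node.
check_cell_ids() {
    awk '
        # Server-capable rows: name, "*", cell ID, age, status.
        NF >= 5 && $2 == "*" && $3 ~ /^[0-9]+$/ {
            if (!have_s || $3 + 0 > max_server) max_server = $3 + 0
            have_s = 1
            next
        }
        # Client-only rows: name, cell ID, age, status.
        NF >= 4 && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
            if (!have_c || $2 + 0 < min_client) min_client = $2 + 0
            have_c = 1
        }
        END {
            if (have_s && have_c && max_server > min_client) {
                print "BAD: server cell ID " max_server " exceeds client cell ID " min_client
                exit 1
            }
            print "OK"
        }
    '
}
```

For the table above, the highest server cell ID is 1 and the lowest client cell ID is 2, so the function prints OK and returns 0.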

Removing a Single Client-Only Node from the Cluster

To remove a single client-only node from the cluster, do the following:

Verify that the configuration is consistent among active metadata servers in the cluster by running the following on each active metadata server and comparing the output:
MDS# /usr/cluster/bin/clconf_info
If the client is not consistent with the metadata servers, or if the metadata servers are not consistent, then you should abort this procedure and address the health of the cluster. If a client is removed while the cluster is unstable, attempts to get the client to rejoin the cluster are likely to fail. For this reason, you should make sure that the cluster is stable before removing a client.
Flush the system buffers on the client you want to remove in order to minimize the amount of buffered information that may be lost:
client# sync

Stop CXFS services on the client. For example:

client# service cxfs_client stop
client# chkconfig cxfs_client off

Verify that CXFS services have stopped:
- Verify that the CXFS client daemon is not running on the client (success means no output):
  client# ps -ef | grep cxfs_client
  client#
- Monitor the cxfs_client log on the client you wish to remove and look for filesystems that are unmounting successfully. For example:
  Apr 18 13:00:06 cxfs_client: cis_setup_fses Unmounted green0: green0 from /cxfs/green0
- Monitor the SYSLOG on the active metadata server and look for membership delivery messages that do not contain the removed client. For example, the following message indicates that cell 2 (client), the node being shut down, is not included in the membership:
  Apr 18 13:01:03 5A:o200a unix: NOTICE: Cell 2 (client) left the membership
  Apr 18 13:01:03 5A:o200a unix: NOTICE: Membership delivered for cells 0x3
  Apr 18 13:01:03 5A:o200a unix: Cell(age): 0(7) 1(5)
- Use the following command to show that filesystems are not mounted:
  client# df -hl
Verify that the configuration is consistent and does not contain the removed client by running the following on each active metadata server and comparing the output:
MDS# /usr/cluster/bin/clconf_info
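
Comparing the clconf_info output by eye is error-prone when there are several metadata servers. The following is a minimal sketch of one way to compare saved copies of the output, assuming you have redirected the output from each active metadata server to a file and gathered the files on one host. The function name configs_match is hypothetical. Dropping lines that begin with "Event at" is also an assumption, based on the timestamp line that cxfs_admin prints in this chapter; adjust the filter to whatever varies between runs at your site.

```shell
# Hypothetical helper (not part of CXFS): succeeds only if every saved
# clconf_info output matches the first one, ignoring "Event at" lines
# (assumed to be per-run timestamps).
configs_match() {
    ref_file=$1
    ref=$(grep -v '^Event at' "$ref_file" || true)
    shift
    for f in "$@"; do
        cur=$(grep -v '^Event at' "$f" || true)
        if [ "$cur" != "$ref" ]; then
            echo "MISMATCH: $f differs from $ref_file"
            return 1
        fi
    done
    echo "consistent"
}
```

For example, with the hypothetical file names clconf.mds1 and clconf.mds2, configs_match clconf.mds1 clconf.mds2 prints consistent when the two outputs agree.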

Restoring a Single Client-Only Node to the Cluster

To restore a single client-only node to the cluster, do the following:

Verify that the configuration is consistent among active metadata servers in the cluster by running the following on each active metadata server and comparing the output:
MDS# /usr/cluster/bin/clconf_info
If the client is not consistent with the metadata servers, or if the metadata servers are not consistent, then you should abort this procedure and address the health of the cluster. If a client is removed while the cluster is unstable, attempts to get the client to rejoin the cluster are likely to fail. For this reason, you should make sure that the cluster is stable before removing a client.
Start CXFS on the client-only node:
client# chkconfig cxfs_client on
client# service cxfs_client start
Note: The path to cxfs_client varies across the operating systems supported by CXFS. For more information, see CXFS 7 Client-Only Guide for SGI InfiniteStorage.

Verify that CXFS has started:

Verify that the CXFS client daemon is running on the client-only node:

client# ps -ef | grep cxfs_client
    root        716          1  0 12:59:14 ?       0:05 /usr/cluster/bin/cxfs_client

Monitor the SYSLOG on the active metadata server and look for a cell discovery message for the client and a membership delivered message containing the client cell. For example (line breaks added for readability):

Apr 18 13:07:21 4A:o200a unix: WARNING: Discovered cell 2 (woody) 
 [priority 1 at 128.162.240.41 via 128.162.240.34]
Apr 18 13:07:31 5A:o200a unix: NOTICE: Cell 2 (client) joined the membership
Apr 18 13:07:31 5A:o200a unix: NOTICE: Membership delivered for cells 0x7
Apr 18 13:07:31 5A:o200a unix: Cell(age): 0(9) 1(7) 2(1)

Monitor the cxfs_client log on the client you restored and look for filesystem mounts that are processing successfully. For example:
Apr 18 13:06:56 cxfs_client: cis_setup_fses Mounted green0: green0 on /cxfs/green0
Use the following command to show that filesystems are mounted:
client# df -hl

Verify that the configuration is consistent and contains the client by running the following on each active metadata server and comparing the output:
MDS# /usr/cluster/bin/clconf_info
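
In the SYSLOG examples in this chapter, the value after "Membership delivered for cells" reads as a hexadecimal bitmask of cell IDs: 0x3 appears with cells 0 and 1, and 0x7 appears with cells 0, 1, and 2. Assuming that interpretation holds, the following sketch expands a mask into the cell IDs it contains. The function name cells_in_mask is hypothetical.

```shell
# Hypothetical helper: prints the cell IDs whose bits are set in a
# membership mask such as 0x7 (bit N set is taken to mean cell N is a member).
cells_in_mask() {
    mask=$(( $1 ))
    cell=0
    out=""
    while [ "$mask" -gt 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$cell"
        fi
        mask=$(( mask >> 1 ))
        cell=$(( cell + 1 ))
    done
    printf '%s\n' "$out"
}
```

For example, cells_in_mask 0x7 prints 0 1 2, which matches the Cell(age) line shown for the restored client.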

Stopping CXFS for the Entire Cluster

To stop CXFS for the entire cluster, do the following:

Stop CXFS services on a client-only node:
client# service cxfs_client stop
Repeat this step on each client-only node.
(If running GRIOv2) Stop GRIOv2 services on a server-capable administration node:
server-admin# service grio2 stop
Repeat this step on each server-capable administration node.
Stop CXFS services on a server-capable administration node:
server-admin# service cxfs stop
Repeat this step on each server-capable administration node.
Stop the cluster daemons on a server-capable administration node:
server-admin# service cxfs_cluster stop
Repeat this step on each server-capable administration node.

Restarting the Entire Cluster

To restart the entire cluster, do the following:

Start the cluster daemons on a server-capable administration node:
server-admin# service cxfs_cluster start
Repeat this step on each server-capable administration node.
Start CXFS services on a server-capable administration node:
server-admin# service cxfs start
Repeat this step on each server-capable administration node.
(If running GRIOv2) Start GRIOv2 services on a server-capable administration node:
server-admin# service grio2 start
Repeat this step on each server-capable administration node.
Start CXFS services on a client-only node:
client# service cxfs_client start
Repeat this step on each client-only node.
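
The stop and restart procedures above run the same services in opposite orders. The following is a minimal dry-run sketch that prints the commands in the documented order for a given set of nodes so that you can review the sequence before running it. It does not execute anything or contact any node. The function names and the example host names are hypothetical.

```shell
# plan_step SERVICE ACTION HOSTS: print one command line per host.
plan_step() {
    for host in $3; do
        printf '%s# service %s %s\n' "$host" "$1" "$2"
    done
}

# print_service_plan stop|start "CLIENTS" "SERVERS" [yes|no]
# The optional fourth argument says whether GRIOv2 is in use (default: no).
print_service_plan() {
    action=$1
    clients=$2
    servers=$3
    grio=${4:-no}
    case $action in
    stop)
        # Client-only nodes first, then GRIOv2, CXFS, and the cluster daemons.
        plan_step cxfs_client stop "$clients"
        if [ "$grio" = yes ]; then plan_step grio2 stop "$servers"; fi
        plan_step cxfs stop "$servers"
        plan_step cxfs_cluster stop "$servers"
        ;;
    start)
        # Reverse order: cluster daemons, CXFS, GRIOv2, then client-only nodes.
        plan_step cxfs_cluster start "$servers"
        plan_step cxfs start "$servers"
        if [ "$grio" = yes ]; then plan_step grio2 start "$servers"; fi
        plan_step cxfs_client start "$clients"
        ;;
    *)
        echo "usage: print_service_plan stop|start CLIENTS SERVERS [yes|no]" >&2
        return 2
        ;;
    esac
}
```

For example, print_service_plan stop "clientA clientB" "mds1 mds2" yes lists the cxfs_client stops first and the cxfs_cluster stops last.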

XVM Volume Mapping to Storage Targets

The cxfs-enumerate-wwns script enumerates the worldwide names (WWNs) on the host that are known to CXFS. You can use the cxfs-enumerate-wwns script on a server-capable administration node to map XVM volumes to storage targets:

server-admin# /var/cluster/clconfd-scripts/cxfs-enumerate-wwns | grep -v "#"| sort -u

Generation of Streaming Workload for Video Streams

To generate a streaming workload for SD/HD/2K/4K formats of video streams, you can use the frametest(1) command. Each frame is stored in a separate file. You can also use this tool to simulate the reading and writing of video streams by streaming applications. The tool also generates performance statistics for the read and write operations, so it can be very useful for performance analysis of streaming applications.

For example, to do a multithreaded (4 threads) write test of 20,000 HD frames, as fast as possible, on Linux:

# frametest -t4 -w hd -n20000 -x frametest_w_t4_hd_20000_flatout.csv dir

To do a read test at 24 frames per second using a buffer of 24 frames (the dir directory should contain 20,000 HD frames created by a previous write test):

# frametest -t4 -n20000 -f24 -q24 -g frametest_r_t4_hd_20000_24fps_24buf.csv dir

For details about frametest and its command-line options, see the frametest(1) man page.
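
When you plan a fixed-rate test, it helps to estimate how long the run will last and what bandwidth it requires. The following is a minimal arithmetic sketch. The frame count and frame rate come from the example above; the frame size is a parameter that you must supply, and the function names are hypothetical.

```shell
# Whole seconds needed to play FRAMES frames at FPS frames per second.
# Usage: playback_seconds FRAMES FPS
playback_seconds() {
    echo $(( $1 / $2 ))
}

# Whole MiB per second needed to sustain FPS frames of FRAME_BYTES bytes each.
# Usage: required_mib_per_s FRAME_BYTES FPS
required_mib_per_s() {
    echo $(( $1 * $2 / 1048576 ))
}
```

For example, playback_seconds 20000 24 prints 833 (about 14 minutes). With a purely illustrative frame size of 8388608 bytes, required_mib_per_s 8388608 24 prints 192. At 24 frames per second, a buffer of 24 frames holds about one second of playback.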

Frame Files Defragmentation and Analysis

The framesort utility provides easy file-layout analysis and advanced file-sequence reorganization:

File-layout analysis shows the following:

How well the specified files are allocated
How many same-sized files are interleaved
The number of runs where files are allocated in consecutive order or in reverse consecutive order

File-sequence reorganization places files with consecutive filenames consecutively in storage. It can also align files to their stripe-unit boundary. After rearrangement, files can gain higher retrieval bandwidth, which is essential for frame playback.

For example, the following Linux command line will do analysis and rearrangement recursively starting from directory movie1. It also displays the progress status and verbose information. If the percentage of poorly organized files is equal to or greater than 15%, the rearrangement is triggered:

# framesort -rdgvva 15 movie1

For details about command-line arguments, see the framesort(1) man page.
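
The threshold in the example is an inclusive percentage comparison. The following sketch illustrates only that comparison. It does not reproduce how framesort decides which files are poorly organized, which is not described here, and the function name should_rearrange is hypothetical.

```shell
# Succeeds when POOR files out of TOTAL reach THRESHOLD percent or more.
# Usage: should_rearrange POOR TOTAL THRESHOLD
should_rearrange() {
    # Compare poor/total >= threshold/100 without using floating point.
    [ $(( $1 * 100 )) -ge $(( $3 * $2 )) ]
}
```

For example, should_rearrange 15 100 15 succeeds because 15% meets the threshold, whereas should_rearrange 14 100 15 fails.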

Disk Layout Optimization for Approved Media Customers

This section discusses the following:

Ideal Frame Layout

An ideal frame layout is one in which frames for each stream are written sequentially on disk to maximize bandwidth and minimize latency:

Minimize seek times while reading and writing
Maximize RAID prefetch into cache for reads
Maximize RAID coalescing writes into larger writes to each disk

Figure 12-6 shows an ideal frame layout.

Figure 12-6. Ideal Frame Layout

With multithreaded applications (such as frametest), there will be multiple requests in flight simultaneously. As each frame is requested, data from upcoming frames will be prefetched into cache. Figure 12-7 shows an example of a 4-thread frametest read (2-MB stripe unit / 1-GB cache size / prefetch = x1 / 16 slices).

Figure 12-7. Ideal Frame Layout with RAID Prefetch

Multiple Streams of Real-Time Applications

When there are multiple streams of real-time applications, frames from each stream are interleaved into the same region. Frames are not written sequentially but will jump forwards and backwards in the filesystem. The RAID is unable to support many real-time streams and is unable to maintain frame rates due to additional back-end I/O. Filesystems allocate files based on algorithms to utilize free space, not to maximize RAID performance when reading streams back.

Figure 12-8 shows an example of multiple streams. Figure 12-9 shows an example of poor cache utilization.

Figure 12-8. Multiple Streams of Real-Time Applications

Figure 12-9. Poor Cache Utilization

The filestreams Mount Option

Approved media customers can use the XFS filestreams mount option with CXFS to maximize the ability of storage to support multiple real-time streams of video data. It is appropriate for workloads that generate many files that are created and accessed in a sequential order in one directory.

Caution: SGI must validate that your RAID model and RAID configuration can support the use of the filestreams mount option to achieve real-time data transfer and that your application is appropriate for its use. Use of this feature is complex and is reserved for designs that have been approved by SGI.

The filestreams mount option changes the behavior of the XFS allocator in order to optimize disk layout. It selects an XFS disk block allocation strategy that does the following:

Identifies streams writing into the same directory and locks down a region of the filesystem for that stream, which prevents multiple streams from using the same allocation groups
Allocates the file data sequentially on disk in the order that the files are created, space permitting
Uses different regions of the filesystem for files in different directories

Using the filestreams mount option can improve both bandwidth and latency when accessing the files because the RAID will be able to access the data in each directory sequentially. Therefore, multiple writers may be able to write into the same filesystem without interleaving file data on disk. A filesystem can be filled up to approximately 94% before performance degrades. Deletion of projects does not fragment a filesystem; therefore, there is no need to rebuild a filesystem after each project.
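
To watch for the fill level at which performance may begin to degrade, you can check the capacity column of POSIX-format df output. The following is a minimal sketch, assuming df -P output on standard input and a mount point without spaces in its name. The function name fill_warning is hypothetical, and the default limit of 94 reflects the approximate figure given above rather than an exact cutoff.

```shell
# Hypothetical helper: reads "df -P" output on stdin and reports how full
# MOUNTPOINT is. Returns 1 at or above LIMIT percent (default 94),
# 2 if the mount point is not listed, and 0 otherwise.
# Usage: df -P | fill_warning MOUNTPOINT [LIMIT]
fill_warning() {
    awk -v mp="$1" -v limit="${2:-94}" '
        # In df -P output the last field is the mount point and the
        # field before it is the capacity, such as "95%".
        NR > 1 && $NF == mp {
            pct = $(NF - 1)
            sub(/%$/, "", pct)
            found = 1
            if (pct + 0 >= limit + 0) {
                print mp " is " pct "% full (limit " limit "%)"
                status = 1
            } else {
                print mp " is " pct "% full"
            }
        }
        END {
            if (!found) exit 2
            exit status + 0
        }
    '
}
```

For example, df -P /cxfs/green0 | fill_warning /cxfs/green0 reports the current fill level of the filesystem mounted at /cxfs/green0.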

You can safely enable the filestreams mount option on an existing filesystem and later disable it without affecting compatibility. (The mount option affects where data is located in the filesystem; it does not change the format of the filesystem.) However, you may not get the full benefit of filestreams due to preexisting filesystem fragmentation.

Figure 12-10 shows an example of excellent cache utilization that allows for more streams.

Figure 12-10. Excellent Cache Utilization

For more information, contact SGI Support.

Creating a Case-Insensitive CXFS Filesystem

CXFS has limited support for case-insensitive filesystems:

In ASCII filenames, lowercase and uppercase are treated as equal. This means the filesystem treats names that differ only in case as equivalent.

Note: It is not possible to rename a file to a name that only differs in case. For example, the following will not work:

# mv /cxfs/tp91/tmp/TST /cxfs/tp91/tmp/tst
mv: `/cxfs/tp91/tmp/TST' and `/cxfs/tp91/tmp/tst' are the same file

The filesystem is case-preserving: it remembers the exact name that was used to create the file. For example, if a file was created with the name "File", it can be referenced using the name "FILE" or "file", but the reported name will always be "File".
Case-insensitive CXFS filesystems are not supported on SLES 10 and RHEL client-only nodes. These nodes will fail to mount the filesystem with messages such as the following:
Preparing to mount CXFS file system "/dev/cxvm/tp91"
XFS: bad version
XFS: SB validate failed
Note: Nodes that use enhanced XFS support case-insensitive filesystems.

Note: Be aware that some applications rely on the case of filenames and will be confused when used with a case-insensitive filesystem.
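
Before you copy data into a case-insensitive filesystem, it can be useful to check whether two names would refer to the same file. The following is a minimal sketch of the ASCII-only comparison described above. The function names are hypothetical, and the sketch folds only the letters A through Z, in keeping with the ASCII rule.

```shell
# Fold ASCII uppercase letters to lowercase; other bytes are left unchanged.
ascii_fold() {
    printf '%s' "$1" | LC_ALL=C tr 'A-Z' 'a-z'
}

# Succeeds when the two names differ at most in ASCII case.
# Usage: same_ci_name NAME1 NAME2
same_ci_name() {
    [ "$(ascii_fold "$1")" = "$(ascii_fold "$2")" ]
}
```

For example, same_ci_name File FILE succeeds, which means that both names would resolve to the same file on a case-insensitive filesystem.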

Whether a CXFS filesystem is case-insensitive is determined when the filesystem is created. To create a case-insensitive filesystem, provide the following option to mkfs.xfs:

-n version=ci

For example:

# mkfs.xfs -n version=ci /dev/cxvm/tp91
meta-data=/dev/cxvm/tp91         isize=256    agcount=16, agsize=2746480 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=43943680, imaxpct=25
         =                       sunit=16     swidth=32 blks
naming   =version 2              bsize=4096   ascii-ci=1
log      =internal log           bsize=4096   blocks=21472, version=2
         =                       sectsz=512   sunit=16 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
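
In the output above, the naming line reports ascii-ci=1 for a case-insensitive filesystem. The following is a minimal sketch that checks for that field in geometry output of this form, such as the output of mkfs.xfs shown above or of xfs_info for a mounted filesystem. The function name is_case_insensitive is hypothetical.

```shell
# Hypothetical helper: reads XFS geometry output on stdin and succeeds
# only if the "naming" line reports ascii-ci=1.
is_case_insensitive() {
    awk '
        /^naming/ {
            for (i = 1; i <= NF; i++)
                # Allow a trailing comma in case more fields follow.
                if ($i ~ /^ascii-ci=1,?$/) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, with the filesystem mounted at the hypothetical mount point /cxfs/tp91, xfs_info /cxfs/tp91 | is_case_insensitive returns 0 if the filesystem was created with -n version=ci.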
