Chapter 10. Cluster Configuration

This chapter provides an overview of the procedures to add the client-only nodes to an established cluster. It assumes that you already have a cluster of server-capable administration nodes installed and running with mounted filesystems. These procedures will be performed by you or by SGI service personnel.

All CXFS administrative tasks other than restarting the Windows node must be performed using the CXFS GUI (invoked by the cxfsmgr command and connected to a server-capable administration node) or the cxfs_admin command on any host that has access permission to the cluster. The GUI and cxfs_admin provide guided configuration and setup help for defining a cluster.
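
For example, you might run the following on a server-capable administration node to start the GUI and then to open a cxfs_admin session (the cluster name mycluster is a placeholder for your own cluster name):

admin# cxfsmgr
admin# /usr/cluster/bin/cxfs_admin -i mycluster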

This chapter discusses the following tasks in cluster configuration:

  • “Defining the Client-Only Nodes”

  • “Adding the Client-Only Nodes to the Cluster (GUI)”

  • “Defining the Switch for I/O Fencing”

  • “Starting CXFS Services on the Client-Only Nodes (GUI)”

  • “Verifying LUN Masking”

  • “Mounting Filesystems on the Client-Only Nodes”

  • “Unmounting Filesystems”

  • “Forced Unmount of CXFS Filesystems”

  • “Restarting the Windows Node”

  • “Verifying the Cluster Configuration”

  • “Verifying Connectivity in a Multicast Environment”

  • “Verifying the Cluster Status”

  • “Verifying the I/O Fencing Configuration”

  • “Verifying Access to XVM Volumes”

For detailed configuration instructions, see the CXFS Administration Guide for SGI InfiniteStorage.

Defining the Client-Only Nodes

To add a client-only node to a CXFS cluster, you must define it as a node in the pool.

Do the following to determine the value for the hostname field in the GUI:

  • AIX: use the value displayed by /usr/bin/hostname

  • Linux: use the value displayed by /bin/hostname

  • Mac OS X: use the value displayed by /bin/hostname

  • SGI ProPack: use the value displayed by /bin/hostname

  • Solaris: use the value displayed by /bin/hostname

  • Windows: select the following:

    Start -> Settings -> Network and Dial-up Connections -> Advanced -> Network Identification
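
For example, on a Linux or SGI ProPack node, the value for the hostname field is the output of the following command (the node name shown is hypothetical):

linux# /bin/hostname
cxfsopus5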

When you specify that a node is running an operating system other than Linux, the node will automatically be defined as a client-only node and you cannot change it. (These nodes cannot be potential metadata servers and are not counted when calculating the CXFS kernel membership quorum.) For client-only nodes, you must specify a unique node ID if you use the GUI; cxfs_admin provides a default node ID.

For example, the following shows the entries used to define a Solaris node named solaris1 in the mycluster cluster:

# /usr/cluster/bin/cxfs_admin -i mycluster
cxfs_admin:mycluster> create node name=solaris1 os=solaris private_net=192.168.0.178
Node "solaris1" has been created, waiting for it to join the cluster...
Waiting for node solaris1, current status: Inactive
Waiting for node solaris1, current status: Establishing membership
Waiting for node solaris1, current status: Probing XVM volumes
Operation completed successfully

Or, in prompting mode:

# /usr/cluster/bin/cxfs_admin -i mycluster
cxfs_admin:mycluster> create node
  Specify the attributes for create node:
   name? solaris1
   os? solaris
   private_net? 192.168.0.178
Node "solaris1" has been created, waiting for it to join the cluster...
Waiting for node solaris1, current status: Inactive
Waiting for node solaris1, current status: Establishing membership
Waiting for node solaris1, current status: Probing XVM volumes
Operation completed successfully   

The following shows a cxfs_admin example in basic mode:

cxfs_admin:mycluster> create node
Specify the attributes for create node:
 name? cxfsopus5
 os? Linux
 private_net? 10.11.20.5
 type? client_only
Node "cxfsopus5" has been created, waiting for it to join the cluster...

For details about these commands, see the CXFS Administration Guide for SGI InfiniteStorage.

Adding the Client-Only Nodes to the Cluster (GUI)

If you are using the GUI, you must add the defined nodes to the cluster. This happens by default if you are using cxfs_admin.

After you define all of the client-only nodes, you must add them to the cluster.

Depending upon your filesystem configuration, you may also need to add the node to the list of clients that have access to the volume. See “Mounting Filesystems on the Client-Only Nodes”.

Defining the Switch for I/O Fencing

You must use I/O fencing on client-only nodes to protect data integrity. I/O fencing requires a switch; see the release notes for supported switches.

For example, for a QLogic switch named myswitch:

cxfs_admin:mycluster> create switch name=myswitch vendor=qlogic

After you have defined the switch, you must ensure that all of the switch ports that are connected to the cluster nodes are enabled. To determine port status, enter the following on a server-capable administration node:

admin# hafence -v

If there are disabled ports that are connected to cluster nodes, you must enable them. Log into the switch as user admin and use the following command:

switch# portEnable portnumber

You must then update the switch port information so that the cluster database associates each port with the correct host.

For example, suppose that you have a cluster with port 0 connected to the node blue, port 1 connected to the node green, and port 5 connected to the node yellow, all of which are defined in cluster colors. The following output shows that the status of port 0 and port 1 is disabled and that the host is UNKNOWN (as opposed to port 5, which has a status of enabled and a host of yellow). Ports 2, 3, 4, 6, and 7 are not connected to nodes in the cluster and therefore their status does not matter.

admin# hafence -v
  Switch[0] "ptg-brocade" has 8 ports
    Port 0 type=FABRIC status=disabled hba=0000000000000000 on host UNKNOWN
    Port 1 type=FABRIC status=disabled hba=0000000000000000 on host UNKNOWN
    Port 2 type=FABRIC status=enabled  hba=210000e08b05fecf on host UNKNOWN
    Port 3 type=FABRIC status=enabled  hba=210000e08b01fec5 on host UNKNOWN
    Port 4 type=FABRIC status=enabled  hba=210000e08b01fec3 on host UNKNOWN
    Port 5 type=FABRIC status=enabled  hba=210000e08b019ef0 on host yellow
    Port 6 type=FABRIC status=enabled  hba=210000e08b0113ce on host UNKNOWN
    Port 7 type=FABRIC status=enabled  hba=210000e08b027795 on host UNKNOWN

In this case, you would need to enable ports 0 and 1:

Logged in to the switch:
switch# portEnable 0
switch# portEnable 1

Logged in to a server-capable administration node: 
admin# hafence -v
  Switch[0] "ptg-brocade" has 8 ports
    Port 0 type=FABRIC status=enabled  hba=210000e08b0103b8 on host UNKNOWN
    Port 1 type=FABRIC status=enabled  hba=210000e08b0102c6 on host UNKNOWN
    Port 2 type=FABRIC status=enabled  hba=210000e08b05fecf on host UNKNOWN
    Port 3 type=FABRIC status=enabled  hba=210000e08b01fec5 on host UNKNOWN
    Port 4 type=FABRIC status=enabled  hba=210000e08b01fec3 on host UNKNOWN
    Port 5 type=FABRIC status=enabled  hba=210000e08b019ef0 on host yellow
    Port 6 type=FABRIC status=enabled  hba=210000e08b0113ce on host UNKNOWN
    Port 7 type=FABRIC status=enabled  hba=210000e08b027795 on host UNKNOWN

After you update the switch port information, the same command reports the correct hosts for ports 0 and 1:

admin# hafence -v
  Switch[0] "ptg-brocade" has 8 ports
    Port 0 type=FABRIC status=enabled  hba=210000e08b0103b8 on host blue
    Port 1 type=FABRIC status=enabled  hba=210000e08b0102c6 on host green
    Port 2 type=FABRIC status=enabled  hba=210000e08b05fecf on host UNKNOWN
    Port 3 type=FABRIC status=enabled  hba=210000e08b01fec5 on host UNKNOWN
    Port 4 type=FABRIC status=enabled  hba=210000e08b01fec3 on host UNKNOWN
    Port 5 type=FABRIC status=enabled  hba=210000e08b019ef0 on host yellow
    Port 6 type=FABRIC status=enabled  hba=210000e08b0113ce on host UNKNOWN
    Port 7 type=FABRIC status=enabled  hba=210000e08b027795 on host UNKNOWN

Starting CXFS Services on the Client-Only Nodes (GUI)

After adding the client-only nodes to the cluster with the GUI, you must start CXFS services for them, which enables each node by setting a flag for it in the cluster database. This happens by default with cxfs_admin.
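
With cxfs_admin, you can also enable a node explicitly if it was previously disabled. The following is a sketch using the node name solaris1 from the earlier example; verify the exact command syntax against the cxfs_admin help for your release:

cxfs_admin:mycluster> enable solaris1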

Verifying LUN Masking

After you connect the HBA to the switch and before you configure the filesystems with XVM, verify that logical unit (LUN) masking on the HBA is configured so that the LUNs are visible to all of the nodes in the cluster. For more information, see the RAID documentation.

Mounting Filesystems on the Client-Only Nodes

If you have specified that the filesystems are to be automatically mounted on any newly added nodes (such as setting mount_new_nodes=true for a filesystem in cxfs_admin), you do not need to specifically mount the filesystems on the new client-only nodes that you added to the cluster.

If you have specified that filesystems will not be automatically mounted (for example, by setting the advanced-mode mount_new_nodes=false for a filesystem in cxfs_admin), you can do the following to mount the filesystem on the new nodes:

  • With cxfs_admin, use the following command to mount the specified filesystem:

    mount filesystemname nodes=nodename

    For example:

    cxfs_admin:mycluster> mount fs1 nodes=solaris2

    You can leave mount_new_nodes=false. You do not have to unmount the entire filesystem.

  • With the GUI, you can mount the filesystems on the new client-only nodes by unmounting the currently active filesystems, enabling the mount on the required nodes, and then performing the actual mount.
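
If you later want new nodes to mount a filesystem automatically, you should be able to reset the attribute with the cxfs_admin modify command. The following is a sketch using the fs1 filesystem from the example above; verify the attribute syntax for your release:

cxfs_admin:mycluster> modify fs1 mount_new_nodes=true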


Note: SGI recommends that you enable the forced unmount feature for CXFS filesystems, which is turned off by default; see “Enable Forced Unmount” in Chapter 2 and “Forced Unmount of CXFS Filesystems”.


Unmounting Filesystems

You can unmount a filesystem from all nodes in the cluster or from just the node you specify.

For example, to unmount the filesystem fs1 from all nodes:

cxfs_admin:mycluster> unmount fs1

To unmount the filesystem only from the node mynode:

cxfs_admin:mycluster> unmount fs1 nodes=mynode

Forced Unmount of CXFS Filesystems

Normally, an unmount operation will fail if any process has an open file on the filesystem. However, a forced unmount allows the unmount to proceed regardless of whether the filesystem is still in use.

For example:

cxfs_admin:mycluster> create filesystem name=myfs forced_unmount=true
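
To change the setting on a filesystem that already exists, a modify command of the same form should work (a sketch; confirm the syntax for your release):

cxfs_admin:mycluster> modify myfs forced_unmount=true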

Using the CXFS GUI, define or modify the filesystem to unmount with force and then unmount the filesystem.

For details, see the “CXFS Filesystems Tasks with the GUI” sections of the GUI chapter in the CXFS Administration Guide for SGI InfiniteStorage.

Restarting the Windows Node

After completing the steps in “Postinstallation Steps for Windows” in Chapter 9 and this chapter, you should restart the Windows node. This will automatically start the driver and the CXFS Client service.

When you log into the node after restarting it, Windows Explorer will list the CXFS drive letter, which will contain the CXFS filesystems configured for this node.

Verifying the Cluster Configuration

To verify that the client-only nodes have been properly added to the cluster, run the cxfs-config command on the metadata server. For example:

admin# /usr/cluster/bin/cxfs-config -all -check

This command will dump the current cluster nodes, private network configuration, filesystems, XVM volumes, failover hierarchy, and switches. It will check the configuration and report any common errors. You should rectify these errors before starting CXFS services.

Verifying Connectivity in a Multicast Environment

To verify general connectivity in a multicast environment, you can execute a UNIX ping command on the 224.0.0.1 IP address.
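
For example, on a Linux node you could send two packets to the all-hosts multicast address as follows (ping option syntax varies by platform, so treat this as a sketch):

linux# ping -c 2 224.0.0.1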

To verify the CXFS heartbeat, use the 224.0.0.250 IP address. The 224.0.0.250 address is the default CXFS heartbeat multicast address (because it is the default, this address does not have to appear in the /etc/hosts file).


Note: A node is capable of responding only when the administration daemons (fs2d, cmond, cad, and crsd) or the cxfs_client daemon is running.


For example, to see the response for two packets sent from Solaris IP address 128.162.240.27 to the multicast address for CXFS heartbeat and ignore loopback, enter the following:

solaris# ping -i 128.162.240.27 -s -L 224.0.0.250 2

To override the default address, you can use the -c and -m options or make the name cluster_mcast resolvable on all nodes (such as in the /etc/hosts file). For more information, see the cxfs_client man page.
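
For example, an /etc/hosts entry of the following form on every node would make cluster_mcast resolvable (the multicast address shown here is only an illustration; use the address chosen for your cluster):

224.0.0.251   cluster_mcast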

Verifying the Cluster Status

To verify that the client-only nodes have been properly added to the cluster and that filesystems have been mounted, use the view area of the CXFS GUI, the cxfs_admin status command, or the clconf_info command (on a server-capable administration node) and the cxfs_info command (on a client-only node).

For example, using cxfs_admin:

cxfs_admin:mycluster> status
Cluster    : mycluster
Tiebreaker : irix-client
Licenses   : enterprise   allocated 12 of 278
             workstation  allocated 4 of 15
------------------  --------  -------------------------------------------------
Node                Cell ID   Status
------------------  --------  -------------------------------------------------
mds1 *              6         Stable
mds2 *              0         Stable
aix-client          4         Stable 
irix-client         1         Stable  
mac-client          3         Inactive
solaris-client      2         Stable  
windows-client      5         Stable  

------------------  ------------------- --------------------------------------
Filesystem          Mount Point         Status
------------------  ------------------- --------------------------------------
concatfs            /mnt/concatfs       Mounted (mds1)
mirrorfs            /mnt/mirrorfs       Mounted (mds1)
stripefs            /mnt/stripefs       Mounted (mds1)

------------------  ----------  -----------------------------------------------
Switch              Port Count  Known Fenced Ports
------------------  ----------  -----------------------------------------------
fcswitch12          32          None
fcswitch13          32          None

The following example for a different cluster shows clconf_info output:

admin# /usr/cluster/bin/clconf_info
Event at [2004-05-04 19:00:33]

Membership since Tue May  4 19:00:33 2004
____________ ______ ________ ______ ______
Node         NodeID Status   Age    CellID
____________ ______ ________ ______ ______
cxfs4             1 up           27      2
cxfs5             2 up           26      1
cxfs6             3 up           27      0
cxfswin4          5 up            1      5
cxfssun3          6 up            0      6
cxfsmac3.local.  17 up            0      7
____________ ______ ________ ______ ______
2 CXFS FileSystems
/dev/cxvm/vol0 on /mnt/vol0  enabled  server=(cxfs4)  5 
client(s)=(cxfs6,cxfs5,cxfswin4,cxfssun3,cxfsmac3.local.)  status=UP
/dev/cxvm/vol1 on /mnt/vol1  enabled  server=(cxfs5)  5 
client(s)=(cxfs6,cxfs4,cxfswin4,cxfssun3,cxfsmac3.local.)  status=UP

On client-only nodes, the cxfs_info command serves a similar purpose. The command path is as follows:

  • AIX and Solaris: /usr/cxfs_cluster/bin/cxfs_info

  • IRIX, Linux, and SGI ProPack: /usr/cluster/bin/cxfs_info

  • Mac OS X: /usr/cluster/bin/cxfs_info

  • Windows: %ProgramFiles%\CXFS\cxfs_info.exe

On AIX, Linux, Mac OS X, SGI ProPack, and Solaris nodes, you can use the -e option to wait for events (which keeps the command running until you kill the process) and the -c option to clear the screen between updates.
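
For example, the following hypothetical invocation on a Solaris node waits for events and clears the screen between updates:

solaris# /usr/cxfs_cluster/bin/cxfs_info -e -c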

For example, on a Solaris node:

solaris# /usr/cxfs_cluster/bin/cxfs_info
cxfs_client status [timestamp Jun 03 03:48:07 / generation 82342]

CXFS client:
     state: reconfigure (2), cms: up, xvm: up, fs: up
Cluster:
     performance (123) - enabled
Local:
     cxfssun3 (9) - enabled
Nodes:
     cxfs4  enabled  up    2
     cxfs5  enabled  up    1
     cxfs6  enabled  up    0
     cxfswin4  enabled  up    5
     cxfssun3  enabled  up    6
     cxfsmac3.local.  enabled  up    7
Filesystems:
     vol0       enabled  mounted          vol0                 /mnt/vol0
     vol1       enabled  mounted          vol1                 /mnt/vol1

The CXFS client line shows the state of the client in the cluster, which can be one of the following:

bootstrap

Initial state after starting cxfs_client, while listening for bootstrap packets from the cluster.

connect

Connecting to the CXFS metadata server.

query

The client is downloading the cluster database from the metadata server.

reconfigure

The cluster database has changed, so the client is reconfiguring itself to match the cluster database.

stable

The client has been configured according to what is in the cluster database.

stuck

The client is unable to proceed, usually due to a configuration error. Because the problem may be transient, the client periodically reevaluates the situation. The number in parentheses indicates the number of seconds the client will wait before retrying the operation. With each retry, the number of seconds to wait is increased; therefore, the higher the number, the longer the client has been stuck. See the log file for more information.

terminate

The client is shutting down.

The cms field has the following states:

unknown

Initial state before connecting to the metadata server.

down

The client is not in membership.

fetal

The client is joining membership.

up

The client is in membership.

quiesce

The client is dropping out of membership.

The xvm field has the following states:

unknown

Initial state before connecting to the metadata server.

down

After membership, but before any XVM information has been gathered.

fetal

Gathering XVM information.

up

XVM volumes have been retrieved.

The fs field has the following states:

unknown

Initial state before connecting to the metadata server.

down

One or more filesystems are not in the desired state.

up

All filesystems are in the desired state.

retry

One or more filesystems cannot be mounted or unmounted, and the operation will be retried. See the Filesystems section of the cxfs_info output to identify the affected filesystems.

Verifying the I/O Fencing Configuration

To determine if a node is correctly configured for I/O fencing, log in to a server-capable administration node and use the cxfs-config(1M) command. For example:

admin# /usr/cluster/bin/cxfs-config

The failure hierarchy for a client-only node should be listed as Fence, Shutdown, as in the following example:

Machines:
     node cxfswin2: node 102   cell 1  enabled  Windows client_only
         hostname: cxfswin2.melbourne.sgi.com
         fail policy: Fence, Shutdown
         nic 0: address: 192.168.0.102 priority: 1

See “Defining the Client-Only Nodes” to change the failure hierarchy for the node if required.
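
For example, with cxfs_admin you might reset the failure hierarchy with a command of the following form (a sketch; confirm the failpolicy attribute name and accepted values against the cxfs_admin help for your release):

cxfs_admin:mycluster> modify cxfswin2 failpolicy=fence,shutdown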

The HBA ports should also be listed in the switch configuration:

Switches:
     switch 1: 16 port brocade [email protected] <no ports masked>
         port 5: 210200e08b51fd49 cxfswin2
         port 15: 210100e08b32d914 admin1
     switch 2: 16 port brocade [email protected] <no ports masked>
         port 5: 210300e08b71fd49 cxfswin2
         port 14: 210000e08b12d914 admin1

No warnings or errors should be displayed regarding the failure hierarchy or switch configuration.

If the HBA ports for the client node are not listed, see the following:

Verifying Access to XVM Volumes

To verify that a client node has access to all XVM volumes that are required to mount the configured filesystems, log on to a server-capable administration node and run:

admin# /usr/cluster/bin/cxfs-config -xvm

This will display the list of filesystems and the XVM volume and volume elements used to construct those filesystems. For example:

     fs stripe1: /mnt/stripe1         enabled
         device = /dev/cxvm/stripe1
         force = false
         options = []
         servers = cxfs5 (0), cxfs4 (1)
         clients = cxfs4, cxfs5, cxfs6, cxfsmac4, cxfssun1
         xvm:
             vol/stripe1                       0 online,open
                 subvol/stripe1/data      2292668416 online,open
                     stripe/stripe1           2292668416 online,open
                         slice/d9400_0s0          1146334816 online,open
                         slice/d9400_1s0          1146334816 online,open

             data size: 1.07 TB

You can then run the xvm command to identify the XVM volumes and disk devices. This provides enough information to identify the device's WWN, LUN, and controller. In the following example, slice/d9400_0s0 from phys/d9400_0 is LUN 0, located on a RAID controller with WWN 200500a0b80cedb3.

admin# xvm show -e -t vol
vol/stripe1                       0 online,open
     subvol/stripe1/data      2292668416 online,open
         stripe/stripe1           2292668416 online,open (unit size: 1024)
             slice/d9400_0s0          1146334816 online,open (d9400_0:/dev/rdsk/200500a0b80cedb3/lun0vol/c2p1)
             slice/d9400_1s0          1146334816 online,open (d9400_1:/dev/rdsk/200400a0b80cedb3/lun1vol/c3p1)

On all platforms other than Windows, you can then run the xvm command on the client to identify the matching disk devices on the client. For example:

solaris# /usr/cxfs_cluster/bin/xvm show -e -t vol
vol/stripe1                       0 online,open
     subvol/stripe1/data      2292668416 online,open
         stripe/stripe1           2292668416 online,open (unit size: 1024)
             slice/d9400_0s0          1146334816 online,open (d9400_0:[email protected],600000/JNI,[email protected],1/[email protected],0)
             slice/d9400_1s0          1146334816 online,open (d9400_1:[email protected],600000/JNI,[email protected]/[email protected],1)


Note: The xvm command on Windows does not display WWNs.

If a disk device has not been found for a particular volume element, the following message will be displayed instead of the device name:

no direct attachment on this cell

For example:

solaris# /usr/cxfs_cluster/bin/xvm show -e -t subvol/stripe1 
vol/stripe1                       0 online,open,no physical connection
     subvol/stripe1/data      2292668416 online,open
         stripe/stripe1           2292668416 online,open (unit size: 1024)
             slice/d9400_0s0          1146334816 online,open (d9400_0:no direct attachment on this cell)
             slice/d9400_1s0          1146334816 online,open (d9400_1:no direct attachment on this cell)

Using the device information from the server-capable administration node, you can then use the client HBA tools and the RAID configuration tool to determine whether the client can see the same devices.
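
For example, on a Solaris client you might list the Fibre Channel devices that the HBA sees with a command such as the following (shown only as an illustration; the appropriate tool depends on your HBA vendor):

solaris# luxadm probe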

To see the complete list of volume and device mappings, especially when XVM failover V2 is configured, run:

solaris# /usr/cxfs_cluster/bin/xvm show -v phys

For more information about xvm, see the XVM Volume Manager Administrator's Guide.