Chapter 3. Best Practices

This chapter provides an overview of the best practices for system administration in a FailSafe cluster. It covers planning and installing a FailSafe cluster, knowing the tools, configuration, and administration and operation.

Planning and Installing a FailSafe Cluster

How Do You Want to Use FailSafe?

You must first decide how you want to use the FailSafe cluster, what applications you want to run, and which of these should be made highly available (HA). This includes deciding how software and data will be distributed. You can then configure the disks and interfaces to meet the needs of the HA services that you want the cluster to provide.

Questions you must answer during the planning process are as follows:

  • How do you plan to use the nodes? Your answers might include uses such as offering home directories for users, running particular applications, supporting an Oracle database, providing Netscape Web service, and providing file service.

  • Which of these uses will be provided as an HA service? SGI has developed FailSafe software options for some HA applications. To offer other applications as HA services, you must develop a set of application monitoring shell scripts as described in the FailSafe Programmer's Guide for SGI Infinite Storage. If you need assistance, contact SGI Professional Services, which offers custom FailSafe agent development and integration services.

  • Which node will be the primary node for each HA service? The primary node is the node that provides the service (exports the filesystem, is a Netscape server, provides the database, and so on).

  • For each HA service, how will the software and data be distributed on shared and non-shared disks? Each application has requirements and choices for placing its software on disks that are failed over (shared) or not failed over (non-shared).

  • Are the shared disks going to be part of a RAID storage system or are they going to be disks in SCSI or Fibre Channel disk storage that have plexed logical volumes on them? Shared disks must be part of a RAID storage system or in a SCSI or Fibre Channel disk storage with plexed logical volumes on them.

  • How will shared disks be configured?

    • As raw XLV logical volumes?

    • As XLV logical volumes with XFS filesystems on them?

    • As local XVM logical volumes with XFS filesystems on them?

    • As CXFS filesystems, which use XVM logical volumes? For information on using FailSafe and CXFS, see “Coexecution of CXFS and FailSafe” in Chapter 2.

    The choice of volumes or filesystems depends on the application that is going to use the disk space.

  • Which IP addresses will be used by clients of HA services? Multiple interfaces may be required on each node because a node could be connected to more than one network or because there could be more than one interface to a single network.

  • Which resources will be part of a resource group? All resources that are dependent on each other must be in the same resource group.

  • What will be the failover domain of the resource group? The failover domain determines the list of nodes in the cluster where the resource group can reside. For example, a volume resource that is part of a resource group can reside only in nodes from which the disks composing the volume can be accessed. For more information about failover domains, see “Failover Domain” in Chapter 1.

  • How many HA IP addresses on each network interface will be available to clients of the HA services? At least one HA IP address must be available for each interface on each node that is used by clients of HA services.

  • Which HA IP addresses on nodes in the failover domain are going to be available to clients of the HA services?

  • For each HA IP address that is available on a node in the failover domain to clients of HA services, which interface on the other nodes will be assigned that IP address after a failover? Every HA IP address used by an HA service must be mapped to at least one interface in each node that can take over the resource group service. The HA IP addresses are failed over from the interface in the primary node of the resource group to the interface in the replacement node.

Hardware Requirements

FailSafe runs on a specific set of SGI servers and supported disk storage devices. A cluster can contain up to 8 nodes running FailSafe.

You should provide multiple sources of the following:

  • Power sources.

  • RAID disk devices and mirrored disks. FailSafe supports Fibre Channel RAID and mirrored disks in direct attach and SAN configurations.

  • Paths to storage devices.

  • Networks.

  • Fibre Channel switches or 100-MB hubs. If you have more than two nodes, SGI recommends that you use a switch rather than a hub to connect the nodes.


Note: Neither SCSI storage nor Fibre Channel JBOD is supported in a SAN configuration; therefore, neither can be used in a coexecution cluster with CXFS.

At least two Ethernet or FDDI interfaces on each node are required for the control network heartbeat connection, by which each node monitors the state of other nodes. The FailSafe software also uses this connection to pass control messages between nodes. These interfaces have distinct IP addresses.

FailSafe requires at least two networks of at least 100baseT. All nodes should be on the same local network segment.


Note: When mixing ef and eg interfaces, turn off the HIGHBW flag for the eg interfaces and be sure to run IRIX 6.5.18 or later plus any applicable patches in order to avoid various problems associated with ef interfaces and delayed acknowledgement on eg interfaces.

For tg interfaces that are part of the private network, it is recommended that coal_mode be set to FIXED, coal_frames_rx be set to 1 or 2, and coal_usecs_rx be set in the range 0-30.

For performance and security reasons, SGI recommends that the networks be private. Using a private network limits the traffic on the public network and therefore will help avoid unnecessary resets or disconnects. (If you are running CXFS on the same cluster, then the network must be private, as required by CXFS.) You may want to choose a numbering convention for private networks such as 192.26.50.x for the primary network and 192.26.51.x for the backup network, where x is the CXFS node ID in the cluster.
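The numbering convention can be captured in a small helper so that every node's private addresses are derived the same way. This is a minimal sketch: the function name private_addrs is hypothetical, the two network prefixes are the example values from the paragraph above, and the 1-254 range check is an assumption about usable host numbers.

```shell
# Hypothetical helper: print the private addresses for a node ID,
# using the example prefixes 192.26.50.x (primary) and 192.26.51.x (backup).
private_addrs() {
    id=$1
    # Assumption: node IDs used as the last octet must be 1-254.
    if [ "$id" -lt 1 ] || [ "$id" -gt 254 ]; then
        echo "node ID must be in the range 1-254" >&2
        return 1
    fi
    echo "192.26.50.$id primary"
    echo "192.26.51.$id backup"
}
```

For example, private_addrs 7 prints one line for the primary network address and one for the backup network address of node 7.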

The serial hardware reset lines should use Cat5 wire with appropriate connectors and point-to-point connections between nodes. Be aware of the distance limitations for serial cables. If you use the hub method, you must have software loaded to control it and support it. You should have hardware flow control pins (RTS/CTS) connected in the serial cable.

For each disk in a FailSafe cluster, you must choose whether to make it a shared disk (allowing it to be failed over) or a non-shared disk. The system disk must be a non-shared disk. FailSafe software must be on a non-shared disk and all system directories (such as /tmp, /var, /usr, /bin, and /dev) should be on a non-shared disk.

Software Installation

SGI recommends consulting SGI managed services before installing a FailSafe system. For more information, see:

http://www.sgi.com/services/managed_services/

FailSafe is released approximately every six months, with patch releases as necessary. A given FailSafe release supports two consecutive IRIX releases, as defined in the FailSafe release notes. For a complete compatibility matrix, contact SGI customer support.

You must install the cluster infrastructure and FailSafe software components. You may want to install software for ESP, Performance Co-Pilot, accounting, and expect (for the TMF plug-in). You may wish to use sendmail with an alias to be used when reporting problems to the system administrator.

SGI recommends that you make configuration changes only when the same version of IRIX and the same version of FailSafe are running on all nodes in the cluster.

When a FailSafe system is running, you may need to perform various administration procedures without shutting down the entire cluster, such as the following:

  • Add a node to an active cluster

  • Delete a node from an active cluster

  • Change control networks in a cluster

  • Upgrade operating system software in an active cluster

  • Upgrade FailSafe software in an active cluster

  • Add new resource groups or resources in an active cluster

  • Add a new hardware device in an active cluster

FailSafe Plug-ins

Separate releases are available for the optional FailSafe application plug-ins, such as FailSafe for Samba. You should have your FailSafe cluster up and running before installing and configuring an optional plug-in.

The basic process is as follows:

  1. Install, configure, and test the base FailSafe software as described in Chapter 4, “FailSafe Installation and System Preparation”.

  2. Install any required application software and the plug-in software.

  3. Perform any system file configuration required by the plug-in.

  4. If needed, install the plug-in resource types.

  5. Add the individual instances of the plug-in resources to the cluster database.

  6. Create the resource group that will be failed over.

  7. Test the failover.

Upgrades

When you upgrade your OS software in an active cluster, you perform the upgrade on one node at a time. If the OS software upgrade does not require a reboot and does not impact the FailSafe software, there is no need to use the OS upgrade procedure. If the OS upgrade requires a machine reboot, or if you do not know whether the upgrade will impact FailSafe software, you should follow the upgrade procedure described in Chapter 4, “FailSafe Installation and System Preparation”.

In general, you should do the following:

  1. Make sure you have a current copy of the cluster database, which you can obtain by using the build_cmgr_script command or the cdbBackup and cdbRestore commands.


    Note: The build_cmgr_script command does not recreate node-specific information for resources and resource types or local logging information, because the cluster database does not replicate node-specific information. Therefore, if you reinitialize the cluster database, you will lose node-specific information. Because the generated script contains no local configuration information, it cannot be used as a complete backup/restore tool.


  2. Upgrade just one node in the cluster.

  3. Verify that the FailSafe configuration works on the upgraded node.

  4. Upgrade the remaining nodes.

Customer Education

At least one administrator from the customer site should take the FailSafe customer training provided by SGI. For information about training, see the SGI customer education website:

http://www.sgi.com/support/custeducation/

Knowing the Tools

This section provides an overview of the tools required to troubleshoot FailSafe.


Caution: Some of the commands listed are beyond the scope of this book and are provided here for quick reference only. See the other guides and man pages referenced for complete information before using these commands.


Physical Storage Tools

To display the hardware inventory, use the hinv command. If you get unexpected output, ensure that there is no current I/O and then probe for devices, perform a SCSI bus reset, and configure the I/O devices by using the following commands:

/usr/sbin/scsiha -pr bus_number
/sbin/ioconfig -f /hw

Cluster Configuration Tools

Understand the following cluster configuration tools:

  • To configure FailSafe nodes and the cluster, use the GUI (fsmgr) or the cmgr command line with prompting:

    # cmgr -p

Cluster Control Tools

Understand the following cluster control tools:

  • To start and stop the cluster services daemons:

    # /etc/init.d/cluster start
    # /etc/init.d/cluster stop

    These commands are useful if you know that filesystems are available but are not indicated as such by the cluster status, or if cluster quorum is lost.

  • To start and stop HA services, use the GUI or the following cmgr commands:

    cmgr> start ha_services on node hostname for cluster clustername
    cmgr> stop ha_services on node hostname for cluster clustername

Networking Tools

Understand the following networking tools:

  • To send packets to network hosts, use the ping command.

  • To show network status, use the netstat command.

Cluster/Node Status Tools

Understand the following cluster/node status tools:

  • To provide configuration information and the status of the cluster, nodes, resources, and resource groups, use the haStatus command.

  • To show which cluster daemons are running, use the ps command:

    # ps -ef | grep cluster

  • To see cluster status, use the GUI (fsmgr) or the cluster_status command:

    # /usr/cluster/cmgr-scripts/cluster_status
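When scanning ps output by eye, it is easy to miss a daemon that is absent. The following is a minimal sketch of a filter that reports which expected daemons do not appear in a process listing. The function name missing_daemons is hypothetical, the daemon names you pass are your choice (cad, cmond, crsd, and fs2d are taken from the log file names in this chapter), and it assumes a grep that supports the -w option.

```shell
# Hypothetical helper: read a process listing on standard input and
# print each daemon name given as an argument that does not appear in it.
missing_daemons() {
    listing=$(cat)
    for d in "$@"; do
        # -w matches the name as a whole word, so "cad" does not match "cad_log".
        printf '%s\n' "$listing" | grep -w -- "$d" >/dev/null || echo "$d"
    done
}

# Usage on a node (not run here):
#   ps -ef | missing_daemons cad cmond crsd fs2d
```

Empty output means every named daemon was found in the listing.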

Performance Monitoring Tools

Understand the following performance monitoring tools:

  • To monitor system activity, use the sar command.

  • To monitor filesystem buffer cache activity, use the bufview command.


    Note: Do not use bufview interactively on a busy node; run it in batch mode (-b). For more information, see the bufview(1) man page.


  • To monitor operating system activity data, use the osview command.

  • To monitor system performance, use Performance Co-Pilot. See the Performance Co-Pilot for IRIX Advanced User's and Administrator's Guide, the Performance Co-Pilot Programmer's Guide, and the pmie and pmieconf man pages.

Log Files

Understand the following log files:

  • System log file (look for Membership delivered): /var/adm/SYSLOG

  • /var/cluster/ha/log/cad_log

  • /var/cluster/ha/log/clconfd_hostname

  • /var/cluster/ha/log/cli_hostname

  • /var/cluster/ha/log/cmond_log

  • /var/cluster/ha/log/crsd_hostname

  • /var/cluster/ha/log/diags_hostname

  • /var/cluster/ha/log/fs2d_log

  • System administration log: /var/sysadm/salog
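To find the most recent membership messages in the system log, a simple filter is usually enough. This is a minimal sketch: the function name last_membership is hypothetical, the search string is the one named above, and the default of five lines is an arbitrary choice.

```shell
# Hypothetical helper: show the last few "Membership delivered" lines.
# $1 = log file (defaults to /var/adm/SYSLOG), $2 = number of lines (defaults to 5).
last_membership() {
    grep 'Membership delivered' "${1:-/var/adm/SYSLOG}" | tail -n "${2:-5}"
}
```

Running last_membership with no arguments on a node reads /var/adm/SYSLOG; pass a different file name to examine a saved copy of the log.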

FailSafe Diagnostic Commands

Table 3-1 shows the tests you can perform with FailSafe diagnostic commands.

Table 3-1. FailSafe Diagnostic Test Summary

Diagnostic Test        Description
---------------        -----------
Resource               Checks that resource type parameters are set, that
                       parameters are syntactically correct, and that
                       parameters exist
Resource group         Tests all resources defined in the resource group
Failover policy        Checks that the failover policy exists and that the
                       failover domain contains a valid list of hosts
Network connectivity   Checks that the control interfaces are on the same
                       network and that the nodes can communicate with
                       each other
Serial connection      Checks that the nodes can reset each other (do not
                       execute this command while FailSafe is running)

All transactions are logged to the diagnostics file diags_Nodename in the log directory.

You should test resource groups before starting FailSafe HA services or starting a resource group. These tests are designed to check for resource inconsistencies that could prevent the resource group from starting successfully.

Configuration

System File Configuration

You must configure the following system files appropriately in order to use FailSafe:

  • /etc/hosts

  • /etc/nsswitch.conf

  • /etc/services

  • /etc/config/cad.options

  • /etc/config/fs2d.options

  • /etc/config/cmond.options

In addition, you must ensure that /etc/sys_id contains the correct hostname information.

The following hostname resolution rules and recommendations apply to FailSafe clusters:


Caution: It is critical that you understand these rules before attempting to configure a FailSafe cluster.


  • The hostname must be configured on a network interface connected to the public network and should be resolved using /etc/hosts.

  • Hostnames should not contain an underscore (_) or include any whitespace characters.

  • The /etc/hosts file has the following format, where hostname can be the simple hostname or the fully qualified domain name:

    IP_address hostname 

    For example, suppose your /etc/hosts file contains the following:

    # The public interface:
    128.2.3.4  color-green.sgi.com color-green green
    
    # The private interface:
    192.0.1.1  color-green-private.sgi.com  color-green-private green-private

    The /etc/sys_id file could contain either the hostname color-green or the fully qualified domain name color-green.sgi.com.

    In this case, you would enter the hostname color-green or the fully qualified domain name color-green.sgi.com for the Server field in the login screen and for the Hostname field in the Define a new node window.

  • If you use the name service, you must configure your system so that local files are accessed before either the network information service (NIS) or the domain name service (DNS). That is, the hosts line in /etc/nsswitch.conf must list files first. For example:

    hosts:      files nis dns 

    (The order of nis and dns is not significant to FailSafe; files must be first.)

    One of the interfaces defined in the /etc/config/netif.options file must be equal to the value of /etc/sys_id ($HOSTNAME).

    For more information see the nsswitch.conf and the nsd man pages.

  • If you change the /etc/nsswitch.conf or /etc/hosts files, you must restart nsd by using the nsadmin restart command, which also flushes its cache.

    The reason you must restart nsd after making a change to these files is that the nsd name service daemon actually takes the contents of /etc/hosts and places the contents in its memory cache in a format that is faster to search. Thus, you must restart nsd in order for it to see that change and place the new /etc/hosts information into RAM cache. If /etc/nsswitch.conf is changed, nsd must re-read this file so that it knows what type of files (for example, hosts or passwd) to manage, what services it should call to get information, and in what order those services should be called.

    The IP addresses on a running node in the cluster and the IP address of the first node in the cluster cannot be changed while cluster services are active.

  • You should be consistent when using fully qualified domain names in the /etc/hosts file. If you use fully qualified domain names in /etc/sys_id on a particular node, then all of the nodes in the cluster should use the fully qualified name of that node when defining the IP/hostname information for that host in their /etc/hosts file.

    The decision to use fully qualified domain names is usually a matter of how the clients (such as NFS) are going to resolve names for their client server programs, how their default resolution is done, and so on.

  • If you change hostname/IP address mapping for a node in the cluster, you must recreate the node in the configuration database. You must remove the node from the cluster and the database, restart cluster processes on that node, and then define the node and add it to the cluster.

  • The /etc/sys_id file contains the hostname of the machine and should not be associated with an HA IP address.
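Several of the rules above can be checked mechanically before you define any nodes. The following is a minimal sketch under stated assumptions: all three function names are hypothetical, the checks cover only the underscore/whitespace rule, the presence of a name in a hosts-format file, and the files-first rule for the hosts line, and a POSIX awk is assumed.

```shell
# Hypothetical helper: succeed if the name contains no underscore or whitespace.
valid_cluster_hostname() {
    case "$1" in
        ""|*_*|*[[:space:]]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Hypothetical helper: succeed if name $1 appears as a hostname or alias
# (any field after the IP address) in hosts-format file $2.
hostname_in_hosts() {
    awk -v name="$1" '
        { sub(/#.*/, "")                      # drop comments
          for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$2"
}

# Hypothetical helper: succeed if the hosts line in nsswitch.conf-format
# file $1 lists "files" as its first source.
hosts_files_first() {
    awk '$1 == "hosts:" { found = 1; ok = ($2 == "files") }
         END { exit !(found && ok) }' "$1"
}

# Usage on a node (not run here):
#   name=$(cat /etc/sys_id)
#   valid_cluster_hostname "$name" && hostname_in_hosts "$name" /etc/hosts \
#       && hosts_files_first /etc/nsswitch.conf && echo "basic checks passed"
```

These checks do not replace the rules themselves; for example, they do not verify that the hostname is configured on a public interface.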

Cluster Database Configuration

When you configure a FailSafe cluster, you will do the following:

  1. Verify that the cluster chkconfig flag is on.

  2. Start the cluster daemons.

  3. Determine the hostname of the nodes to be defined as members of the cluster.

  4. Configure the database, using the FailSafe Manager GUI or the cmgr command:

    • Define the nodes in the pool. Follow the hostname resolution rules in “System File Configuration”. You should use IP addresses instead of IP names.

    • Define the cluster.

    • Start HA services.

    • Define resources and resource groups.

    • Define failover policies.

    • Define a tiebreaker node.

  5. Test the system:

    • Test individual components before starting FailSafe.

    • Test connectivity among the nodes.

    • Test normal operation of the system when FailSafe is running.

    • Simulate failures when FailSafe is running.

  6. Use the build_cmgr_script command to generate a cmgr script based on the contents of the cluster database. As needed, you can then use this generated script to recreate the cluster database after performing a cdbreinit when HA services are off. This may be useful for backups or troubleshooting. You should run build_cmgr_script again after making significant cluster database changes. Also see the cdbBackup command.

For details, see Chapter 6, “Configuration”.

Using the Administration Tools

SGI recommends that new users use the FailSafe Manager GUI and the guided tasks. Experienced users can use the cmgr command line for repetitive tasks. The GUI provides guided configuration task sets and online help; click on any blue text to get more information.

You can launch the GUI with the fsmgr command, from the IRIX toolchest, or by entering the following URL on a PC or other workstation, where server is the name of an administration node in the pool:

http://server/FailSafeManager

(The URL method requires the Java 1.4.1 plug-in.) For more information, see “Starting the GUI and Connecting to a Node” in Chapter 5.

You should perform all FailSafe administration from one node in the pool so that the latest copy of the database will be available even if there are network partitions. You should wait for a change to appear in the details area of the GUI before making another change; if you make changes too quickly, errors can occur.

The cmgr command line interface is also provided.

You should make changes from only one instance of the GUI or cmgr running at any given time; simultaneous changes made by a second GUI instance (that is, a second invocation of fsmgr) or simultaneous changes made by the GUI and by cmgr may overwrite each other.

The FailSafe administration node to which you connect the GUI affects your view of the cluster. You should wait for a change to appear in the details area before making another change; the change is not guaranteed to be propagated across the cluster until it appears in the view area. (To see the location of the view and details areas, see Figure 5-1.) The entire cluster status information is sent to every FailSafe administration node each time a change is made to the cluster database.

If you are running a coexecution cluster, there is a single cluster database that applies to both FailSafe and CXFS. The cmgr command can be used to modify the cluster database for either product, but each product has its own GUI. If a given IRIX administration node applies to both FailSafe and CXFS, you should ensure that any modifications you make to its definition in the database are appropriate for both products.

Determining the Number of Pools

The number of clusters and pools you define depends upon your failover policies. Each pool contains a single cluster and is unrelated to other pools; only one FailSafe cluster can be present in a pool.

Suppose you have four nodes (A, B, C, D), with two failover pairs (A/B and C/D):

  • If the applications running on A or B are never intended to run on C or D (and vice versa), then it is better to set up two independent clusters in separate pools. In this case, you could log into A and define nodes A and B (forming the first pool), and define cluster AB. You would then log into C and define nodes C and D (forming the second pool), and define cluster CD. There is no interrelationship between cluster AB and cluster CD; they are in separate pools. The pools should have separate networks.

  • If the applications running on A/B might at some point run on nodes C or D, then all four nodes should be configured as one cluster within a single pool.

Node Names

The first node you define must be the node that you have logged into.

Once a node has been defined, you cannot change its name or node ID in the cluster database. To use a different name or ID, you must delete it and redefine it. This also applies to cluster names and IDs.

Issues with a Two-Node Cluster

If you have a two-node cluster, you should create an emergency failover policy for each node in preparation for a time when it may need to run by itself. This situation can occur if the other node must stay down for maintenance or if it fails and cannot be brought up. Without these emergency failover policies and the appropriate set of procedures, the surviving node will never form a cluster by itself. For procedures, see “Two-Node Clusters: Single-Node Use” in Chapter 8.

Determining Which Nodes Perform Resets

The Etherlite multiplexer hardware is capable of performing a reset from any node, but the FailSafe software requires that you name a system controller owner node when defining a node. Only that owner node defined in the cluster database may perform the reset. Serial cables must physically connect the node being defined and the owner node through the system controller port.

Network Interface and Hostnames vs IP Addresses

Although it is possible to enter a hostname in the network interface field when defining a node, this requires DNS on the nodes; it is recommended that you enter an IP address in dot notation instead.

Appropriate Timeout Determination

An inappropriate node timeout will result in false failovers. An appropriate value will take time to determine; this can be the most difficult part of the FailSafe configuration process.

The timeout must be at least 5 seconds and must also be at least 10 times the heartbeat interval.
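Both constraints can be checked with simple arithmetic before you enter a value. This is a minimal sketch: the function name check_node_timeout is hypothetical, and expressing both values in milliseconds is an assumption made so that integer shell arithmetic also works for sub-second heartbeat intervals.

```shell
# Hypothetical helper: validate a proposed node timeout.
# $1 = node timeout in milliseconds, $2 = heartbeat interval in milliseconds.
check_node_timeout() {
    timeout_ms=$1
    heartbeat_ms=$2
    # Rule 1: the timeout must be at least 5 seconds.
    if [ "$timeout_ms" -lt 5000 ]; then
        echo "too short: timeout must be at least 5 seconds"
        return 1
    fi
    # Rule 2: the timeout must be at least 10 times the heartbeat interval.
    if [ "$timeout_ms" -lt $((heartbeat_ms * 10)) ]; then
        echo "too short: timeout must be at least 10 times the heartbeat interval"
        return 1
    fi
    echo "ok"
}
```

For example, a 6-second timeout with a 1-second heartbeat interval passes the first rule but fails the second, because 10 heartbeats take 10 seconds. Passing both checks only means the value is permitted, not that it is appropriate for your workload.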

Use Performance Co-Pilot to determine the appropriate timeout levels for your resource groups.

Tiebreaker Nodes

In a cluster where the nodes are of different sizes and capabilities, the largest node in the cluster with the most important application or the maximum number of resource groups should be configured as the tiebreaker node. (By default, the FailSafe tiebreaker is the node with the lowest node ID.)

To set the tiebreaker node, see “Set FailSafe HA Parameters” in Chapter 6.

Log Files

When you first install FailSafe, you should set logging levels high to obtain enough information for troubleshooting. You must edit the following file to use the appropriate levels:

/var/cluster/ha/common_scripts/scriptlib

For example, change the log level line from:

HA_CURRENT_LOGLEVEL=2

to:

HA_CURRENT_LOGLEVEL=11
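If you prefer to preview the change before editing the file, a small filter can produce the modified text. This is a minimal sketch, assuming the log level line has exactly the form shown above; the function name set_loglevel is hypothetical, and it writes to standard output rather than modifying the file.

```shell
# Hypothetical helper: print file $1 with the HA_CURRENT_LOGLEVEL line set to $2.
# The input file itself is left unchanged.
set_loglevel() {
    sed "s/^HA_CURRENT_LOGLEVEL=.*/HA_CURRENT_LOGLEVEL=$2/" "$1"
}

# Usage on a node (not run here):
#   set_loglevel /var/cluster/ha/common_scripts/scriptlib 11 > /tmp/scriptlib.new
#   diff /var/cluster/ha/common_scripts/scriptlib /tmp/scriptlib.new
```

Reviewing the diff before installing the new file guards against an unexpected match elsewhere in the script.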

The following levels are recommended for each daemon during the testing and configuration process:

cli        2
clconfd    5
crsd      13
diags      2
ha_agent   5
ha_cmsd   11
ha_gcd     5
ha_ifd    15
ha_fsd    12
ha_script  5
ha_srmd   13

After the system is running satisfactorily, you can reduce the log levels if the log files are filling too quickly.

For more information, see “Set Log Configuration” in Chapter 6.

Offline Detach Issues

Performing an offline_detach operation on a resource group leaves the resources in the resource group running on the node. FailSafe does not track these resources any longer.

Because FailSafe is no longer monitoring the group after the offline_detach (or offline_detach_force) is executed, it must recover on the same node where it was running at the time the offline_detach was performed. You must not allow resources to run on multiple nodes in the cluster.

This also means that no other nodes should be allowed to rejoin the FailSafe membership, especially if Auto_Recovery is set in the resource group's failover policy. This restriction is required because the failover policy scripts are run whenever there is a change in membership; rerunning the scripts could cause your previously offline detached resource group to start on a node other than the node where the offline_detach was performed.

FailSafe policy scripts are run only on nodes where FailSafe is running (that is, nodes where HA services have been started). For example, suppose you have a four-node FailSafe cluster (with nodes A, B, C, and D), where nodes A, B, and C are in UP state and node D is in DOWN state. If resource group RG is taken offline using the offline_detach or offline_detach_force command on node B and HA services are shut down on node B, node D should not rejoin the cluster before the resources in RG are stopped manually on node B. If node D rejoins the cluster, the resource group RG will be brought online on node A, C, or D.

Testing the Configuration

Test the system in three phases:

  • Test individual components prior to starting FailSafe software

  • Test normal operation of the system

  • Simulate failures to test the operation of the system after a failure occurs

During the first few weeks of operation, examine the failovers of each resource group to determine if they are due to inappropriately short timeout values; adjust the timeout values as needed.


Note: Performing a backup of the entire system may add stress to the system. You should consider this when determining resource group timeouts in order to avoid unnecessary failovers.

See Chapter 9, “Testing the Configuration”.

Administration and Operation

Appropriate Dependencies for filesystem Resources

SGI recommends that you add an XVM or volume resource dependency for a given instance of a filesystem resource. This will prevent errors when resources are added to or deleted from resource groups.

For more information, see “Dependency” in Chapter 1 and “Add/Remove Dependencies for a Resource Definition” in Chapter 6.

Enabling System Accounting

Process accounting data is useful when diagnosing FailSafe problems. Therefore, you should enable either extended accounting or Comprehensive System Accounting on all production servers, even if you have no need to bill users for their time.


Note: Standard SVR4 accounting is not useful for diagnostic purposes because it does not record process ID (PID) information in the process accounting record.

For example, the monitor action script usually consists of a sequence of operating system commands that probe the status of a resource in different ways. The underlying problem causing a monitor action script timeout may be completely different depending on which of these commands caused the timeout. Typically, one specific command will consume the majority of time spent in the script. Backtracking this by examining process accounting data can provide valuable insight that will improve your ability to diagnose timeout problems.

To perform this type of accounting data analysis, you must enable process level accounting data collection on your FailSafe systems before the problem occurs.

For instructions, see IRIX Admin: Resource Administration.

Origin 300, Origin 3200C, Onyx 300, and Onyx 3200C Console Support

On Origin 300, Origin 3200C, Onyx 300, and Onyx 3200C systems, there is only one serial/USB port that provides both L1 system controller and console support for the machine. In a FailSafe configuration, this port (the DB9 connector) is used for system reset. It is connected to a serial port in another node or to the Ethernet multiplexer.

To get access to console input and output, you must redirect the console to another serial port in the machine.


Caution: Redirecting the console works only when IRIX is running. To access the console when IRIX is not running (miniroot), you must physically reconnect the machine: unplug the serial hardware reset cable from the console/L1 port and then connect the console cable.

For instructions, see “Redirecting the Console for Origin 300, Origin 3200C, Onyx 300, and Onyx 3200C” in Chapter 8.

Creating an Emergency Failover Policy for a Two-Node Cluster

If you have a two-node cluster, you should create an emergency failover policy for each node in preparation for a time when it may need to run by itself. This situation can occur if the other must stay down for maintenance or if it fails and cannot be brought up.


Caution: Without these emergency failover policies and the appropriate set of procedures, the resources cannot come online because half or more of the failover domain is down.

For instructions about using a single node and resuming two-node use, see “Two-Node Clusters: Single-Node Use” in Chapter 8.

Interrupting FailSafe Commands

After a FailSafe command is started, it may partially complete even if you interrupt the command by typing Ctrl-c. If you halt the execution of a command this way, you may leave the cluster in an indeterminate state and you may need to use the various status commands to determine the actual state of the cluster and its components.

Monitoring System Status

While the FailSafe system is running, you can monitor the state of its components. FailSafe allows you to view the system status in the following ways:

  • Keep watch on the state of a cluster using the cluster_status command or the GUI.

  • Query the status of an individual resource group, node, or cluster using either the GUI or the cmgr command. These tools also display the heartbeat network that is currently being used.

  • Use the haStatus script provided with the cmgr command to see the status of all clusters, nodes, resources, and resource groups in the configuration.

  • Use the Embedded Support Partner (ESP), which consists of a set of daemons that perform various monitoring activities. You can choose to configure ESP so that it will log FailSafe events (the FailSafe ESP event profile is not configured in ESP by default).

For details, see “System Status” in Chapter 8.

System Maintenance

To perform maintenance on a system (the target system), do the following:


Note: You must wait for each step to complete before moving to the next step.


  1. Manually move the resource group to the backup system.

  2. Shut down HA services on the target system.

  3. Shut down cluster services on the target system.

  4. Perform the maintenance activity.

  5. Restart cluster services on the target system.

  6. Restart HA services on the target system.

  7. Manually move the resource group back to the target system.
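
The requirement to wait for each step to complete can be sketched as a small shell wrapper that runs the steps in order and stops as soon as one fails. This is an illustration only: run_steps is a hypothetical helper, and the echo commands are placeholders for whatever GUI or cmgr operations you use at your site; they are not FailSafe commands.

```shell
#!/bin/sh
# run_steps CMD...: run each argument as a command, in order, waiting for each
# to finish. Stops at the first failing step and returns its 1-based index;
# returns 0 if every step succeeds.
run_steps() {
    step=0
    for cmd in "$@"; do
        step=$((step + 1))
        echo "step $step: $cmd"
        if ! sh -c "$cmd"; then
            echo "step $step failed; stopping" >&2
            return "$step"
        fi
    done
    return 0
}

# Placeholder steps (assumptions): replace each echo with the real operation.
run_steps \
    "echo 'move resource group to backup system'" \
    "echo 'shut down HA services on target system'" \
    "echo 'shut down cluster services on target system'" \
    "echo 'perform maintenance'" \
    "echo 'restart cluster services on target system'" \
    "echo 'restart HA services on target system'" \
    "echo 'move resource group back to target system'"
```

Stopping at the first failure keeps a later step, such as shutting down cluster services, from running while an earlier one, such as moving the resource group, has not completed.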

Understanding What Happens After a System Crash or Hang

The following sequence occurs after a system crashes or hangs:

  1. The crashed system begins a dump and reboot.


    Note: A system with a lot of memory may require a very long heartbeat timeout in order to be able to get a useful dump.


  2. FailSafe detects that the system is not responsive because there is no heartbeat within the node-timeout interval (1 minute by default).

  3. FailSafe issues a system reset across the serial line.

  4. FailSafe forces the failover of the resource groups on the crashed system to the backup system(s).

  5. The formerly crashed system reboots.

  6. The resource groups move back to the original system only if the failover policy specifies the auto_failback failover attribute.

Cluster Database Backup and Restore

You should perform a cluster database backup whenever you want to save the database and be able to restore it to the current state at a later point.

You can use the following methods to restore the database:

  • If the database is accidentally deleted from a node, use the fs2d daemon to replicate the database from another node in the pool.

  • If you want to be able to recreate the current configuration, use the build_cmgr_script script. You can then recreate this configuration by running the script generated.

  • If you want to retain a copy of the database and all node-specific information such as log configuration, resource definitions, and resource type definitions, use the cdbBackup and cdbRestore commands. If cdbBackup is not available, you should make the modifications before bringing resource groups online.

For details, see “Cluster Database Backup and Restore” and “Filesystem Dump and Restore” in Chapter 8.

Log File Management

Do not change the names of the log files; renaming them can cause errors.

If you are having problems with disk space, you may want to choose a less verbose log level.

You should rotate the log files at least weekly so that your disk will not become full.

You can run the /var/cluster/cmgr-scripts/rotatelogs script to copy all log files to a new location. This script saves log files with the day and the month name as a suffix. If you run the script twice in one day, it will append the current log file to the previous saved copy.

By default, the rotatelogs script will be run by crontab once a week, which is sufficient if you use the default log levels. If you plan to run with a high debug level for several weeks, you should reset the crontab entry so that the rotatelogs script is run more often.
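
As a minimal sketch of a more frequent schedule, the following shell fragment prints a crontab line that runs rotatelogs once a day. The script path comes from the text above; the helper name rotate_entry and the 02:30 run time are assumptions chosen only for illustration.

```shell
#!/bin/sh
# Path taken from the text above.
ROTATELOGS=/var/cluster/cmgr-scripts/rotatelogs

# rotate_entry HOUR MINUTE: print a crontab line that runs rotatelogs every day
# at the given time. Crontab fields are: minute hour day-of-month month
# day-of-week command.
rotate_entry() {
    printf '%s %s * * * %s\n' "$2" "$1" "$ROTATELOGS"
}

# Example: rotate daily at 02:30 (an arbitrary, assumed time).
rotate_entry 2 30
```

To use the printed line, edit the crontab that holds the existing weekly rotatelogs entry (for example, with crontab -e) and replace that entry.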

On heavily loaded machines, or for very large log files, you may want to move resource groups and stop HA services before running rotatelogs.

For more information, see “Rotating Log Files” in Chapter 8, and “Set Log Configuration” in Chapter 6.

Networking

Fix Networking Problems First

If there are any network issues on the private network, fix them before trying to use FailSafe.

Improve Availability by Using UDP

To improve availability during a failover, clients should connect to the HA server using UDP rather than TCP. If TCP is used, clients will have to reconnect to the server after the failover.

NFS

The FailSafe for NFS plug-in provides failover protection for filesystems and directories that are exported using NFS.

In a FailSafe cluster, one or more nodes can export NFS resources. A resource group can contain multiple NFS resources and a single node in the cluster may have multiple resource groups that contain NFS resources. If a node that exports NFS resources fails, another node provides backup service.

For more information about NFS, see IRIX FailSafe NFS Administrator's Guide.

Security

For a secure connection when logging in to the FailSafe GUI, choose Remote Shell and type a secure connection command using a utility such as ssh. Otherwise, the GUI will not encrypt communication and transferred passwords will be visible to users of the network. For more information, see “Logging In” in Chapter 5.

Tuning

Performance Co-Pilot FailSafe metrics are useful for tuning FailSafe, especially in times of ongoing degraded performance. For more information, see Chapter 12, “Performance Co-Pilot for FailSafe”.

Large Filesystems

Running with too many filesystem allocation groups can lead to buffer congestion. This can result in processes appearing to hang, and can result in inappropriate failovers.

If your site has large filesystems, you should run the mkfs command on each filesystem after installing IRIX 6.5.13 or later in order to reduce the number of allocation groups per filesystem. Because mkfs creates a new filesystem and destroys the existing contents, back up the data first and restore it afterward. As of IRIX 6.5.13, the default allocation group size for filesystems greater than 65 Gbytes is 4 Gbytes.
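
The effect of the allocation group size on the number of allocation groups can be estimated with simple arithmetic. The following sketch assumes that the groups are of equal size and rounds up; the helper name ag_count and the 1-Gbyte comparison size are hypothetical, and the layout that mkfs actually chooses may differ.

```shell
#!/bin/sh
# ag_count SIZE_GB AG_SIZE_GB: approximate number of allocation groups needed
# to cover a filesystem of SIZE_GB gigabytes with groups of AG_SIZE_GB
# gigabytes each, rounded up to a whole group.
ag_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

ag_count 1024 4    # 1-Tbyte filesystem with 4-Gbyte groups: prints 256
ag_count 1024 1    # same filesystem with assumed 1-Gbyte groups: prints 1024
```

A larger allocation group size therefore gives proportionally fewer groups for the same filesystem.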

CXFS and DMF, NFS, and Samba Resource Types

The DMF resource type requires that the CXFS resource on which it depends has the relocate-mds flag set to true.

Although NFS and Samba perform better when run from the CXFS active metadata server, these resource types do not strictly require the CXFS resource on which they may depend to have the CXFS plug-in relocate-mds flag set to true. (The default value for the flag is false.)


Note: If you set relocate-mds to true, the server list specified for the CXFS filesystem must match the FailSafe failover domain for the resource group that contains the CXFS resource.


Avoiding Problems

This section covers the following:

Proper Start Up

Ensure that you follow the installation and system preparation instructions in Chapter 4, “FailSafe Installation and System Preparation”, before configuring the cluster.

Cluster Database Membership Quorum Stability

The cluster database membership quorum must remain stable during the configuration process. If possible, use multiple windows to display the fs2d_log file for each node while performing configuration tasks. Enter the following:

# tail -f /var/cluster/ha/log/fs2d_log

Check the member count when it prints new quorums. Under normal circumstances, it should print a few messages when adding or deleting nodes, but it should stop within a few seconds after a new quorum is adopted.

If not enough nodes respond, there will not be a quorum. In this case, the database will not be propagated.
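
If you prefer a snapshot to a continuous tail, a sketch such as the following prints only the most recent quorum-related lines. The log path is the one shown above; the helper name recent_quorum_lines is hypothetical, and the sketch assumes that quorum messages contain the word "quorum", which you should verify against your own fs2d_log file.

```shell
#!/bin/sh
# Log path taken from the text above.
FS2D_LOG=/var/cluster/ha/log/fs2d_log

# recent_quorum_lines FILE [N]: print the last N (default 5) lines of FILE that
# mention "quorum", ignoring case. Assumes quorum messages contain that word.
recent_quorum_lines() {
    grep -i quorum "$1" | tail -n "${2:-5}"
}

# Query the log only if it exists and is readable on this node.
if [ -r "$FS2D_LOG" ]; then
    recent_quorum_lines "$FS2D_LOG"
fi
```

Running this on each node and comparing the output gives a quick view of whether the nodes report the same latest quorum messages.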

If you detect cluster database membership quorum problems, fix them before making other changes to the database. Try restarting the cluster infrastructure daemons on the node that does not have the correct cluster database membership quorum, or on all nodes at the same time. Enter the following:

# /etc/init.d/cluster stop
# /etc/init.d/cluster start

Please provide the fs2d log files when reporting a cluster database membership quorum problem.

Consistency in Configuration

Be consistent in configuration files for nodes across the pool, and when configuring networks. Use the same names in the same order.

GUI Use

The GUI provides a convenient display of a cluster and its components through the view area. You should use it to see your progress and to avoid adding or removing nodes too quickly. After defining a node, you should wait for it to appear in the view area before adding another node. After defining a cluster, you should wait for it to appear before you add nodes to it. If you make changes too quickly, errors can occur.

When running the GUI on IRIX, do not move to another IRIX desktop while a GUI action is taking place; this can cause the GUI to crash.

Log File Names and Sizes

Periodically, you should rotate log files to avoid filling your disk space.

Removing Unused Nodes

If a node is going to be down for an extended period, remove it from the cluster and the pool to avoid cluster database membership problems.