The IRIS FailSafe system uses the configuration file /var/ha/ha.conf to determine system resources, such as the node names, network interface names and addresses, and shared storage parameters. This chapter explains how to create and customize a configuration file for your IRIS FailSafe system.
Appendix C, “Sample ha.conf Configuration File,” contains an example of a complete configuration file.
The version number of the format of ha.conf is n.m, where n is the value of the version-major configuration parameter and m is the value of the version-minor configuration parameter (see the section “Internal Block” in this chapter). IRIS FailSafe 1.2 uses version 1.2 or 1.1 ha.conf files. Version 1.0 ha.conf files are not supported by IRIS FailSafe 1.2.
This chapter describes version 1.2 ha.conf files. A version 1.1 ha.conf file developed for IRIS FailSafe Release 1.1 can be used as is with IRIS FailSafe Release 1.2 on CHALLENGE nodes. A cluster that includes one or two Origin nodes must use a version 1.2 ha.conf file. See the section “Overview of Upgrading an IRIS FailSafe Cluster From Release 1.1 to Release 1.2” in Chapter 1 for a list of differences.
Follow these steps to create a new configuration file for an IRIS FailSafe cluster:
1. Determine your IRIS FailSafe configuration by studying the examples in Chapter 2, “Planning IRIS FailSafe Configuration,” and developing your own diagrams of your configuration and lists of parameters as suggested in that chapter.
2. Begin creating your configuration file by concatenating configuration file templates from the directory /var/ha/templates in the order below and saving the result in a temporary file in a convenient location:
one copy of ha.conf.system
one copy of ha.conf.interfaces
one copy of ha.conf.volumes if there are shared disks in your configuration
one copy of ha.conf.filesystems if XFS filesystems are used on shared disks in your configuration
one copy of ha.conf.nfs, if you are using the optional IRIS FailSafe NFS product
one copy of ha.conf.web, if you are using the optional IRIS FailSafe Web product
3. Using the information you prepared with Chapter 2 and the information in the section “Blocks in the Configuration File” in this chapter, edit your configuration file from step 2.
4. On one of the nodes in the IRIS FailSafe cluster, verify the format and contents of your configuration file by running the ha_cfgverify command:
# /usr/etc/ha_cfgverify yourfile
The messages output by this command are described in Appendix A, “Messages About Configuration File Errors.”
5. For warnings reported by ha_cfgverify, check manually that there are no errors in the configuration file.
6. Resolve errors reported by ha_cfgverify and re-run the command until it reports that the file has passed.
7. Copy the configuration file to any location on the other node in the cluster and repeat steps 4 through 6.
8. Copy the final configuration file from the second node back to the first node so that the copies of the configuration file on each node are identical.
9. If IRIS FailSafe is not running on the nodes in the cluster, install the configuration file by entering these commands on each node:
# cp yourfile /var/ha/ha.conf
# chown root.sys /var/ha/ha.conf
# chmod 500 /var/ha/ha.conf
# /usr/etc/ha_cfgchksum
The checksums output by ha_cfgchksum on each node must be identical.
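One way to compare the checksums, as a sketch, is to collect them on one node and compare them with diff; this assumes the nodes are named xfs-ha1 and xfs-ha2 (as in the examples later in this chapter) and that rsh access between the nodes is enabled:

# /usr/etc/ha_cfgchksum > /tmp/cksum.local                  # checksum on this node
# rsh xfs-ha2 /usr/etc/ha_cfgchksum > /tmp/cksum.remote     # checksum on the other node
# diff /tmp/cksum.local /tmp/cksum.remote                   # no output means they match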
If IRIS FailSafe is running on the cluster, follow the instructions in the section “Upgrade Procedure C” in Chapter 7 to install the configuration file on each node.
An IRIS FailSafe configuration file includes a series of blocks that define the configuration of an IRIS FailSafe system and the application classes that are to be failed over on the system. An application class is a type of high-availability service (resource or application), such as logical volumes or NFS filesystems. The blocks contain a hierarchical description of parameters and their values. These parameters provide the IRIS FailSafe software with information about the high-availability services on the IRIS FailSafe cluster.
Table 4-1 lists the blocks in a configuration file and summarizes their contents.
Table 4-1. Major Blocks in ha.conf
The syntax of the configuration file is defined by the following grammar:
<config>   ::= <blocks> EOF
             | EOF
<blocks>   ::= <blocks> <block>
             | <block>
<block>    ::= STRING <compound>
             | STRING STRING <compound>
             | STRING `=' VALUE
             | STRING `=' `(' <items> `)'
<items>    ::= <items> <item>
             | <item>
<item>     ::= VALUE
<compound> ::= `{' <blocks> `}'
Every equal sign (=) must be separated from the characters before and after it by whitespace (space characters, tabs, or newline). Comments begin with a pound sign (#) and terminate at the end of the line on which they appear.
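For example, the following fragment (not a complete configuration file) follows these rules; the parameter names are taken from the examples later in this chapter. Note the whitespace around each equal sign and the comment introduced by a pound sign; writing hb-probe-time=5 without the surrounding whitespace would be a syntax error.

# heartbeat probe interval in seconds
hb-probe-time = 5
ip-aliases = ( stocks bonds )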
The subsections below show examples of each type of block in the configuration file and describe the parameters in each block.
Example 4-1 shows an example system-configuration block.
system-configuration {
        mail-dest-addr = [email protected]
        reset-host = irisconsole
        pwrfail = true
        monitor-failures = on
}
The parameters in this block are as follows:
A node block describes the public interfaces of one node that are to be monitored. It also contains information about heartbeat messages and the private network.
Example 4-2 shows an example of the node block for one node. Each configuration file must include a second, similar node block for the other node.
node xfs-ha1 {
        interface xfs-ha1-ec0 {
                name = ec0
                ip-address = xfs-ha1
                netmask = 0xffffff00
                broadcast-addr = 190.0.2.255
                # MAC-address = 8:0:69:2:65:fd
        }
        heartbeat {
                hb-private-ipname = priv-xfs-ha1
                hb-public-ipname = xfs-ha1
                hb-probe-time = 5
                hb-timeout = 5
                hb-lost-count = 3
        }
        reset-tty = /dev/ttyd2
        sys-ctlr-type = CHAL
        controlled-failback = false
}
The node label (xfs-ha1 in this example) must match the return value of the hostname command on the node.
The sections and configuration parameters in a node block are as follows:
interface
    An interface section describes a public interface for the node. The interface label is created from the ip-address and name parameters and must be unique in the configuration file. There is one interface section for each public interface in the node that is part of IRIS FailSafe. Not all public interfaces need to be part of IRIS FailSafe.

name
    A network interface. Each node has several network interfaces. For example, a CHALLENGE S node has the network interfaces ec0, ec2, and ec3.

ip-address
    The IP address for this interface. It can be a name (string) or an address (X.X.X.X). The IP address must be configured in the file /etc/config/netif.options (a sample netif.options entry is sketched after this parameter list). It is not failed over and is known as a fixed IP address.

netmask
    The netmask for this interface, in hexadecimal (0xffffff00 in Example 4-2).

broadcast-addr
    The broadcast address for this interface (190.0.2.255 in Example 4-2).

MAC-address
    The MAC (physical) address of the interface, needed only if re-MACing is used; it is commented out in Example 4-2. In the output of the macconfig command, the Physical Address column lists the interface's MAC address (interfaces must be configured up to be displayed; for more information, see the macconfig(1M) reference page).
heartbeat
    A heartbeat section specifies parameters for the heartbeat messages exchanged between this node and the other node in the cluster.

hb-private-ipname
    The IP name of this node on the private network, used for heartbeat messages.

hb-public-ipname
    The fixed IP name of this node on a public interface, which can also be used for heartbeat messages.
hb-probe-time
    The interval, in seconds, between heartbeat probes, measured from the completion of the previous probe (see the section “Choosing Monitoring Frequency Values” in this chapter). The value used in Example 4-2 is 5.
hb-timeout
    Specifies how long (in seconds) to wait for a heartbeat response before declaring a failure. The recommended value for hb-timeout is 5.

hb-lost-count
    Specifies how many heartbeat probe failures must occur to declare a heartbeat failure. The worst-case length of time that it takes for one node to declare that the other node is down is (hb-probe-time + hb-timeout) × hb-lost-count. The recommended value for hb-lost-count is 3.
reset-tty
    The device filename of the serial port used by the serial cable connected to the remote power control unit or system controller port.

sys-ctlr-type
    The type of system controller on this node. The possible values are CHAL (for CHALLENGE systems), MMSC (for Origin2000 and Onyx2 rack systems, which have an MMSC system controller), and MSC (for Origin2000 or Onyx2 deskside systems or Origin200 systems, which have an MSC system controller). CHAL is the default if this parameter is not specified.
controlled-failback
    Determines whether high-availability services that have failed over to the other node are returned to this node automatically when it rejoins the cluster (false) or only when the administrator explicitly moves them back (true).
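The fixed IP address named by ip-address must be configured on the interface at boot time. As a hedged sketch, the corresponding lines in /etc/config/netif.options for the ec0 entry in Example 4-2 might look like the following; the if1 numbering is an assumption that depends on how interfaces are ordered on your system:

if1name=ec0
if1addr=xfs-ha1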
IRIS FailSafe software fails over high-availability IP addresses and MAC addresses that are configured in interface-pair blocks. (Fixed IP addresses are not failed over and are specified in node blocks.) There is one interface-pair block for each primary interface.
Example 4-3 shows two interface-pair blocks. They are based on the node block for xfs-ha1 shown in Example 4-2 and a similar node block for xfs-ha2. The configuration is shown in Figure 2-4. The interface-pair block labeled “one” specifies that if there is a failure on node xfs-ha1, the IP alias stocks (configured on the primary interface ec0 on xfs-ha1) moves to interface ec0 in the node xfs-ha2 (the secondary interface). After failover, both of the high-availability IP addresses stocks and bonds are automatically configured on the ec0 interface of xfs-ha2 by IRIS FailSafe using IP aliasing. Similarly, the interface-pair block labeled “two” specifies that the IP alias bonds (configured on the primary interface ec0 on xfs-ha2) moves to interface ec0 in the node xfs-ha1 (the secondary interface) during failover.
interface-pair one {
        primary-interface = xfs-ha1-ec0
        secondary-interface = xfs-ha2-ec0
        re-mac = false
        netmask = 0xffffff00
        broadcast-addr = 190.0.2.255
        ip-aliases = ( stocks )
}
interface-pair two {
        primary-interface = xfs-ha2-ec0
        secondary-interface = xfs-ha1-ec0
        re-mac = false
        netmask = 0xffffff00
        broadcast-addr = 190.0.2.255
        ip-aliases = ( bonds )
}
The interface-pair labels (one and two in this example) are arbitrary and are not used elsewhere in the configuration file.
The parameters used in interface-pair blocks are these:
primary-interface
    The label of an interface section in a node block (xfs-ha1-ec0 in the first block of Example 4-3); it identifies the interface on which the IP aliases in this block are normally configured.

secondary-interface
    The label of an interface section in the other node's node block; it identifies the interface that takes over the IP aliases when a failover occurs.

re-mac
    If re-MACing is required, set re-mac to true. Otherwise set it to false or leave it undefined. (See the procedure in the section “Planning Network Interface and IP Address Configuration” in Chapter 2 to determine if re-MACing is required.)

netmask
    The netmask for the IP aliases, in hexadecimal.

broadcast-addr
    The broadcast address for the IP aliases.

ip-aliases
    The list of IP addresses to be failed over using IP aliasing. The format of the value is a left parenthesis, IP addresses separated by blanks, and a right parenthesis.
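For example, if one node were the primary node for both of the IP aliases in Example 4-3 (a hypothetical variation on that configuration), its interface-pair block would list both names:

ip-aliases = ( stocks bonds )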
The interface-agent block describes the monitors and timeouts associated with the interface agent.
Example 4-4 shows an example interface-agent block.
interface-agent {
        start-monitor-time = 60
        interface-probe-interval = 30
        interface-probe-timeout = 20
        remote-send-probe-interval = 25
        remote-send-timeout = 10
}
The configuration parameters used in the interface-agent block are these:

start-monitor-time
    The time, in seconds, that the interface agent waits after the application instances have been started up by IRIS FailSafe before performing its first probe.

interface-probe-interval
    The time, in seconds, between the completion of one probe of the local interfaces and the start of the next probe.

interface-probe-timeout
    The maximum time, in seconds, that the interface agent waits for a response to an interface probe; if no response arrives within this time, the probe has failed.

remote-send-probe-interval
    The time, in seconds, between the completion of one remote-send probe and the start of the next probe.

remote-send-timeout
    The maximum time, in seconds, that the interface agent waits for a response to a remote-send probe; if no response arrives within this time, the probe has failed.
The application classes are IRIS FailSafe high-availability services. An application class is a category of service; there may be many instances of the service. For example, there may be many NFS filesystems; their application class is nfs. The main, interfaces, volumes, and filesystems application classes are provided as part of the IRIS FailSafe product. NFS, Netscape, INFORMIX, Oracle, and Sybase application classes are supported by optional IRIS FailSafe products.
Each application class is provided by at least one node, called a primary node for that class; each such node is designated by a server-node parameter in the configuration file. Example 4-5 shows the application-class blocks for main, interfaces, volumes, and filesystems (without comments).
application-class main {
        server-node = xfs-ha1
        server-node = xfs-ha2
}
application-class interfaces {
        server-node = xfs-ha1
        server-node = xfs-ha2
        agent = /usr/etc/ha_ifa
}
application-class volumes {
        server-node = xfs-ha1
        server-node = xfs-ha2
}
application-class filesystems {
        server-node = xfs-ha1
        server-node = xfs-ha2
}
The parameters used in these application-class blocks are as follows:

server-node
    A node that can provide the application class (a primary node for the class). Its value must match the label of a node block.

agent
    The pathname of the agent that monitors the application class. For the interfaces class, this is the interface agent, /usr/etc/ha_ifa.
The IRIS FailSafe software supports failover for XLV logical volumes, allowing the backup node to take over volumes if the primary node fails. These XLV logical volumes must be created on either plexed (mirrored) disks or on CHALLENGE RAID disks. The volume block describes the primary and backup node for one volume and its device name.
Example 4-6 shows an example of a volume block on an IRIX 6.4 system.
volume shared1_vol {
        server-node = xfs-ha1
        backup-node = xfs-ha2
        devname = shared1_vol
        devname-owner = root
        devname-group = sys
        devname-mode = 0600
}
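In a dual-active configuration such as the NFS example later in this chapter, a second volume block would typically be added with the primary and backup roles reversed. The following block is a hypothetical sketch; the volume name shared2_vol and its role assignment are assumptions for illustration, not values from the examples in this guide:

volume shared2_vol {
        server-node = xfs-ha2            # assumed primary node for this volume
        backup-node = xfs-ha1
        devname = shared2_vol
        devname-owner = root
        devname-group = sys
        devname-mode = 0600
}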
The parameters used in volume blocks are as follows:

server-node
    The primary node for the volume. Its value must match the label of a node block.

backup-node
    The node that takes over the volume if the primary node fails. Its value must match the label of a node block.

devname
    The device name of the XLV logical volume.

devname-owner
    The owner of the device name (root in Example 4-6).

devname-group
    The group of the device name (sys in Example 4-6).

devname-mode
    The mode of the device name (0600 in Example 4-6).
The IRIS FailSafe software supports failover for filesystems on shared XLV logical volumes, allowing the backup node to take over filesystems if the primary node fails. The filesystem block describes the mount options and arguments passed to the mount command.
Example 4-7 shows a filesystem block.
filesystem shared1 {
        mount-point = /shared1
        mount-info {
                fs-type = xfs
                volume-name = shared1_vol
                mode = rw,noauto,wsync
        }
}
The parameters used in filesystem blocks are as follows:
mount-point
    The pathname of a filesystem mount point. Both nodes use the same mount point.

fs-type
    The filesystem type (xfs).

volume-name
    The value assigned to volume-name must match a volume block label.

mode
    The filesystem mount options (see the fstab(4) reference page). See the section “Wsync Filesystem Options” in this chapter for information about the wsync option.
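As a rough illustration of what the mount-info section amounts to, the mount invocation corresponding to Example 4-7 might look like the following; the XLV device path /dev/xlv/shared1_vol is an assumption about where the volume's device node resides, and the exact command line used by the FailSafe scripts is not shown in this guide:

# mount -t xfs -o rw,noauto,wsync /dev/xlv/shared1_vol /shared1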
Action blocks are defined for each application class. An action block specifies the pathnames of the failover scripts (for the main application class) or of the monitoring scripts (for the other application classes).
Example 4-8 shows the action blocks for the main, interfaces, volumes, and filesystems application classes.
action main {
        giveaway = /var/ha/actions/giveaway
        giveback = /var/ha/actions/giveback
        takeback = /var/ha/actions/takeback
        takeover = /var/ha/actions/takeover
        kill = /usr/etc/ha_kill
}
action interfaces {
        local-monitor = /var/ha/actions/ha_ifa_lmon
}
action volumes {
        local-monitor = /var/ha/actions/ha_vol_lmon
}
action filesystems {
        local-monitor = /var/ha/actions/ha_filesys_lmon
}
The parameters used in the action block for the main application class are as follows:
giveaway, giveback, takeback, takeover
    The pathnames of the failover scripts, which move high-availability services between the nodes.

kill
    The pathname of the command used by the IRIS FailSafe software to kill processes.
This parameter is used in action blocks for other application classes:

local-monitor
    The pathname of the local monitoring script for the application class.
Action-timer blocks are defined for each application class that is monitored. They specify various monitoring frequency parameters. Information about determining values for monitoring frequency parameters is provided in the section “Choosing Monitoring Frequency Values” later in this chapter. An action-timer block is required for each application class except the application class main.
Example 4-9 shows the action-timer blocks for the interfaces, volumes, and filesystems application classes. There is no action-timer block for the main application class.
action-timer interfaces {
        start-monitor-time = 60
        lmon-probe-time = 60
        lmon-timeout = 60
        retry-count = 5
}
action-timer volumes {
        start-monitor-time = 300
        lmon-probe-time = 300
        lmon-timeout = 60
        retry-count = 1
}
action-timer filesystems {
        start-monitor-time = 60
        lmon-probe-time = 120
        lmon-timeout = 60
        retry-count = 1
}
The parameters used in action-timer blocks vary with the application class. These parameters are used for application classes:

start-monitor-time
    The time, in seconds, that the application monitor waits after the application instances have been started up by IRIS FailSafe before performing its first probe.

lmon-probe-time
    The time, in seconds, between the completion of one local monitoring probe and the start of the next probe.

lmon-timeout
    The maximum time, in seconds, to wait for a response to a local monitoring probe; if no response arrives within this time, the probe has failed.

retry-count
    For application classes monitored by scripts, the number of times that some of the commands internal to the scripts are retried; for application classes monitored by agents, the number of times that probes are retried before an application failure is declared. See the section “Choosing Monitoring Frequency Values” in this chapter.
The internal block contains timeout and version parameters. With the exception of long-timeout, these parameter values should not be changed. Example 4-10 shows an example of an internal block.
internal {
        short-timeout = 5
        long-timeout = 60
        version-major = 1
        version-minor = 2
}
These are the parameters in the internal block:

short-timeout
    A timeout value, in seconds, used internally by the IRIS FailSafe software. Monitoring frequencies and timeouts are rounded up to a multiple of the smaller of short-timeout and hb-probe-time (see the section “Choosing Monitoring Frequency Values” in this chapter).

long-timeout
    The maximum time, in seconds, needed for executing the failover scripts (giveaway, takeback, takeover, and giveback) for all of the applications and resources.

version-major
    The major number of the ha.conf format version.

version-minor
    The minor number of the ha.conf format version.
Example 4-11 shows the application-class block in a dual-active NFS configuration. In this configuration, each node (xfs-ha1 and xfs-ha2) is the primary node for at least one filesystem. For an active/backup configuration, only one server-node would be included in the application-class block for nfs.
application-class nfs {
        server-node xfs-ha1 {
                statmon-dir = /shared1/statmon
                ip-aliases = ( stocks )
        }
        server-node xfs-ha2 {
                statmon-dir = /shared2/statmon
                ip-aliases = ( bonds )
        }
}
There is a server-node section for each node that is a primary node for NFS services. Its label matches the label of one of the node blocks. These are the parameters in the server-node sections:
The nfs blocks specify exported NFS filesystems, including the options and arguments used by the exportfs command. Example 4-12 shows an example of an nfs block. You must create an nfs block for every filesystem to be exported and failed over.
nfs shared1 {
        filesystem = shared1
        export-point = /shared1
        export-info = rw,wsync
        ip-address = 190.0.2.3
}
The label for the nfs block must match the label of a filesystem block. These are the parameters used in nfs blocks:
filesystem
    A filesystem to be exported. The value of this parameter must match the label of a filesystem block and the label of this nfs block.

export-point
    The pathname of the directory to be exported (the filesystem's mount point, /shared1 in Example 4-12).

export-info
    Filesystem export options (see the exports(4) reference page). See the section “Wsync Filesystem Options” in this chapter for information about the wsync option.

ip-address
    One (any one) of the IP aliases on the primary node for this filesystem, preferably in the form X.X.X.X. It must be a high-availability IP address. A good choice is the IP alias used by clients to mount the filesystem.
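For reference, the export defined in Example 4-12 corresponds roughly to an exportfs invocation along the following lines; the exact command line used by the FailSafe scripts is not shown in this guide, so treat this as an assumption for illustration:

# exportfs -i -o rw,wsync /shared1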
Example 4-13 shows action and action-timer blocks (without comments) for NFS.
action nfs {
        local-monitor = /var/ha/actions/ha_nfs_lmon
}
action-timer nfs {
        start-monitor-time = 60
        lmon-probe-time = 60
        lmon-timeout = 120
        retry-count = 2
}
The parameters in these blocks are explained in the sections “Action Blocks” and “Action-Timer Blocks” in this chapter.
Example 4-14 shows the webserver application-class block for an active/backup Netscape server configuration.
application-class webserver {
        server-node = xfs-ha1
}
Example 4-15 shows the webserver application-class block for a dual-active Netscape server configuration.
application-class webserver {
        server-node = xfs-ha1
        server-node = xfs-ha2
}
The server-node parameters are the nodes that serve as primary nodes for Netscape servers. Their values must match a node label used in a node block.
The webserver block in a Netscape server configuration file specifies the location and ports for the Netscape servers. Example 4-16 shows this block for an active/backup configuration. In this example, the primary node is configured with two Netscape FastTrack servers: one listens to the default port, 443, and the other one listens to port 453.
webserver webxfs-ha1 {
        server-node = xfs-ha1
        backup-node = xfs-ha2
        httpd-script = /etc/init.d/ns_fasttrack
        httpd-options-file = ns_fasttrack.options
        httpd-restart = false
        webserver-num = 2
        web-config1 {
                monitoring-level = 1
                search-string = ns-httpd
                ip-address = 190.0.2.3
                port-num = 443
                httpd-dir = /shared/httpd-443
        }
        web-config2 {
                monitoring-level = 1
                search-string = ns-httpd
                ip-address = 190.0.2.5
                port-num = 453
                httpd-dir = /shared/httpd-453.190.0.2.5
        }
}
Example 4-17 shows the webserver blocks for a dual-active configuration. In this example, each node is the primary node for one Netscape Enterprise server.
webserver webxfs-ha1 {
        server-node = xfs-ha1
        backup-node = xfs-ha2
        httpd-script = /etc/init.d/ns_enterprise
        httpd-options-file = ns_enterprise.options
        httpd-restart = false
        webserver-num = 1
        web-config1 {
                monitoring-level = 1
                search-string = ns-httpd
                ip-address = 190.0.2.3
                port-num = 80
                httpd-dir = /shared.stocks/httpd-80
        }
}
webserver webxfs-ha2 {
        server-node = xfs-ha2
        backup-node = xfs-ha1
        httpd-script = /etc/init.d/ns_enterprise
        httpd-options-file = ns_enterprise.options
        httpd-restart = false
        webserver-num = 1
        web-config1 {
                monitoring-level = 1
                search-string = ns-httpd
                ip-address = 190.0.2.4
                port-num = 80
                httpd-dir = /shared.bonds/httpd-80
        }
}
The webserver labels (webxfs-ha1 and webxfs-ha2) are arbitrary. These are the parameters used in webserver blocks:
There is one web-confign section (web-config1, web-config2, and so on) for each Netscape server on a node.
Example 4-18 shows webserver action and action-timer blocks (without comments).
action webserver {
        local-monitor = /var/ha/actions/ha_web_lmon
}
action-timer webserver {
        start-monitor-time = 60
        lmon-probe-time = 20
        lmon-timeout = 30
        retry-count = 1
}
The parameters in these blocks are explained in the sections “Action Blocks” and “Action-Timer Blocks” in this chapter.
NFS clients writing to mounted filesystems typically receive confirmation from the node that a write has completed once data has been successfully received at the node, but not necessarily written to disk. This delayed-write (also known as asynchronous write) feature can greatly improve performance because it allows the node's disk buffer cache to be utilized. In a FailSafe configuration, however, the potential for data corruption due to this feature of NFS is greatly increased because of the automatic failover mechanism (power to the failed node is cut).
You can reconfigure NFS exports and XFS local mounts on the node to perform writes synchronously to disk (when received) by adding the parameter wsync to the mode parameter in all filesystem blocks and to the export-info parameter in all nfs blocks. This means that a reply to an NFS write is not returned until the NFS write data is written to the server's disk. However, adding the wsync option to the filesystem and nfs blocks can greatly reduce performance. You must balance the risk of NFS data corruption during a node failure against the performance gain from using asynchronous NFS writes. Some applications perform their own error checking of data or perform writes in such a way that data corruption does not occur.
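For reference, the filesystem and nfs examples earlier in this chapter already include the option on the relevant lines:

mode = rw,noauto,wsync        # in the filesystem block (Example 4-7)
export-info = rw,wsync        # in the nfs block (Example 4-12)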
For more information on the performance impact when wsync is used, see the ONC3/NFS Administrator's Guide.
The subsections below discuss the monitoring frequency parameters for the application classes and provide information to help you choose values for these parameters that are appropriate for your IRIS FailSafe cluster.
Figure 4-1 shows the communication paths between the IRIS FailSafe daemons and scripts. Each of the communication paths that has monitoring frequency parameters associated with it is labeled with those parameters. You can use this diagram to help you understand which communication path and monitoring activity each of the parameters controls.
The start-monitor-time parameter is the length of time in seconds that the application monitor or agent waits after the application instances have been started up by IRIS FailSafe before performing its first probe. This time is provided to allow the resource or application to start up.
The hb-timeout, lmon-timeout, interface-probe-timeout, and remote-send-timeout parameters are the maximum lengths of time that the application monitor or interface agent waits for a response to a probe. If a response is received, the probe is completed and has been successful. If a response isn't received within the hb-timeout, lmon-timeout, interface-probe-timeout, or remote-send-timeout length of time, the probe is also completed, but it has failed.
The hb-probe-time, lmon-probe-time, interface-probe-interval, and remote-send-probe-interval parameters are the length of time in seconds that the application monitor or interface agent waits from the completion (successful completion or timeout) of the previous probe before performing the next probe.
Figure 4-2 has a time line that shows the relationship between the parameters described above. It illustrates these points:
The minimum period between probes is hb-probe-time, lmon-probe-time, or interface-probe-interval (depending upon the process being monitored).
The maximum period between probes is a probe-time or probe-interval parameter plus a timeout parameter.
For applications monitored by scripts (NFS filesystems and Netscape servers), retry-count is the number of times that some of the commands internal to the scripts are retried. The retry-count value does not affect the application failure detection time. The worst-case application failure detection time is
xxx-probe-time + xxx-timeout
For applications monitored by agents (the interface agent or a database agent), retry-count is the number of times that the probes are retried before declaring an application failure. The worst-case application failure detection time is
(xxx-probe-time + xxx-timeout) × retry-count
The lost-count parameter is used for heartbeat messages between the nodes in the cluster. The worst-case node failure detection time using heartbeats is
(hb-probe-time + hb-timeout) × hb-lost-count
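As a worked example using the values shown earlier in this chapter: with the heartbeat parameters in Example 4-2 (hb-probe-time = 5, hb-timeout = 5, hb-lost-count = 3), the worst-case node failure detection time is (5 + 5) × 3 = 30 seconds. For the agent-monitored interfaces class in Example 4-9 (lmon-probe-time = 60, lmon-timeout = 60, retry-count = 5), the worst-case application failure detection time is (60 + 60) × 5 = 600 seconds, while for the script-monitored nfs class in Example 4-13 (lmon-probe-time = 60, lmon-timeout = 120), it is 60 + 120 = 180 seconds, because retry-count does not affect the detection time for script-monitored classes.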
In choosing time and retry count values for IRIS FailSafe monitoring, keep these important general considerations in mind:
Timeout values must be long enough to avoid treating slow responses as failures.
The failure detection time affects the time that high-availability services may be unavailable.
IRIS FailSafe monitoring can affect the performance of a node, so it is important to consider both the cost of each type of monitoring and the overall cost of all monitoring.
All the monitoring frequencies and timeouts specified in the IRIS FailSafe configuration file are rounded up to the nearest multiple of the smaller of the values of the short-timeout and hb-probe-time parameters. Small changes to the parameter values may have no effect.
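For example, with short-timeout = 5 and hb-probe-time = 5 as in Example 4-2 and Example 4-10, every value is rounded up to a multiple of 5 seconds; changing a timeout from 21 to 24 therefore has no effect, because both values round up to 25.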
The suggested values for each of the monitoring frequency, timeout, and retry count parameters that are given in the configuration file template files and in the examples in this chapter are good starting points for selecting values. However, each cluster is different, so some tuning for your cluster will be required.
The lmon-timeout value for an application class must be greater than ten times the retry-count for the application class.
The start-monitor-time value for all the applications should be greater than the long-timeout value.
The long-timeout value is the maximum time that is needed for executing the failover scripts (giveaway, takeback, takeover, and giveback) for all of the applications and resources.
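As a check against the values in Example 4-9: for the interfaces class, retry-count is 5, so lmon-timeout must be greater than 50 seconds, and the example value of 60 satisfies this; for the volumes and filesystems classes, retry-count is 1, so any lmon-timeout greater than 10 seconds is acceptable.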
To assist in determining timeout values, you can temporarily turn off failovers caused by monitoring failures while you are testing. This enables you to identify failures by looking at the file /var/adm/SYSLOG, but not have failovers during the period you are experimenting with timeout values. Turning off failovers can be done in two ways:
Set the monitor-failures parameter in the system-configuration block of each configuration file to off.
Use the ha_admin command to turn off failovers caused by failures detected by monitoring. The command is:
# ha_admin -F off
You can turn failovers back on with this command:
# ha_admin -F on
In choosing monitoring frequency values, you have two somewhat conflicting objectives:
Choose monitoring frequency values sufficiently short that failures are detected quickly.
Choose monitoring frequency values sufficiently long and forgiving that IRIS FailSafe doesn't mistakenly conclude that a failure has occurred and perform an unnecessary failover.
Follow these guidelines to avoid false failovers and tune the configuration file to prevent future false failovers:
Choose higher retry-count values so that some of the commands internal to the probes are retried.
Consider the impact of the time it takes IRIS FailSafe software to resolve IP addresses given as names in the IRIS FailSafe configuration file into IP addresses in internet notation (X.X.X.X). The file /etc/resolv.conf specifies how names are resolved (see step 11 in section “Configuring Network Interfaces” in Chapter 3). If false failovers are occurring because the name server that resolves names fails to respond within timeout values, you may need to switch the name resolution method or increase the appropriate timeout values.
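One way to reduce dependence on a remote name server is to list the relevant names and addresses in /etc/hosts on both nodes and have them resolved locally first. The following is a hedged sketch of an /etc/resolv.conf entry that does this; the hostresorder directive and the name server address shown are assumptions to verify against the documentation for your IRIX release:

hostresorder local bind       # consult /etc/hosts before querying the name server
nameserver 190.0.2.1          # existing name server entry (address is hypothetical)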
When choosing monitoring frequencies, keep in mind the load on the public networks. The public network load is high when there is lots of activity in the network. If the timeout values of the FailSafe monitoring scripts and interface agent are short, there can be false failovers due to heavy network loads.