This chapter describes how to configure the FailSafe TMF plug-in.
The procedures described in this chapter assume that a cluster database that does not include TMF has already been created, installed, and tested as described in the FailSafe Administrator's Guide for SGI InfiniteStorage.
To run FailSafe TMF, the TMF software must be enabled. You should ensure that the output from chkconfig shows the following flag set to on:
# chkconfig | grep tmf
...
tmf                  on
If it is not, set it to on. For example:
# chkconfig tmf on
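The flag check above can also be scripted. The following is a minimal sketch in Python, assuming that each flag line of the chkconfig output has the form "flag state"; the function name tmf_flag_is_on and the sample output are hypothetical and are not part of FailSafe or TMF.

```python
def tmf_flag_is_on(chkconfig_output: str) -> bool:
    """Return True if the tmf flag is reported as on in chkconfig-style output."""
    for line in chkconfig_output.splitlines():
        fields = line.split()
        # A flag line is assumed to look like "<flag> <state>".
        if len(fields) >= 2 and fields[0] == "tmf":
            return fields[1].lower() == "on"
    # No tmf line at all: treat the flag as not enabled.
    return False


# Hypothetical captured output, for illustration only.
sample_output = """\
nfs                  on
tmf                  on
verbose              off
"""

if tmf_flag_is_on(sample_output):
    print("tmf is enabled")
else:
    print("tmf is off; run: chkconfig tmf on")
```

The function only inspects text that you pass to it, so it can be tried against saved output before it is wired to a live system.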
This subsection describes how to create a TMF resource type. It assumes that you are already familiar with the concept of resource types. Table 2-1 shows the resource attributes of a TMF resource type. “Configuring a TMF Resource” describes how to use these parameters when configuring a TMF resource.
Table 2-1. TMF Configuration Parameters and Attributes
The TMF resource type is not created at cluster creation time. You must create the resource type before a TMF resource is created. The TMF resource type must be installed if you want to add a TMF resource to a cluster that was created before the FailSafe TMF software was installed.
You can use one of the following methods to create the TMF resource type:
Run cmgr and manually create the resource type. For more information on cmgr, see the FailSafe Administrator's Guide for SGI InfiniteStorage.
Run cmgr and install the resource type, as follows:
cmgr> install resource_type TMF in cluster eagan
cmgr> show resource_types installed

TMF
NFS
template
Netscape_web
statd_unlimited
Oracle_DB
MAC_address
IP_address
INFORMIX_DB
filesystem
volume
Use the template scripts supplied with FailSafe located in /var/cluster/cmgr-template/cmgr-create-resource_type.
Execute /var/cluster/ha/resource_types/TMF/create_resource_type, passing the path of the cluster database and the cluster name as arguments.
Run the FailSafe Manager GUI and use the Load Resource Type task to load the resource type. For more information on the FailSafe GUI, see the FailSafe Administrator's Guide for SGI InfiniteStorage.
The FailSafe TMF plug-in performs various functions for a TMF resource, as summarized in “FailSafe TMF Plug-In” in Chapter 1. This section describes how to configure a resource to perform each of these functions.
Table 2-1 summarizes the FailSafe TMF plug-in configuration parameters.
The FailSafe TMF plug-in lets you specify device groups to monitor. You specify a device group through the resource attribute device-group; it applies to the particular resource that is being defined. A device group consists of the tape devices that are assigned to that group in the TMF configuration file. This attribute is required for each resource that you create.
When you create a TMF resource, you must specify the minimum number of devices of a particular device group that must be configured and available for use. This value is specified as the resource attribute devices-minimum and is required for each resource. The default value for devices-minimum is 0, which means that no action is taken by the TMF plug-in even if no tapes are available (many sites would not want FailSafe to take action if the tapes are not available because failover or local restart will not help in that situation).
When you create a TMF resource, you must specify a list of email addresses to notify when the monitoring scripts detect that devices in the device group have become unavailable. Specify this list through the resource attribute email-addresses as a comma- or white-space-separated list of names.
A TMF resource includes the resource attribute devices-loaned. This attribute is currently unused by the FailSafe TMF plug-in and should be left at its default assigned value.
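The attribute rules above can be checked before a resource is defined. The following Python sketch is illustrative only: the function normalize_tmf_attributes is hypothetical, and rejecting a negative devices-minimum is an assumption rather than documented plug-in behavior. The devices-loaned attribute is deliberately not handled, because it should be left at its default value.

```python
import re


def normalize_tmf_attributes(attrs: dict) -> dict:
    """Check the TMF resource attributes and return a normalized copy."""
    # device-group is required for every resource.
    device_group = attrs.get("device-group", "").strip()
    if not device_group:
        raise ValueError("device-group is required")

    # devices-minimum defaults to 0 (no action even if no tapes are available).
    minimum = int(attrs.get("devices-minimum", 0))
    if minimum < 0:
        # Assumption: a negative minimum is treated as an error in this sketch.
        raise ValueError("devices-minimum must not be negative")

    # email-addresses is a comma- or white-space-separated list of names.
    raw = attrs.get("email-addresses", "")
    emails = [name for name in re.split(r"[,\s]+", raw.strip()) if name]

    return {
        "device-group": device_group,
        "devices-minimum": minimum,
        "email-addresses": emails,
    }


# Hypothetical attribute values, for illustration only.
print(normalize_tmf_attributes({
    "device-group": "EGLFT",
    "devices-minimum": "4",
    "email-addresses": "root, operator  admin",
}))
```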
Other optional configuration specifications are associated with this resource. They give the FailSafe TMF plug-in the information it needs to communicate with the tape library, and they tell the plug-in on which drives within the library it will force dismounts.
The FailSafe TMF plug-in can force a dismount of tapes from drives within the library. There may be various reasons why you might want to do this when a failover occurs. In the case of the data migration facility (DMF), you would want to ensure that any DMF tapes that were in use on a previous host are available to DMF on the new node after a failover. If these tapes were in drives assigned to the previous host, they must be ejected and returned to the library so that they are again accessible to DMF on the new host. You may want the FailSafe TMF plug-in to dismount only tape devices associated with a particular resource or you may not want the plug-in to dismount any tapes at all.
If you are using the tpsc tape driver, then in order for the plug-in to be able to force a dismount of tapes, the capabilities list specified for the device in the /var/sysgen/master.d/scsi file must not include the MTCAN_PREV capability.
The following example shows entries from this file for the STK 9840 and STK 9940 drives. The description for the 9840 drive does not include the MTCAN_PREV capability, but the description for the 9940 drive does include it.
/* STK 9840 drive */
{ STK9840, TPSTK9840, 3, 4, "STK", "9840", 0, 0, {0, 0, 0, 0},
  MTCAN_BSF | MTCAN_BSR | MTCANT_RET | MTCAN_CHKRDY | MTCAN_SPEOD |
  MTCAN_SEEK | MTCAN_APPEND | MTCAN_SILI | MTCAN_VAR | MTCAN_SETSZ |
  MTCAN_CHTYPEANY | MTCAN_COMPRESS,
  20, 8*60, 10*60, 3*60, 3*60, 16384, 256*1024,
  tpsc_default_dens_count, tpsc_default_hwg_dens_names,
  tpsc_default_alias_dens_names,
  {0}, 0, 0, 0,
  0, (u_char *)0 },

/* STK 9940 drive */
{ STK9840, TPSTK9840, 3, 4, "STK", "T9940A", 0, 0, {0, 0, 0, 0},
  MTCAN_BSF | MTCAN_BSR | MTCANT_RET | MTCAN_CHKRDY | MTCAN_PREV |
  MTCAN_SPEOD | MTCAN_SEEK | MTCAN_APPEND | MTCAN_SILI | MTCAN_VAR |
  MTCAN_SETSZ | MTCAN_CHTYPEANY | MTCAN_COMPRESS,
  20, 8*60, 10*60, 3*60, 3*60, 16384, 256*1024,
  tpsc_default_dens_count, tpsc_default_hwg_dens_names,
  tpsc_default_alias_dens_names,
  {0}, 0, 0, 0,
  0, (u_char *)0 },
If the device does have this capability specified and you are using the tpsc tape driver, you must do the following:
Remove the device from the list
Perform an autoconfig
Reboot in order for the plug-in to be able to force the dismounts of tapes from devices of that type
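To decide whether a tpsc device entry needs this treatment, you can test its capabilities list for MTCAN_PREV. The following Python sketch assumes that you have copied the capabilities expression out of the scsi file as a string; the helper has_capability is hypothetical and does not parse the file itself.

```python
def has_capability(capabilities: str, name: str) -> bool:
    """Return True if name appears in a '|'-separated capabilities list."""
    tokens = [token.strip() for token in capabilities.split("|")]
    return name in tokens


# Capabilities lists copied from the two example entries.
stk9840_caps = (
    "MTCAN_BSF | MTCAN_BSR | MTCANT_RET | MTCAN_CHKRDY | MTCAN_SPEOD | "
    "MTCAN_SEEK | MTCAN_APPEND | MTCAN_SILI | MTCAN_VAR | MTCAN_SETSZ | "
    "MTCAN_CHTYPEANY | MTCAN_COMPRESS"
)
stk9940_caps = (
    "MTCAN_BSF | MTCAN_BSR | MTCANT_RET | MTCAN_CHKRDY | MTCAN_PREV | "
    "MTCAN_SPEOD | MTCAN_SEEK | MTCAN_APPEND | MTCAN_SILI | MTCAN_VAR | "
    "MTCAN_SETSZ | MTCAN_CHTYPEANY | MTCAN_COMPRESS"
)

for label, caps in (("9840", stk9840_caps), ("9940", stk9940_caps)):
    if has_capability(caps, "MTCAN_PREV"):
        print(label, "includes MTCAN_PREV; forced dismounts will not work")
    else:
        print(label, "does not include MTCAN_PREV")
```

Matching whole tokens rather than substrings avoids a false match on a longer capability name that merely begins with MTCAN_PREV.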
If you are using the ts tape driver, then the /etc/config/tspd.config file must not specify the following for the device:
PREVENT_REMOVAL pathname yes
If the device does have this capability specified, you must edit the tspd.config file and restart the appropriate ts personality daemon.
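A quick scan of the file contents can show which devices carry this setting. The sketch below is a Python illustration that assumes each such line consists of exactly three white-space-separated fields, as in the format shown above; the function name and the sample pathnames are hypothetical, and the use of "no" as an alternative value is an assumption.

```python
def prevent_removal_paths(config_text: str) -> list:
    """Return the pathnames that have PREVENT_REMOVAL set to yes."""
    paths = []
    for line in config_text.splitlines():
        fields = line.split()
        if (len(fields) == 3
                and fields[0] == "PREVENT_REMOVAL"
                and fields[2].lower() == "yes"):
            paths.append(fields[1])
    return paths


# Hypothetical file contents, for illustration only.
sample_config = """\
PREVENT_REMOVAL /hw/tape/example1 yes
PREVENT_REMOVAL /hw/tape/example2 no
"""

for path in prevent_removal_paths(sample_config):
    print("forced dismounts are blocked for", path)
```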
Some of the functions of the FailSafe TMF plug-in are performed through TMF; the plug-in issues commands to the TMF daemon to use these functions. However, the plug-in forces a dismount of a tape from a drive by issuing a command to the library software controlling the loader/library. In the case of the Storage Technology Corporation (STK) hardware, the plug-in communicates its request to the Automated Cartridge System Library Software (ACSLS) that controls the loader. The plug-in uses an expect script that logs in to the loader and issues a dismount request to a drive.
The /etc/tmf/failsafe_tmf.config file lets you configure additional features of the FailSafe TMF plug-in. This file exists on all hosts in the cluster, and should be edited as necessary on each machine.
The contents of the failsafe_tmf.config file are dependent on the drives assigned to each host in the cluster. If all hosts in the failover domain are configured through TMF to use exactly the same drives, then this file would be the same on each host in the failover domain. You must maintain this file on each host; a change on one host is unknown to the other hosts.
There are two different types of directives that you can specify in the failsafe_tmf.config file: the loader directive and the remote_devices directive. These are defined in the following subsections.
The loader directive provides information about a TMF loader, which controls one or more tape devices that are members of TMF device groups being managed as FailSafe resources. There may be more than one such directive in this file. The loader information is used by the FailSafe TMF plug-in to force a dismount of tapes from drives that cannot be made available (that is, have tmstat states other than assn, free, conn, or idle) so that those tapes can be used via other tape devices in the same device group. The information is also used to force a dismount of tapes from drives that are only connected to other hosts, not this host (as described in “The remote_devices Directive”). If the file does not contain a loader directive, then the TMF plug-in will make no attempt to force a dismount of tapes from any drives.
The directive has the following format:
loader lname ltype lhost luser lpswd
where:
lname    Name of the loader as defined in the TMF config file.
ltype    Type of the loader as defined in the TMF config file. (Currently only STKACS is supported for ltype.)
lhost    Server name of the loader as defined in the TMF config file.
luser    User name of the loader's administrator account. For STKACS, use acssa.
lpswd    Password for the loader's administrator account.
The TMF command /usr/sbin/tmmls shows the name of the loader and the server associated with it:
# tmmls
loader    type      status  m  server    old  m_pnd  d_pnd  r_qd  comp  avg
operator  OPERATOR  UP      A  IRIX      0    0      0      0     0     0(sec)
wolfy     STKACS    DOWN    A  wolfcree  0    0      0      0     0     0(sec)
panther   STKACS    DOWN    A  stk9710   0    0      0      0     0     0(sec)
l180      STKACS    UP      A  stk9710   0    0      0      0     0     0(sec)
For example, suppose you want to have the FailSafe TMF plug-in dismount drives that are in the l180 loader/library listed above. That library has the stk9710 server associated with it. The loader directive in the failsafe_tmf.config file would look like the following:
loader l180 STKACS stk9710 acssa acssapassword
In this case, the FailSafe TMF plug-in would force a dismount for each drive that is specified in the tmf.config file to be in the l180 loader/library and in the plug-in's drive group. If you do not want the plug-in to dismount any tape drives associated with a particular resource, you would not place a loader directive in the failsafe_tmf.config file.
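The loader directive format can be illustrated with a small parser. This Python sketch is not the plug-in's own code: the Loader type and parse_loader function are hypothetical, and it assumes that the five fields are separated by white space and that none of them (including the password) contains white space.

```python
from typing import NamedTuple


class Loader(NamedTuple):
    lname: str   # loader name as defined in the TMF config file
    ltype: str   # loader type; only STKACS is currently supported
    lhost: str   # server name of the loader
    luser: str   # administrator account (acssa for STKACS)
    lpswd: str   # administrator password


def parse_loader(line: str) -> Loader:
    """Parse one 'loader lname ltype lhost luser lpswd' directive."""
    fields = line.split()
    if len(fields) != 6 or fields[0] != "loader":
        raise ValueError("expected: loader lname ltype lhost luser lpswd")
    loader = Loader(*fields[1:])
    if loader.ltype != "STKACS":
        raise ValueError("unsupported loader type: " + loader.ltype)
    return loader


example = parse_loader("loader l180 STKACS stk9710 acssa acssapassword")
print(example.lname, "is served by", example.lhost)
```

Because the directive stores a password in plain text, a check like this is best run only by the administrator who maintains the file.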
The remote_devices directive provides information about one or more tape devices that are part of a TMF device group, but which are not visible on this host. An example would be where a library has four SCSI drives, and two drives are connected to each of two FailSafe hosts. If host A should crash, host B must be able to force a dismount of any tapes in A's drives so that they can then be used from host B. Because the drives are not visible on host B, the remote_devices directive provides the information needed to force a dismount of unseen drives.
The directive has the following format:
remote_devices rname lname drvid ...
where:
rname    Name of the FailSafe TMF resource that will dismount this drive.
lname    Name of the loader as defined in the TMF config file. There must be a loader directive for lname elsewhere in this file, or the remote_devices directive will be ignored.
drvid    The vendor ID of the drive on which to force a dismount. This is the unique name by which the loader identifies the drive. In the case of STKACS, this will be a comma-separated four-digit string listing the ACS, LSM, drive panel, and drive (for example, 0,0,1,3).
Multiple vendor IDs can be specified in the same remote_devices directive as long as they all pertain to the same loader. If all the vendor IDs will not fit on a single line, just add additional remote_devices directives for the same loader. For example, to enable the FailSafe TMF plug-in to force a dismount of the remote drives (0,0,1,0), (0,0,1,1), (0,0,1,2), and (0,0,1,3) in the l180 loader/library for resource tmf_eglf, the directive would be:
remote_devices tmf_eglf l180 0,0,1,0 0,0,1,1 0,0,1,2 0,0,1,3
If multiple FailSafe TMF resources are defined, only the resource named tmf_eglf will force a dismount of these drives.
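The rules for the remote_devices directive can be sketched in the same way. In this Python illustration, parse_remote_devices is a hypothetical helper: it returns None when no loader directive exists for lname (mirroring the statement that such a directive is ignored), and it assumes that an STKACS vendor ID is four comma-separated numbers.

```python
import re

# Assumed STKACS vendor ID shape: ACS,LSM,panel,drive (for example, 0,0,1,3).
STKACS_DRVID = re.compile(r"^\d+,\d+,\d+,\d+$")


def parse_remote_devices(line: str, known_loaders: set):
    """Parse 'remote_devices rname lname drvid ...'.

    Returns (rname, lname, drvids), or None if lname has no loader directive.
    """
    fields = line.split()
    if len(fields) < 4 or fields[0] != "remote_devices":
        raise ValueError("expected: remote_devices rname lname drvid ...")
    rname, lname, drvids = fields[1], fields[2], fields[3:]
    if lname not in known_loaders:
        return None  # ignored: no matching loader directive
    bad = [drvid for drvid in drvids if not STKACS_DRVID.match(drvid)]
    if bad:
        raise ValueError("malformed vendor ID(s): " + " ".join(bad))
    return rname, lname, drvids


directive = "remote_devices tmf_eglf l180 0,0,1,0 0,0,1,1 0,0,1,2 0,0,1,3"
print(parse_remote_devices(directive, {"l180"}))
print(parse_remote_devices(directive, set()))
```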
If drives that belong to a FailSafe TMF resource are configured on more than one machine in the FailSafe cluster, they should be configured consistently. The same tape driver (for example, ts or tpsc) should be used on each host where the drive is configured.
When configuring a FailSafe TMF resource, administrators should be aware of several parameters in the /etc/tmf/tmf.config file. The FailSafe TMF plug-in will try to start the loader associated with its device-group if it is not up. However, if the tmf.config file specifies status = UP for the loader, this step may not be necessary, and the devices may become available sooner.
A drive that is in a FailSafe TMF resource will be configured in the tmf.config file of one or more hosts within the cluster. It should be configured with status=down. All drives associated with the resource group must be unavailable for the exclusive script to indicate that the resource is not already running.
If the drives being used do not support persistent reserve, then they should be configured in the tmf.config file with access=shared. If the drives do support persistent reserve, then it is recommended that you use this feature when using the FailSafe TMF plug-in. To use persistent reserve, you should use the ts tape driver and set access=exclusive in the tmf.config file. See the ts(7) man page for more information about using the ts tape driver. The access option should be consistent across all hosts in the FailSafe cluster where the drives are configured.
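The consistency guidance for the tape driver and the access option can be expressed as a simple comparison across hosts. The following Python sketch is illustrative: the dictionary layout, host names, and function names are hypothetical, and the settings are assumed to have been collected by hand from each host's tmf.config file.

```python
def recommended_access(supports_persistent_reserve: bool) -> str:
    """Return the access value suggested for the drive type."""
    # Persistent reserve also calls for the ts tape driver.
    return "exclusive" if supports_persistent_reserve else "shared"


def inconsistent_settings(per_host: dict) -> list:
    """Return the setting names whose values differ between hosts."""
    problems = []
    for key in ("driver", "access"):
        values = {settings.get(key) for settings in per_host.values()}
        if len(values) > 1:
            problems.append(key)
    return problems


# Hypothetical per-host settings, for illustration only.
cluster = {
    "hostA": {"driver": "ts", "access": "exclusive"},
    "hostB": {"driver": "tpsc", "access": "exclusive"},
}

print("suggested access:", recommended_access(True))
print("inconsistent settings:", inconsistent_settings(cluster))
```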
The -g option of the tmconfig command reassigns a device to a different device group name. The FailSafe TMF software does not support reassigning a device into a FailSafe TMF device group because, after a failover, the FailSafe TMF plug-in on the new node would have no knowledge of the reassigned drive and would not be able to dismount tapes that are in it. Using tmconfig -g to move devices out of a FailSafe TMF device group will decrease the number of available drives that the monitor script sees. Also, in the case of failover or stop, the drive will be configured down.
The FailSafe TMF plug-in assumes that TMF is being used as the mounting service for tape devices associated with a tape library. Each time the plug-in is run, it will verify that TMF is up and running. If TMF is not running, the plug-in will start it. If TMF cannot be started by the plug-in, a failure will occur. Next, the plug-in will verify that the tape loader associated with the devices for a resource is up and accessible. If it is not up, the plug-in will configure it up using the /usr/sbin/tmconfig TMF command. If it cannot configure up the loader, a failure will occur.
The FailSafe TMF plug-in uses information supplied by the /etc/tmf/tmf.config file to identify what devices pertain to a particular resource. It uses that information in conjunction with the resource's remote_devices directive in the failsafe_tmf.config file to determine what actions need to be taken on tape drives defined by the resource.
The plug-in retrieves the values of the device-group and devices-minimum attributes for a particular resource. It then examines the TMF configuration file for information pertaining to drives belonging to the same device group as specified for the resource and stores the information for processing.
It will then force a dismount of tapes from any drives that are specified in the remote_devices directive in the failsafe_tmf.config file associated with the resource. The FailSafe TMF plug-in verifies the minimum number of devices of the specified type are available for use. A device is considered available if its status displayed from the tmstat command is one of the following:
assn
idle
free
If devices are in the down state, the plug-in will use tmconfig to configure them up and make sure that they are available. If a device cannot be configured up, and the associated loader directive is in the failsafe_tmf.config file, the plug-in will force a dismount of the tape from that device.
If the FailSafe TMF plug-in does not find the required minimum number of drives to be available, a failure will occur.
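The availability test can be summarized in a few lines. This Python sketch is a simplified illustration, not the plug-in's implementation: it treats only the three tmstat states listed above (assn, idle, free) as available, and the device names and states are hypothetical.

```python
# States that count as available, taken from the list above.
AVAILABLE_STATES = {"assn", "idle", "free"}


def count_available(device_states: dict) -> int:
    """Count the devices whose tmstat state is considered available."""
    return sum(1 for state in device_states.values()
               if state in AVAILABLE_STATES)


def meets_minimum(device_states: dict, devices_minimum: int) -> bool:
    """Return True if enough devices in the group are available."""
    return count_available(device_states) >= devices_minimum


# Hypothetical devices in one device group, for illustration only.
states = {"drive1": "idle", "drive2": "down", "drive3": "assn", "drive4": "free"}

print("available:", count_available(states))
print("minimum of 3 met:", meets_minimum(states, 3))
print("minimum of 4 met:", meets_minimum(states, 4))
```

With the default devices-minimum of 0, the check always passes, which matches the behavior of taking no action even when no tapes are available.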
After you have defined the resource type, you must define the TMF resources based on the resource type. Each resource requires a unique resource name.
# cmgr
Welcome to SGI Cluster Manager Command-Line Interface

cmgr> set cluster eagan
cmgr> define resource egft of resource_type TMF
Enter commands, when finished enter either "done" or "cancel"

Type specific attributes to create with set command:

Type Specific Attributes - 1: device-group
Type Specific Attributes - 2: devices-minimum
Type Specific Attributes - 3: devices-loaned
Type Specific Attributes - 4: email-addresses

No resource type dependencies to add

resource egft ? set device-group EGLFT
resource egft ? set devices-minimum 4
resource egft ? set email-addresses [email protected]
resource egft ? done
Successfully defined resource egft
Note: The devices-loaned parameter is ignored and should be left at its default value.
The device-group field is case sensitive. It must exactly match what is entered in the tmf.config file.
FailSafe and the FailSafe TMF plug-in allow you to specify unique values for each of the attributes on each of the hosts. For example, a FailSafe TMF resource could be defined so that host A had devices-minimum=3, but host B had devices-minimum=2.
You can create a resource group by using either the FailSafe GUI or the cmgr command. For information, see the FailSafe Administrator's Guide for SGI InfiniteStorage.
To define an effective resource group, you must include all of the resources on which the TMF resource depends. The following example shows the creation of a typical resource group:
cmgr> show failover_policies

Failover Policies:
        t2
        dmfadmin
        dgroups
        ordered

cmgr> define resource_group tmfrg in cluster eagan
Enter commands, when finished enter either "done" or "cancel"

resource_group tmfrg ? set failover_policy to dgroups
resource_group tmfrg ? add resource egft of resource_type TMF
resource_group tmfrg ? done
Successfully created resource group tmfrg

cmgr> show resource_group tmfrg

Resource Group: tmfrg
        Cluster: eagan
        Failover Policy: dgroups
        Resources:
                egft (type: TMF)

cmgr> show failover_policy dgroups

Failover policy: dgroups
Version: 1
Script: ordered
Attributes: Auto_Failback Auto_Recovery
Initial AFD: guiness dublin
To ensure that the TMF resource has been correctly configured, you can test individual actions by executing the scripts directly. Each script is located in /var/cluster/ha/resource_types/TMF and requires two arguments: an input file and an output file. These files contain the resource names. Each script displays 0 if it executes successfully, or a positive number that indicates the error type. For more information on error codes, see the FailSafe Programmer's Guide for SGI InfiniteStorage.
All TMF scripts assume they are being run under ksh.
The following example tests the start script by starting the TMF resource named tmfx:
$ cd /var/cluster/ha/resource_types/TMF
$ echo "tmfx" > /tmp/ipfile
$ ./start /tmp/ipfile /tmp/opfile
This should start the TMF resource named tmfx.
To view the individual script actions, you must edit the script and add the following to the action function:
set -x
Use the following procedure to test the start script:
Create a TMF resource for a device group specified in the tmf.config file. Do not run TMF on your test node.
Perform the following actions:
# echo "resource-name" > /tmp/ipfile
# /var/cluster/ha/resource_types/TMF/start /tmp/ipfile /dev/null
Check the /var/cluster/ha/log/script_nodename log file and verify from the log messages that the resource was started.
Verify that the TMF daemon process tmdaemon was started. Using the TMF commands tmmls and tmstat, verify that the tape loader and tape drives of type resource_name that you defined came up and are available.
Note: Running this test may force the dismount of tapes in drives as specified in the failsafe_tmf.config file.
To stop the TMF resource, enter the following commands:
# echo "resource-name" > /tmp/ipfile
# /var/cluster/ha/resource_types/TMF/stop /tmp/ipfile /dev/null
Check to see if the resource is offline by using the TMF tmstat command to verify that the tape devices of type resource_name were configured down.
You can test the failover policy by using either cmgr or the FailSafe GUI to move the resource group to another node in the cluster. To ensure that the resource group correctly failed over, use cmgr or the GUI to display the resource group states.
The following example uses cmgr to test the failover policy:
cmgr> admin offline resource_group TMF in cluster eagan
cmgr> admin move resource_group TMF in cluster eagan to node cm2
cmgr> admin online resource_group TMF in cluster eagan