This guide describes the configuration and administration of a FailSafe highly available system.
This guide was prepared in conjunction with the IRIS FailSafe 2.1.6 release and IRIX 6.5.27.
This guide is written for the person who administers the FailSafe system. The FailSafe administrator must be familiar with the operation of the SGI server, storage systems, XFS, and XLV or XVM.
To use Performance Co-Pilot for FailSafe, you must have the following licenses:
Two or more Performance Co-Pilot Collector licenses (PCPCOL), one for each node in the FailSafe cluster from which you want to collect performance metrics.
One Performance Co-Pilot Monitor license (PCPMON) for the workstation that is to run the visualization tools.
FailSafe configuration and administration information is presented in the following chapters and appendices:
Chapter 1, “Overview”, introduces the components of the FailSafe system and explains its hardware and software architecture.
Chapter 2, “Configuration Planning”, describes how to plan the configuration of a FailSafe cluster.
Chapter 4, “FailSafe Installation and System Preparation”, describes several procedures that must be performed on nodes in a cluster to prepare them for FailSafe.
Chapter 5, “Administration Tools”, provides an overview of the FailSafe Manager GUI and the cmgr command.
Chapter 6, “Configuration”, explains how to configure a FailSafe system.
Chapter 7, “Configuration Examples”, shows an example of a FailSafe three-node configuration and some variations on that configuration.
Chapter 8, “FailSafe System Operation”, explains how to operate and monitor a FailSafe system.
Chapter 9, “Testing the Configuration”, describes how to test the configured FailSafe system.
Chapter 10, “System Recovery and Troubleshooting”, describes the log files used by FailSafe and recovery procedures.
Chapter 11, “Upgrading and Maintaining Active Clusters”, describes some procedures you may need to perform without shutting down a FailSafe cluster.
Chapter 12, “Performance Co-Pilot for FailSafe”, tells you how to use Performance Co-Pilot to monitor the availability of a FailSafe cluster.
Appendix A, “FailSafe Software”, summarizes the systems to install on each component of a cluster.
Appendix B, “Metrics Exported by Performance Co-Pilot for FailSafe”, lists the metrics implemented by pmdafsafe.
The following documentation will be useful in a FailSafe environment:
FailSafe Programmer's Guide for SGI InfiniteStorage
FailSafe Architecture for SGI InfiniteStorage
Migrating from IRIS FailSafe 1.2 to IRIS FailSafe 2.1.X
Performance Co-Pilot for IRIX Advanced User's and Administrator's Guide
CXFS Administration Guide for SGI InfiniteStorage
FailSafe DMF Administrator's Guide for SGI InfiniteStorage
IRIS FailSafe 2.0 INFORMIX Administrator's Guide
IRIS FailSafe 2.0 Netscape Server Administrator's Guide
IRIS FailSafe NFS Administrator's Guide
IRIS FailSafe 2.0 Oracle Administrator's Guide
IRIS FailSafe Version 2 Samba Administrator's Guide
FailSafe Version 2 TMF Administrator's Guide
Embedded Support Partner User Guide
Personal System Administration Guide
Network Load Balancing Software Administrator's Guide
The FailSafe man pages are as follows:
Release notes are included with each FailSafe product. The names of the release notes are as follows:
Cluster administration services
Node control services
IRIS FailSafe 2.1.x
IRIS FailSafe for DMF
IRIS FailSafe for INFORMIX
IRIS FailSafe for NFS
IRIS FailSafe for Oracle
IRIS FailSafe for Samba
IRIS FailSafe for TMF
You can obtain SGI documentation as follows:
See the SGI Technical Publications Library at http://docs.sgi.com. Various formats are available. This library contains the most recent and most comprehensive set of online books, release notes, man pages, and other information.
If it is installed on your SGI system, you can use InfoSearch, an online tool that provides a more limited set of online books, release notes, and man pages. With an IRIX system, enter infosearch at a command line or select Help -> InfoSearch from the Toolchest.
You can view release notes by entering either grelnotes or relnotes at a command line.
You can view man pages by typing man title at a command line.
The following conventions are used throughout this document:
This fixed-space font denotes literal items such as commands, files, routines, path names, signals, messages, and programming language structures.
Man page section identifiers appear in parentheses after man page names: (1) indicates a user command; (1M) and (8) indicate administrator commands.
Italic typeface denotes variable entries and words or concepts being defined.
This font denotes the names of graphical user interface (GUI) elements such as windows, screens, dialog boxes, menus, toolbars, icons, buttons, boxes, fields, and lists.
This bold, fixed-space font denotes literal items that the user enters in interactive sessions. (Output is shown in nonbold, fixed-space font.)
Brackets enclose optional portions of a command or directive line.
Ellipses indicate that a preceding element can be repeated.
If you have comments about the technical accuracy, content, or organization of this publication, contact SGI. Be sure to include the title and document number of the publication with your comments. (Online, the document number is located in the front matter of the publication. In printed publications, the document number is located at the bottom of each page.)
You can contact SGI in any of the following ways:
Send e-mail to the following address:
Use the Feedback option on the Technical Publications Library Web page:
Contact your customer service representative and ask that an incident be filed in the SGI incident tracking system.
Send mail to the following address:
1500 Crittenden Lane, M/S 535
Mountain View, California 94043-1351
SGI values your comments and will respond to them promptly.