The IRIS FailSafe software uses redundant servers to provide a high-availability system. The servers run Video Server Toolkit (VST) and communicate using a special serial connection as well as a private Ethernet; they may share clips and index filesystems.
When IRIS FailSafe detects a failure on the primary server, it transfers the filesystems and processing functions to the secondary server.
To use IRIS FailSafe, make sure the subsystems vst_eoe.sw.fsmon and vst_eoe.sw.failsafe are installed.
IRIS FailSafe is discussed in detail in the IRIS FailSafe Administrator's Guide and the IRIS FailSafe Programmer's Guide.
IRIS FailSafe is a software product that enables a pair of servers to be used in a redundant configuration. The servers are configured with the VST software and share the VST filesystems. The dual-server IRIS FailSafe configuration shown in Figure 10-1 provides redundancy. If the primary server fails for any reason, the secondary server mounts any shared filesystems with the clips and their index files. Then, using its local VST, the secondary server is ready to play the clips.
In an IRIS FailSafe configuration, the operating system on each server is configured for the VST. The disks on each server store the operating system, and the shared RAIDs store the clips and index filesystems. The RAID is shared by physically attaching it to an Emulex LH5000 Digital Fibre Hub and by connecting the servers to two other ports on this hub. The two servers along with the shared RAID(s) form an IRIS FailSafe cluster.
The servers also share a public IP alias, or name, that users use to connect to the servers. If a server fails, the backup server takes over the shared RAIDs as well as this IP alias, which users continue to use to connect. To the user, a failover looks the same as a server that crashed and rebooted very quickly.
The servers use a special serial connection to communicate. When the backup server detects a problem with the active server, IRIS FailSafe unmounts the filesystems from the active server and automounts them on the secondary server. VST on the secondary server detects this action and adds the clips to its tables so that it is ready to play them.
The primary and secondary servers can share up to four Ciprico 7000 RAID systems using an Emulex LH5000 Digital Fibre Hub, as shown in Figure 10-1. They also share an IP alias, which always points to the currently active server.
A failover occurs if any of the following events is detected on the currently active server:
Power failure
Public network card failure
Operating system crash
Unmounting of shared filesystems
When a failure is detected, the secondary server attempts to reboot the primary server using the serial cable. The secondary server also takes over the shared services, the shared RAIDs, and the IP alias. VST detects the newly mounted filesystem using fsmon and loads the clips.
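The failover sequence described above can be summarized as a small model. The following Python sketch is illustrative only: `ClusterModel`, `report_event`, and the event strings are hypothetical names chosen for this example and are not part of IRIS FailSafe or VST. It simply records the order of the takeover steps listed in this section.

```python
from dataclasses import dataclass, field

# Failure events listed in this section, as detected on the active server.
FAILURE_EVENTS = frozenset({
    "power failure",
    "public network card failure",
    "operating system crash",
    "unmounting of shared filesystems",
})


@dataclass
class ClusterModel:
    """Toy model of a two-server cluster; not part of IRIS FailSafe."""
    owner: str = "primary"                       # current owner of shared services
    actions: list = field(default_factory=list)  # ordered record of failover steps

    def report_event(self, event: str) -> bool:
        """Fail over to the secondary if `event` is a listed failure.

        Returns True if a failover was performed, False otherwise.
        """
        # This sketch only models a failover away from the primary server.
        if event not in FAILURE_EVENTS or self.owner != "primary":
            return False
        self.actions.extend([
            "secondary attempts to reboot primary over the serial connection",
            "secondary takes over shared services, shared RAIDs, and IP alias",
            "VST on secondary detects mounted filesystems via fsmon and loads clips",
        ])
        self.owner = "secondary"
        return True
```

For example, calling `ClusterModel().report_event("operating system crash")` records the three takeover steps and moves ownership of the shared services to the secondary server in the model.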
In an IRIS FailSafe configuration, the primary and secondary servers use the states shown in Table 10-1.
Table 10-1. Primary and Secondary Server States
Event | Primary Server Status / VST Status | Secondary Server Status / VST Status | Owner of Shared Services |
---|---|---|---|
Normal operation | Normal / Running | Normal / Running | Primary |
Power off primary server | Cannot be determined / Not running | Degraded / Running | Secondary |
Power on primary server | Controlled-failback / Running | Degraded / Running | Secondary |
ha_admin -rf (after power off on primary server, ha_admin executes on primary server) | Normal / Running | Normal / Running | Primary |
Power off secondary server (before execution of ha_admin -rf) | Degraded / Running | Cannot be determined / Not running | Primary |
Power on secondary server | Degraded / Running | Controlled-failback / Running | Primary |
ha_admin -rf (after power off on secondary server, ha_admin executes on secondary server) | Normal / Running | Normal / Running | Primary |
Unmount shared filesystem (Primary) | Standby / Running | Degraded / Running | Secondary |
Unmount shared filesystem (Secondary) | Standby / Running | Degraded / Running | Neither |
ha_admin -rf (after unmounting, ha_admin executes on primary server) | Normal / Running | Normal / Running | Primary |
Disconnect serial connection | Normal / Running | Normal / Running | Primary |
ha_admin -m start backup | Normal / Running | Normal / Running | Primary |
ha_admin -fs primary (executed on secondary server) | Standby / Running | Degraded / Running | Secondary |
ha_admin -rf primary after ha_admin -fs primary completed executing | Normal / Running | Normal / Running | Primary |
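As an aid to reading Table 10-1, a few of its rows can be written as a lookup structure. This Python sketch is an illustration, not an IRIS FailSafe interface: `ClusterState`, `STATE_AFTER_EVENT`, and `is_degraded` are hypothetical names, and only a subset of the table rows is included.

```python
from typing import NamedTuple


class ClusterState(NamedTuple):
    """One row of Table 10-1 (names here are illustrative)."""
    primary_status: str
    primary_vst: str
    secondary_status: str
    secondary_vst: str
    owner: str  # owner of the shared services


# A subset of Table 10-1, keyed by the event that leads to the state.
STATE_AFTER_EVENT = {
    "Normal operation": ClusterState(
        "Normal", "Running", "Normal", "Running", "Primary"),
    "Power off primary server": ClusterState(
        "Cannot be determined", "Not running", "Degraded", "Running", "Secondary"),
    "Power on primary server": ClusterState(
        "Controlled-failback", "Running", "Degraded", "Running", "Secondary"),
    "Unmount shared filesystem (Primary)": ClusterState(
        "Standby", "Running", "Degraded", "Running", "Secondary"),
}


def is_degraded(event: str) -> bool:
    """Return True if either server is not in the Normal state after `event`."""
    state = STATE_AFTER_EVENT[event]
    return state.primary_status != "Normal" or state.secondary_status != "Normal"
```

With this structure, `STATE_AFTER_EVENT["Power off primary server"].owner` gives `"Secondary"`, matching the corresponding table row.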
The server availability status can be checked using the ha_admin -a command.
The status of the shared filesystem can be checked with the df command from either server.
To verify that the IP alias and the network interfaces are configured correctly, use the netstat -i command.
Use the IRIS FailSafe command-line interface to manage the Video Server Toolkit IRIS FailSafe system. For details, see the IRIS FailSafe Administrator's Guide.
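The three checks above can be gathered in one pass. The following Python sketch is a convenience wrapper only, under the assumption that Python is available on the server: the function names `run_command` and `collect_status` are hypothetical, and the sketch does not interpret the command output, because the output formats are not described in this section.

```python
import subprocess

# The three status commands named in this section.
STATUS_COMMANDS = {
    "server availability": ["ha_admin", "-a"],
    "shared filesystems": ["df"],
    "network interfaces and IP alias": ["netstat", "-i"],
}


def run_command(argv):
    """Run one command and return (exit_code, combined_output).

    The exit code is None if the command is missing or times out.
    """
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except FileNotFoundError:
        return None, f"{argv[0]}: command not found"
    except subprocess.TimeoutExpired:
        return None, f"{argv[0]}: timed out"
    return result.returncode, result.stdout + result.stderr


def collect_status(runner=run_command):
    """Run every status command and return {label: (exit_code, output)}.

    `runner` can be replaced with a stub for testing off-cluster.
    """
    return {label: runner(argv) for label, argv in STATUS_COMMANDS.items()}
```

Printing each entry of `collect_status()` gives an operator the raw output of all three commands to review in one place; run it on either server in the cluster.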