Chapter 6. Example: Requiring Confirmation Before Failover

Suppose you wanted to require that the system operator approve a failover before it takes place for an application that may not always recover automatically after a failure.

You could do this by creating a dummy resource type with the lowest order number possible: 1. This resource would be written to prompt the operator for confirmation before continuing with the failover. The operator would be prompted on any node in the cluster, depending upon where the resource is running. (The operator would also be prompted when the resource group is started for the first time, or when HA services are started after a reboot.)

A start script can wait indefinitely if you set its timeout value to a large-enough value. (The timeout value is specified in milliseconds.) If the start timeout expires, the resource group might go into error state. Therefore, the start script should wait for confirmation from the user only for certain amount of time. If the response is not received from the user, the start script can continue the failover or fail by writing HA_CMD_FAILED in the output file. If the start script fails, the user must manually recover the resource group.