Chapter 3. Script Changes for Programmers

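This chapter provides guidelines for migrating your 1.2 resources and monitor script information to 2.1.x action scripts. It covers the following:

  • Resource Types

  • Reading Information

  • Parameter Parsing

  • Action Scripts

  • Ordering Script Actions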

Caution: Multiple instances of 2.1.x action scripts may be executed at the same time. To avoid this, you can use the ha_execute_lock command.
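
The idea behind serializing concurrent script instances can be sketched with a generic lock wrapper. This is a minimal illustration only: with_lock and its arguments are hypothetical names, the lock is a plain mkdir-based lock, and nothing here reproduces the implementation or calling convention of ha_execute_lock, which is described in the IRIS FailSafe Version 2 Programmer's Guide.

```sh
#!/bin/sh
# Generic sketch: run a command while holding a lock so that concurrent
# instances of a script do not run it at the same time.
# with_lock is a hypothetical helper, not part of the FailSafe scriptlib.

# Usage: with_lock LOCKDIR TIMEOUT_SECONDS COMMAND [ARGS...]
with_lock()
{
    lockdir=$1
    timeout=$2
    shift 2

    waited=0
    # mkdir is atomic: only one caller can create the directory.
    until mkdir "$lockdir" 2>/dev/null
    do
        if [ "$waited" -ge "$timeout" ]; then
            return 99            # lock not obtained within the timeout
        fi
        sleep 1
        waited=$((waited + 1))
    done

    "$@"                         # run the protected command
    status=$?
    rmdir "$lockdir"             # release the lock
    return $status
}
```

The wrapper returns the exit status of the protected command, or 99 (an arbitrary value chosen for this sketch) if the lock could not be obtained in time.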

The software for 2.1.x and 1.2 can coexist on the same node. However, 2.1.x and 1.2 cannot run at the same time.

There is no configuration checksum verification in scripts.


Resource Types

In 2.1.x, the ha.conf configuration file has been replaced by the cluster database. The cluster database is automatically copied to all FailSafe nodes in the pool. See the IRIS FailSafe Version 2 Administrator's Guide for information about configuring a 2.1.x system.

If you require new resource types, you will create them using either the FailSafe Manager GUI or the cmgr command. See the IRIS FailSafe Version 2 Administrator's Guide.

You may be able to reuse the following monitoring information from the 1.2 ha.conf file for 2.1.x resource types:

  • start-monitor-time 

  • lmon-probe-time (equivalent in 2.1.x to the monitor script's interval parameter)

  • lmon-timeout 


    Note: All 2.1.x time-outs are in milliseconds.
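
The apache example in this chapter shows 1.2 values of 120 and 60 entered in cmgr as 120000 and 60000, which implies that the 1.2 values are in seconds. Under that assumption, the conversion can be sketched as follows. The helper name sec_to_ms is hypothetical.

```sh
#!/bin/sh
# Convert a 1.2 timer value (assumed to be in seconds) to the
# milliseconds used by 2.1.x. sec_to_ms is a hypothetical helper name.
sec_to_ms()
{
    case $1 in
        ''|*[!0-9]*)
            echo "sec_to_ms: not a non-negative integer: $1" >&2
            return 1
            ;;
    esac
    echo $(( $1 * 1000 ))
}

sec_to_ms 120    # start-monitor-time and lmon-probe-time -> 120000
sec_to_ms 60     # lmon-timeout                           -> 60000
```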


The following example shows the timer values from the 1.2 ha.conf file and how they are reused, converted to milliseconds, when creating a new resource type in 2.1.x.

Suppose a portion of the 1.2 ha.conf file had the following:

action apache
{
        local-monitor = /var/ha/actions/ha_apache_lmon
}
 
action-timer apache
{
        start-monitor-time = 120
        lmon-probe-time = 120 
        lmon-timeout = 60
}

You would reuse the information when creating a resource type in 2.1.x, as follows:

cmgr> create resource_type apache in cluster apache-cluster


(Enter "cancel" at any time to abort)

Node[optional] ? 
Order ? 500
Restart Mode ? (0) 


DEFINE RESOURCE TYPE OPTIONS

        0) Modify Action Script.
        1) Add Action Script.
        2) Remove Action Script.
        3) Add Type Specific Attribute.
        4) Remove Type Specific Attribute.
        5) Add Dependency.
        6) Remove Dependency.
        7) Show Current Information.
        8) Cancel. (Aborts command)
        9) Done. (Exits and runs command)

Enter option: 1

No current resource type actions

Action name ? start
Executable timeout (in milliseconds) ? 20000
        0) Modify Action Script.
        1) Add Action Script.
        2) Remove Action Script.
        3) Add Type Specific Attribute.
        4) Remove Type Specific Attribute.
        5) Add Dependency.
        6) Remove Dependency.
        7) Show Current Information.
        8) Cancel. (Aborts command)
        9) Done. (Exits and runs command)

Enter option:1

Current resource type actions:
        start

Action name ? stop
Executable timeout (in milliseconds) ? 20000

        0) Modify Action Script.
        1) Add Action Script.
        2) Remove Action Script.
        3) Add Type Specific Attribute.
        4) Remove Type Specific Attribute.
        5) Add Dependency.
        6) Remove Dependency.
        7) Show Current Information.
        8) Cancel. (Aborts command)
        9) Done. (Exits and runs command)

Enter option: 1
Current resource type actions:
        start
        stop

Action name ? monitor
Executable timeout (in milliseconds) ? 60000
Monitoring Interval (in milliseconds) ? 120000
Start Monitoring Time (in milliseconds) ? 120000

        0) Modify Action Script.
        1) Add Action Script.
        2) Remove Action Script.
        3) Add Type Specific Attribute.
        4) Remove Type Specific Attribute.
        5) Add Dependency.
        6) Remove Dependency.
        7) Show Current Information.
        8) Cancel. (Aborts command)
        9) Done. (Exits and runs command)

Enter option:1

Current resource type actions:
        start
        stop
        monitor

Action name ? exclusive
Executable timeout (in milliseconds) ? 60000

        0) Modify Action Script.
        1) Add Action Script.
        2) Remove Action Script.
        3) Add Type Specific Attribute.
        4) Remove Type Specific Attribute.
        5) Add Dependency.
        6) Remove Dependency.
        7) Show Current Information.
        8) Cancel. (Aborts command)
        9) Done. (Exits and runs command)

Enter option:3

No current type specific attributes

Type Specific Attribute ? search-string
Datatype ? string
Default value[optional] ? httpd

        0) Modify Action Script.
        1) Add Action Script.
        2) Remove Action Script.
        3) Add Type Specific Attribute.
        4) Remove Type Specific Attribute.
        5) Add Dependency.
        6) Remove Dependency.
        7) Show Current Information.
        8) Cancel. (Aborts command)
        9) Done. (Exits and runs command)

Enter option:5

No current resource type dependencies

Dependency name ? IP_address


        0) Modify Action Script.
        1) Add Action Script.
        2) Remove Action Script.
        3) Add Type Specific Attribute.
        4) Remove Type Specific Attribute.
        5) Add Dependency.
        6) Remove Dependency.
        7) Show Current Information.
        8) Cancel. (Aborts command)
        9) Done. (Exits and runs command)

Enter option:7

Current resource type actions:
        Action - 1: start
        Action - 2: stop
        Action - 3: monitor
        Action - 4: exclusive

Current type specific attributes:
        Type Specific Attribute - 1: search-string

No current resource type dependencies

Resource dependencies to be added:
        Resource dependency - 1: IP_address

        0) Modify Action Script.
        1) Add Action Script.
        2) Remove Action Script.
        3) Add Type Specific Attribute.
        4) Remove Type Specific Attribute.
        5) Add Dependency.
        6) Remove Dependency.
        7) Show Current Information.
        8) Cancel. (Aborts command)
        9) Done. (Exits and runs command)

Enter option:9
Successfully defined resource_type apache

Reading Information

In 2.1.x, configuration information is read using the ha_get_info() and ha_get_field() shell functions. These functions are equivalent to the 1.2 ha_cfginfo command.

In 2.1.x, all common functions and variables are kept in the following file:

/var/cluster/ha/common_scripts/scriptlib

This file is equivalent to the following 1.2 file:

/var/ha/actions/common.vars

For more information, see the IRIS FailSafe Version 2 Administrator's Guide.
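
The pattern of calling ha_get_info() and then ha_get_field() can be sketched as follows. So that the sketch runs outside a FailSafe node, the two functions are replaced by stand-in stubs; on a real node you would source scriptlib instead. The argument order of ha_get_info, the layout of HA_STRING, the port field, and the helper name get_attr are all assumptions made for illustration. Only the use of HA_STRING, HA_FIELD_VALUE, and the ha_get_field argument order follow the monitor example in this chapter.

```sh
#!/bin/sh
# Stand-in stubs so this sketch runs without scriptlib. On a FailSafe
# node you would source /var/cluster/ha/common_scripts/scriptlib instead.
# ASSUMPTIONS: the ha_get_info argument order and the "name value"
# line layout of HA_STRING are illustrative, not taken from the product.
ha_get_info()
{
    # $1 = resource type, $2 = resource name (both ignored by the stub)
    HA_STRING="search-string httpd
port 80"
    return 0
}

ha_get_field()
{
    # $1 = string produced by ha_get_info, $2 = field name
    HA_FIELD_VALUE=$(echo "$1" | awk -v f="$2" '$1 == f { print $2; exit }')
    [ -n "$HA_FIELD_VALUE" ]
}

# Read one attribute of a resource, falling back to a default value.
# get_attr is a hypothetical helper name.
get_attr()
{
    # $1 = type, $2 = resource, $3 = field, $4 = default
    if ha_get_info "$1" "$2" && ha_get_field "$HA_STRING" "$3"; then
        echo "$HA_FIELD_VALUE"
    else
        echo "$4"
    fi
}

get_attr apache web1 search-string httpd
```

Checking the return status of both calls before using HA_FIELD_VALUE mirrors the structure of the 2.1.x monitor script shown later in this chapter.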

Parameter Parsing

In 2.1.x, action script parameters are passed in a file and information is also returned in a file. The script takes a list of resource names as parameters.
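
File-based parameter passing can be sketched as follows. The file formats used here (one resource name per line in the input file, and "name status" lines in the output file), the status value 0, and the names run_action and write_status are assumptions for illustration. In a real action script, the scriptlib functions and variables such as HA_RES_NAMES and ha_write_status_for_resource, shown later in this chapter, do this work.

```sh
#!/bin/sh
# Sketch of file-based parameter passing. ASSUMPTIONS: the input file
# holds one resource name per line and the output file receives
# "name status" lines; the real formats are defined by FailSafe.
# run_action and write_status are hypothetical helper names.

write_status()
{
    # $1 = output file, $2 = resource name, $3 = status code
    echo "$2 $3" >> "$1"
}

run_action()
{
    infile=$1
    outfile=$2
    [ -r "$infile" ] || return 1
    : > "$outfile"                   # start with an empty output file

    while read -r res
    do
        [ -n "$res" ] || continue    # skip blank lines
        # A real script would start, stop, or monitor $res here.
        write_status "$outfile" "$res" 0
    done < "$infile"
    return 0
}
```

Calling run_action with an input file that lists web1 and web2 would leave one status line per resource in the output file.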

Action Scripts

Table 3-1 summarizes the differences in scripts between the releases.

Table 3-1. Differences between 1.2 and 2.1.x Scripts

FailSafe 1.2          FailSafe 2.1.x
--------------------  --------------------
giveaway, giveback    stop
takeover, takeback    start
check                 monitor
(no equivalent)       exclusive, restart
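
The mapping in Table 3-1 can be expressed as a small lookup, which may be useful when you inventory existing 1.2 scripts. The helper name map_action is hypothetical.

```sh
#!/bin/sh
# Map a 1.2 script name to its 2.1.x action script, per Table 3-1.
# map_action is a hypothetical helper name.
map_action()
{
    case $1 in
        giveaway|giveback) echo stop ;;
        takeover|takeback) echo start ;;
        check)             echo monitor ;;
        *)                 return 1 ;;    # not a 1.2 name in Table 3-1
    esac
}

map_action giveback    # prints: stop
map_action takeover    # prints: start
```

The exclusive and restart scripts have no 1.2 counterpart, so no 1.2 name maps to them.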

In 2.1.x, the action scripts are installed at the following path, where Resource_Type_Name is the name of the resource type (such as NFS) and Action_Name is the name of the action script (such as start):

/var/cluster/ha/Resource_Type_Name/Action_Name

For example, the start action script for the NFS resource type would be the following file:

/var/cluster/ha/NFS/start

Templates of the action scripts (start, stop, monitor, exclusive, restart) are provided in the following directory:

/var/cluster/ha/resource_types/template

For more information about action scripts, see the IRIS FailSafe Version 2 Programmer's Guide.
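
After you create the scripts for a new resource type, a quick check such as the following sketch can report which of the five action scripts are absent or not executable. It assumes the directory layout given above. The helper name check_action_scripts and its optional base-directory argument are hypothetical, and the sketch does not imply that every resource type must provide all five scripts.

```sh
#!/bin/sh
# Report action scripts that are missing or not executable for a
# resource type. The base directory defaults to the layout given in
# this chapter; check_action_scripts is a hypothetical helper name.
check_action_scripts()
{
    rtype=$1
    base=${2:-/var/cluster/ha}
    missing=0
    for action in start stop monitor exclusive restart
    do
        if [ ! -x "$base/$rtype/$action" ]; then
            echo "missing: $base/$rtype/$action"
            missing=1
        fi
    done
    return $missing
}
```

For example, check_action_scripts NFS examines /var/cluster/ha/NFS/start and the other four scripts, and returns nonzero if any of them is reported.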

The following sections provide example portions of 1.2 scripts and their 2.1.x equivalents:

  • giveback and stop 

  • takeover and start 

  • monitor and monitor 


    Note: There are no 1.2 equivalents for the 2.1.x exclusive and restart scripts.


In the following examples, only the relevant portions of the scripts are shown. The command that acts on the Apache server, and the check of its exit status, are the areas in common between the 1.2 script and its 2.1.x equivalent.

1.2 giveback / 2.1.x stop

For example, suppose you had the following in the giveback script in 1.2:

giveback()
{
    for i in `$CFG_INFO ${T_APACHE}`
    do
        SEARCH="$CFG_INFO ${T_APACHE}${CFG_SEP}${i}${CFG_SEP}${T_BACKUP}"
        BACKUP=`$SEARCH`
        if [ $? -eq 1 ]; then
            ${LOGGER} "$0: Trouble finding backup-node for apache ($SEARCH)"
            exit $INCORRECT_CONF_FILE;
        fi
        # If I am the backup
        if [ ${BACKUP} = ${HOST} ]; then
            ${LOGGER} "$0: Stopping apache for backup server."
            killall -9 /apache-fs/usr/local/apache_1.2.0/src/httpd
            if [ $? -ne "0" ]; then
                ${LOGGER} "$0: halt of apache on backup server failed."
            fi
        fi
 
        exit $SUCCESS
    done
}

In 2.1.x, you would have the following in the stop script:

stop_apache()
{
    for server in $HA_RES_NAMES
    do
        ${HA_DBGLOG} "Stopping apache server $server"
        killall -9 /apache-fs/usr/local/apache_1.2.0/src/httpd
        if [ $? -ne "0" ]; then
            ${HA_LOG} "halt of apache server $server failed."
            ha_write_status_for_resource $server $HA_CMD_FAILED;
        else
            ${HA_DBGLOG} "halt of apache server $server successful"
            ha_write_status_for_resource $server $HA_SUCCESS;
        fi
    done
}

1.2 takeover / 2.1.x start

For example, suppose you had the following in the takeover script in 1.2:

takeover()
{
    for i in `$CFG_INFO ${T_APACHE}`
    do
        SEARCH="$CFG_INFO ${T_APACHE}${CFG_SEP}${i}${CFG_SEP}${T_BACKUP}"
        BACKUP=`$SEARCH`
        if [ $? -eq 1 ]; then
            ${LOGGER} "$0: Trouble finding backup-node for apache ($SEARCH)"
            exit $INCORRECT_CONF_FILE;
        fi
        # If I am the backup
        if [ ${BACKUP} = ${HOST} ]; then
            ${LOGGER} "$0: Starting apache for backup server."
            /apache-fs/usr/local/apache_1.2.0/src/httpd -d \
/apache-fs/usr/local/apache_1.2.0
            if [ $? -ne "0" ]; then
                ${LOGGER} "$0: start of apache on backup server failed."
                exit $FAILED
            fi
        fi
        exit $SUCCESS
    done
}

In 2.1.x, you would have the following in the start script:

start_apache()
{
    for server in $HA_RES_NAMES
    do
        ${HA_DBGLOG} "Starting apache server $server"
        /apache-fs/usr/local/apache_1.2.0/src/httpd -d \
/apache-fs/usr/local/apache_1.2.0
        if [ $? -ne "0" ]; then
            ${HA_LOG} "start of apache server $server failed."
            ha_write_status_for_resource $server $HA_CMD_FAILED;
        else
            ${HA_DBGLOG} "start of apache server $server successful"
            ha_write_status_for_resource $server $HA_SUCCESS;
        fi
    done
}

1.2 monitor / 2.1.x monitor

For example, suppose you had the following in the monitor script in 1.2:

monitor()
{
 
    # Read the search string entry
    for i in `$CFG_INFO ${T_APACHE}`
    do
        SEARCH="$CFG_INFO ${T_APACHE}${CFG_SEP}${i}${CFG_SEP}${T_SEARCH_STR}"
        SEARCH_STR=`$SEARCH`
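        # Default to httpd; the leading ":" keeps the shell from
        # running the expanded value as a command
        : ${SEARCH_STR:=httpd};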
    done
 
    EXEC="${KILLALL} -0 ${SEARCH_STR}";
    execute_cmd "check if apache server processes are running"
 
}

In 2.1.x, you would have the following in the monitor script:

monitor_apache()
{
    for server in $HA_RES_NAMES
    do
        get_apache_info $server
        if [ $? -eq 0 ]; then
            APACHE_FIELDS=${HA_STRING}
            ha_get_field "${APACHE_FIELDS}" search-string;
            if [ $? -eq 0 ]; then
                SEARCH_STR=${HA_FIELD_VALUE};
            fi
        fi
        # Default to httpd when no search string is configured
        : ${SEARCH_STR:=httpd};
        HA_CMD="${KILLALL} -0 ${SEARCH_STR}";
        ha_execute_cmd "check if server $server processes are running"
        if [ $? -ne 0 ]; then
            ${HA_LOG} "monitor of apache server $server failed."
            ha_write_status_for_resource $server $HA_CMD_FAILED;
        else
            ${HA_DBGLOG} "monitor of apache server $server successful"
            ha_write_status_for_resource $server $HA_SUCCESS;
        fi
    done
}

Ordering Script Actions

In 2.1.x, each resource type has a start/stop order, which is a positive integer. In a resource group, the start/stop orders of the component resource types determine the order in which the resources will be started when FailSafe brings the group online and will be stopped when FailSafe takes the group offline. The group's resources are started in increasing order, and stopped in decreasing order.


Note: Resources of the same type are started and stopped in indeterminate order.

For example, if resource type volume has order 10 and resource type filesystem has order 20, then when FailSafe brings a resource group online, all volume resources in the group will be started before all file system resources in the group.
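
The ordering rule can be illustrated with a short sketch that sorts resource types by their order values. This models the rule only and is not how FailSafe itself sequences the scripts. The input format ("order type" lines) and the helper name order_types are assumptions; the order values 10 and 20 come from the example above and 500 from the apache resource type defined earlier in this chapter.

```sh
#!/bin/sh
# Print resource types in start order (increasing) or stop order
# (decreasing). Input on stdin: one "order type" pair per line.
# order_types is a hypothetical helper name.
order_types()
{
    case $1 in
        start) flag=-n ;;
        stop)  flag=-rn ;;
        *)     return 1 ;;
    esac
    sort $flag | awk '{ print $2 }'
}

printf '20 filesystem\n10 volume\n500 apache\n' | order_types start
# prints: volume, filesystem, apache (one per line)

printf '20 filesystem\n10 volume\n500 apache\n' | order_types stop
# prints: apache, filesystem, volume (one per line)
```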

There is no need to create software links similar to those used in 1.2.