Chapter 7. NLB Administration

The Network Load Balancer (NLB) consists of servers, collectors, and clients. The NQE GUI Status function uses collectors that communicate with the NLB server to obtain batch request status information.

This chapter describes NLB administration.

The NQE User's Guide, publication SG-2148, describes the NQE GUI Load window, which allows you to monitor machine load and system usage across the complex.

For information about the NQE database and NQE scheduler, see Chapter 9, “NQE Database”.

NLB Overview

The Network Load Balancer (NLB) server contains the NLB database for NQE information. It stores this information in the form of objects. A policy is the mechanism by which the NLB server selects destinations based on the information in the NLB database. (Policies are described in “Defining Load-balancing Policies” in Chapter 8.)

Each object in the NLB database is uniquely identified by a combination of its type (the definition of the object) and its name. All objects of a specific type have unique names to differentiate them; these object names are case insensitive. The following example describes one object in the NLB database:

object NLB (host1)

The object type is NLB and the name of the object is host1.

Each object contains a set of attributes. An attribute consists of a unique attribute name, which identifies it, and its corresponding attribute value. Attribute names are defined when the object type is defined; attribute names are case insensitive. The object type and attribute names are defined in the name map.

The following is an example of an object with one of its attributes:

object NLB (host1) {
      NLB_HARDWARE = "Sun4c"
}

The NLB object named host1 contains an attribute called NLB_HARDWARE, which is set to Sun4c. For a complete list of attributes for an object type, use the following command:

nlbconfig -mdump -t object_type
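As a rough mental model, the NLB database can be pictured as a mapping keyed by object type and object name, with object names and attribute names compared without regard to case. The following Python sketch is illustrative only; the ObjectStore class and its methods are hypothetical and are not part of NQE.

```python
class ObjectStore:
    """Toy model of the NLB database: objects keyed by (type, name)."""

    def __init__(self):
        self._objects = {}

    @staticmethod
    def _key(obj_type, name):
        # Object names are case insensitive, so fold the name before use.
        return (obj_type, name.lower())

    def put(self, obj_type, name, **attributes):
        # Attribute names are case insensitive too; store them upper-cased.
        attrs = self._objects.setdefault(self._key(obj_type, name), {})
        attrs.update({k.upper(): v for k, v in attributes.items()})

    def get(self, obj_type, name, attribute):
        return self._objects[self._key(obj_type, name)][attribute.upper()]


store = ObjectStore()
store.put("NLB", "host1", NLB_HARDWARE="Sun4c")
print(store.get("NLB", "HOST1", "nlb_hardware"))  # prints Sun4c
```

Because the name is folded before it is used as a key, host1 and HOST1 refer to the same object.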

Although the NLB database can store any defined object, within NQE only the following object types are defined and used:

Object type 

Description

NLB 

Data sent by the collector processes containing information about the NQE server that may be used for destination selection. Attributes of this object include such things as CPU load.

NJS and NJS_ALIVE 

Data sent by collector processes describing NQS requests on the NQE server. This is used by the NQE GUI Status function. NJS_ALIVE tells you the last time the NLB heard from a collector; it is used by the NQE GUI Load function to indicate that an NQE server is no longer talking to the NLB. It may also be used for destination selection.

FTA_CONFIG and FTA_DOMAIN 

FTA configuration objects. For information on defining these objects, see “Modifying the NLB Database for NPPA” in Chapter 13.

NLB_JOB 

Data sent to the NLB about the request, which the NLB uses to choose a destination for the request.

C_OBJ 

Configuration for the NLB itself.

Overview of NLB Configuration

The NLB has the following configuration requirements:

  • The NLB server can be run from any account unless it is configured to use a privileged TCP/IP port number.

  • The server requires a working directory and free space for data storage. This directory holds configuration files used by the server; it also stores a permanent copy of the NLB database. To run 750 requests, 250 Kbytes should be ample space.

  • There are two configuration files, as follows:

    • config, which defines some constants of operation and the master ACL used to control administrative access to the server. It is read at startup time.

    • policies, which contains the definitions of policies. This file also is read at startup, but the server reads it again when the administrator issues the nlbconfig -pol command. This file is described in “The policies File” in Chapter 8. NQS is shipped with a policy named nqs, which selects batch queues based on which system has the highest average idle CPU load.

The location of these files is defined in the NLB_SERVER_DIR attribute of the nqeinfo file.

Server Database Integrity

The NLB server database is held in memory, but it is periodically backed up to disk if it has been modified. The update interval is configurable; the default is 30 seconds. See the SKULK field in the NLB configuration file, described in the following section. The NLB database also is written out to disk on server shutdown. When the server restarts, it reads the NLB database from disk.

Access control lists (ACLs) are read and written in the same manner as the NLB server database. (For a description of ACLs, see “Configuring Name Maps, Objects, and ACLs”.)

The restart sequence, in combination with collectors, provides a high degree of consistency between NLB database contents and the information sent to the server.
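The write-back behavior described in this section (check at a fixed interval, write only what has changed, and write again at shutdown) can be sketched as follows. This is a minimal model under stated assumptions, not the server's implementation: the WriteBackStore class is hypothetical, and disk writes are simulated by a counter.

```python
class WriteBackStore:
    """Toy in-memory store flushed at most once per SKULK interval."""

    def __init__(self, skulk_seconds=30):
        self.skulk = skulk_seconds
        self.data = {}
        self.dirty = False      # set when the in-memory copy changes
        self.last_check = 0.0
        self.flushes = 0        # counts simulated disk writes

    def update(self, key, value):
        self.data[key] = value
        self.dirty = True

    def _flush(self):
        # A real server would write self.data to disk here.
        self.flushes += 1
        self.dirty = False

    def tick(self, now):
        """Write only if the interval has elapsed and the data was modified."""
        if now - self.last_check >= self.skulk:
            self.last_check = now
            if self.dirty:
                self._flush()

    def shutdown(self):
        # The database is also written out when the server shuts down.
        self._flush()


store = WriteBackStore(skulk_seconds=30)
store.update("host1", {"NLB_HARDWARE": "Sun4c"})
store.tick(now=10)   # interval not yet elapsed: no write
store.tick(now=30)   # interval elapsed and data modified: one write
store.tick(now=60)   # nothing changed since the last write: no write
print(store.flushes)  # prints 1
```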

Editing the NLB Configuration File

The NLB configuration file sets the configurations that cannot be changed while the NLB server is running. To change these settings, edit the NLB configuration file and then restart the NLB.

The NLB configuration file, $NLB_SERVER_DIR/config (where NLB_SERVER_DIR is defined in the nqeinfo file), is an ASCII text file that has the following format:

KEY:value

The KEY field defines the parameter being controlled; value gives the setting of the parameter. The interpretation of value depends on the KEY. Any text after a # symbol is a comment; spaces or tabs can be used anywhere in the file. There are defaults for all fields.

KEY

Description of value

OBJECTS:

Number of different object types the server will allow. The default is 32.

PORT:

TCP/IP port number or service name on which the server should listen for connections. The default is nlb.

SKULK:

The interval (in seconds) between NLB database write operations to disk. The server checks for NLB database modifications every value seconds, and it writes out those object lists or ACLs that have changes. The default is 30 seconds.

M_ACL:

The record to add to the master ACL. This determines who is allowed access to the ACL objects.

This value contains three fields: the user name, the host name, and the privileges to be granted. The host name can be an asterisk (*) to allow the user access from any location.

If no M_ACL record is present, the ACL defaults to the user who started the server on the same host as the server.

D_ACL:

Record to add to the default ACL for a new object. This value has the same format as the M_ACL record. If it is not specified, there is no default ACL, and any user is allowed access to any object except ACL objects. Read access is allowed for the ACL for the C_OBJ object type, but no access is allowed for any other ACLs.

DBASE:

Path name of the directory used for object storage. You can override it by using the nlbserver -d option.

OBJ:

Controls the storage class for an object, which you can configure by using the nlbconfig command.

Two fields are required: the object ID and the storage class. Storage class is one of the following:

persistent

Objects are written to disk at the rate controlled by SKULK.

permanent

Objects are written to disk when the server is shut down.

transient

Objects are never written to disk.

The following is an example of a configuration file:

#   NLB server configuration file - master and default ACLs
#
#   OBJECTS:        - defines maximum number of different object types
#   SKULK:          - interval in seconds between database update
#   PORT:           - TCP service name or port number to listen on
#   M_ACL:          - record for master ACL (can be repeated)
#   D_ACL:          - record for default ACL (can be repeated)
#
#   If M_ACL is not defined then it defaults to the user who
#   started the server on the same machine. The master ACL
#   controls the editing of other ACLs.
#
#   If D_ACL is not defined the default ACL is empty which
#   means objects are not restricted. If there is an ACL for an
#   individual object type it overrides the default ACL
#

OBJECTS:        32                # number of different objects
PORT:           nlb               # tcp service name
SKULK:          30

OBJ:            500     permanent
OBJ:            501     permanent

#               User    Host            Privs
#               ----    ----            -----

M_ACL:          root    *               all
M_ACL:          fred    *               all
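The KEY:value format is simple enough to read with a few lines of code, which can be useful for checking a config file before restarting the server. The following Python sketch is an illustration only; parse_nlb_config is a hypothetical helper, and it assumes that a key may appear more than once (as M_ACL, D_ACL, and OBJ do) and that value fields are separated by spaces or tabs.

```python
def parse_nlb_config(text):
    """Parse KEY:value lines; returns {KEY: [fields, ...]} since keys may repeat."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and padding
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a KEY:value line: {raw!r}")
        settings.setdefault(key.strip(), []).append(value.split())
    return settings


sample = """
OBJECTS:        32                # number of different objects
PORT:           nlb               # tcp service name
SKULK:          30
OBJ:            500     permanent
M_ACL:          root    *               all
M_ACL:          fred    *               all
"""
config = parse_nlb_config(sample)
print(config["M_ACL"])  # prints [['root', '*', 'all'], ['fred', '*', 'all']]
```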

Configuring ACLs for the NLB

Access to data in the NLB is controlled through Access Control Lists (ACLs). Each object type may have its own ACL. If no ACL is defined for an object type, then the default ACL (D_ACL) is used for objects of that object type. If there is no ACL for an object type and there is no default ACL, then access to any object of that object type is unrestricted. The ability to access the ACLs is controlled by the master ACL (M_ACL).

The M_ACL and the D_ACL are controlled by the config file in NLB_SERVER_DIR (as defined in the nqeinfo file) and may not be changed without restarting the NLB.

The following values are supplied with the initial installation of NQE:

#    User    Host        Privs
#    ----    ----        -----
#
M_ACL:        root    *               all
M_ACL:        WORLD   *               r
D_ACL:        root    *               all
D_ACL:        OWNER   *               all
D_ACL:        WORLD   *               r

With the initial configuration, only root is able to change ACLs; everyone else, indicated by the keyword WORLD, has read-only access. Access may be granted for specific hosts; an asterisk (*) in the Host field is a wildcard that grants the access from any host (in these records, root has access regardless of the NQE server). If no ACL is defined for an object type, the default ACL is used, and either root or an object's owner may change the object. Everyone else may view the object but not change it (read-only access). The privilege types are d (delete the object), u (update, or change, the object), and r (read the object).

For information about editing the NLB configuration file, see “Editing the NLB Configuration File”.

For information about changing ACLs for different object types, see “Changing ACL Permissions”.
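The ACL rules in this section (an object type's own ACL first, then the default ACL, then unrestricted access) can be expressed as a short lookup. The following Python sketch is a model for reasoning about a configuration, not the server's algorithm. The function names are hypothetical, and two points are assumptions: that the all privilege expands to d, r, and u, and that access is granted if any matching record grants it.

```python
def allowed(acl, user, host, privilege, owner=None):
    """Return True if any ACL record grants `privilege` to `user` at `host`."""
    for rec_user, rec_host, privs in acl:
        user_ok = (rec_user == user or rec_user == "WORLD"
                   or (rec_user == "OWNER" and user == owner))
        host_ok = rec_host in ("*", host)      # "*" matches any host
        granted = "dru" if privs == "all" else privs   # assumption: all = d, r, u
        if user_ok and host_ok and privilege in granted:
            return True
    return False


def check(type_acls, default_acl, obj_type, user, host, privilege, owner=None):
    # Use the ACL for the object type if one exists, otherwise the default ACL.
    acl = type_acls.get(obj_type) or default_acl
    if not acl:
        return True   # no ACL and no default ACL: access is unrestricted
    return allowed(acl, user, host, privilege, owner)


# The D_ACL records supplied with the initial installation.
INITIAL_D_ACL = [("root", "*", "all"), ("OWNER", "*", "all"), ("WORLD", "*", "r")]

print(check({}, INITIAL_D_ACL, "NLB", "mario", "gust", "r"))  # prints True
print(check({}, INITIAL_D_ACL, "NLB", "mario", "gust", "u"))  # prints False
```

Passing owner="mario" in the second call would return True, because the OWNER record grants all privileges to the owner of the object.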

Starting the NLB Server

To start the NLB server, use the following command:

nlbserver 

The nlbserver command automatically runs in the background; you will receive the system prompt before the command completes execution.

Specifying the NLB Server Location

All programs that communicate with the NLB server (the collector, the NQE GUI Status and Load functions, and the nlbconfig and nlbpolicy clients) can use the same set of mechanisms to locate the server. These mechanisms are as follows; they are listed in order of precedence:

  1. The command line

  2. The NLB_SERVER environment variable

  3. The value set in the nqeinfo file at installation

  4. The .Xdefaults file

In each of these cases, the syntax is as follows. For one server, the syntax is:

"hostname[:service_name]"

If you have multiple servers, specify multiple hostname[:service_name] pairs. Use a space to separate the servers; commas are not allowed. The syntax is as follows:

"hostname[:service_name] hostname[:service_name] hostname[:service_name]"

The hostname is the Internet host name of the server host. The optional service_name can be either a name that is mapped to a port number or an ASCII representation of the port number. The service name is the string contained in the configuration file (see “Editing the NLB Configuration File”) as a value for the keyword PORT. The default value of service_name is nlb. Use quotation marks on the command line or when setting the NLB_SERVER environment variable. The following are examples of NLB_SERVER syntax:

"cool:nlb_server"
"ice:10001"
"hot:604 wind rain gust:705"

If you place the environment variable NLB_SERVER in the system startup file on a host, it will be available for all NLB commands. Specifying a server on a command line will override the environment variable.

In the case of the collector, data is sent to all servers listed. For all other programs the servers are tried in turn until one responds. If a server exits, the clients will automatically switch to the next specified server.
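The server specification syntax and the order of precedence can be sketched in a few lines. The following Python example is illustrative only: parse_servers and locate_servers are hypothetical helpers, and the values read from the nqeinfo file and the .Xdefaults file are passed in as plain strings rather than read from those files.

```python
import os


def parse_servers(spec, default_service="nlb"):
    """Split 'host[:service] host[:service] ...' into (host, service) pairs."""
    servers = []
    for item in spec.split():            # servers are separated by spaces
        host, _, service = item.partition(":")
        servers.append((host, service or default_service))
    return servers


def locate_servers(cli_value=None, environ=None, nqeinfo_value=None,
                   xdefaults_value=None):
    """Return the server list from the first source that supplies one."""
    environ = os.environ if environ is None else environ
    sources = (cli_value, environ.get("NLB_SERVER"), nqeinfo_value, xdefaults_value)
    for candidate in sources:
        if candidate:
            return parse_servers(candidate)
    raise LookupError("cannot determine server location")


print(parse_servers("hot:604 wind rain gust:705"))
# prints [('hot', '604'), ('wind', 'nlb'), ('rain', 'nlb'), ('gust', '705')]
```

A client other than the collector would then try the returned servers in order until one responds.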


Note: If you will send data to multiple NLB servers and will use job dependency, read “Multiple NLB Servers and Job Dependency” in Chapter 12.


Creating Redundant NLB Servers

Running redundant NLB servers ensures that all of the NQE nodes can still communicate with an NLB server, even if a network failure occurs.

When you install NQE collector nodes, specify all of the hosts on which you will run NLB servers so that collectors can send data to all NLB nodes.

You can set the NLB_SERVER environment variable to point to several hosts. The first host specified is the first one queried for information. If the first host does not respond, NQE tries the second host, and so on. For more information on setting environment variables, see Chapter 3, “Configuring NQE variables”.


Warning: The job dependency feature does not provide synchronization of events over multiple NLB servers. If you will use job dependency and redundant NLB servers, you must read the information about job dependency in Chapter 12, “Job Dependency Administration”.


Starting NLB Collectors

An NLB collector should be configured to run on each NQE node on which NQS is configured to run. The collector extracts various statistics from the system at regular intervals and sends them to the server.


Note: If you will send data to multiple NLB servers and will use job dependency, read “Multiple NLB Servers and Job Dependency” in Chapter 12.

A number of attributes are collected (see “File Formats” in Chapter 8). For those attributes that collect dynamic statistics (such as NLB_PHYSMEM), both the current value and a rolling average are collected. The rolling average is the average value of the attribute over the last n seconds; n is specified with the ccollect -r option.
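To make the rolling average concrete, the following Python sketch keeps the most recent samples and averages them. It is an illustration under stated assumptions, not the collector's code: the RollingAverage class is hypothetical, the averaging period is rounded up to a whole number of update intervals, and the special case of a period of 0 (which turns the rolling averages off) is not modeled.

```python
from collections import deque


class RollingAverage:
    """Average of the samples that fall inside the averaging period."""

    def __init__(self, period_seconds, update_interval):
        # Treat the period as a whole number of update intervals.
        # Rounding up, and never below one sample, is an assumption here.
        count = max(1, -(-period_seconds // update_interval))
        self.samples = deque(maxlen=count)   # old samples drop off automatically

    def add(self, value):
        """Record a new sample and return the current rolling average."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


avg = RollingAverage(period_seconds=60, update_interval=15)
for load in (10, 10, 10, 90):
    latest = avg.add(load)
print(latest)  # prints 30.0
```

The last sample jumps to 90, but the rolling average rises only to 30.0, which is how the averaged attributes damp sudden spikes.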

You can start the collector from any directory. To start the collector, use the following command:

ccollect option

The ccollect command automatically runs in the background; you will receive the system prompt before the command completes execution.

The option argument can be any of the following:

Option 

Description

-a 

Specifies that you want to use awk(1) to parse request (qstat) information instead of getting the information directly from the NQS database, which is the default source of information. To customize how data is parsed, use this option in conjunction with the -f, -p, and -q options.

-c 

Specifies whether NLB will have a permanent TCP/IP connection to the server. The default is to establish a new connection for each update. Even though new connections mean that the server uses more CPU time, they reduce the number of connections the server needs to have open.

-C filename 

Specifies the full or relative path name of a file that contains custom objects that are read by the collector and then sent to the NLB server. If the path is relative, the directory search order is the current working directory and then the /nqebase/etc directory (which is $NQE_ETC, as defined in the nqeinfo file), or the $NQE_ETC/config directory. You can specify this option multiple times. You also can set this option by using the NQE_CUSTOM_FILE_LIST variable either in the nqeinfo file or in the environment. If the variable exists in both places, the value in the environment takes precedence. The variable mechanism syntax lets you specify multiple files by separating them with either a space or a comma. The search order is the same as above.

Files used with the -C option are read when the collector performs an update, and the values for the attributes are sent to the NLB server.

-d file_system 

Reports the kilobytes of free disk space to the NLB server. The -d option adds the following attributes to the name_map file:

NLB_TMPNAME 

Directory name

NLB_FREETMP 

Free space in the file system (in kilobytes)

NLB_A_FREETMP 

The rolling average of NLB_FREETMP

-f parsefile 

Specifies the name of a file used to parse qstat information. The default is njs_qstat.awk. When you use this option, you also must use -a.

-i interval 

Specifies the interval (in seconds) at which the collector gathers machine load information. The default is 15 seconds.

-I interval 

Specifies the interval (in seconds) at which the collector gathers information about NQS batch requests. The default is 30 seconds.

-J 

Starts a collector that collects only job status statistics.

-L 

Starts a collector that collects only machine load statistics.

-o filename 

Used at collector startup time only. Specifies an object definition file that contains extra attributes to be sent to the server from this collector. Specifying additional attributes allows a site to use its own attributes in displays and policies (for example, a site might write a policy that uses a weighted value based on the number of CPUs).

-p "parse_command" 

Specifies a pattern-scanning command used to parse the output of a qstat command. When you use this option, you also must use -a.

-q "qstat_command" 

Specifies a qstat command to be used in gathering network job status statistics. This option is used if the qstat command is not in the collector's PATH environment variable. This must be a full path name. When you use this option, you also must use -a.

-Q queue 

(Required) Specifies the host's NLB_QUEUE_NAME attribute in the NLB database, which defines the queue name NQS uses when it sends load-balanced requests to this machine. If you are using the default NQE queue configuration, this queue is nqebatch.

-r interval 

Specifies the interval (in seconds) over which the rolling averages of attributes are calculated. This value is turned into a multiple of the update interval. A value of 0 turns off the rolling average attributes.

Rolling average attributes are provided to prevent load-balancing decisions from being influenced by sudden spikes in the various measurements. However, the longer the time period over which the averages are taken, the longer it will take the load balancer to note changes in a machine's current load. For more information on averaging data, see “Approaches to Designing Policies: Averaging Data” in Chapter 8.

-s server 

Specifies the name of the NLB server to which to send information. If you want to specify that multiple servers will be updated from this collector, you either can specify the -s option more than once, or you can specify the option once but place the server names in quotation marks. If you omit the option, the NLB_SERVER environment variable is used to locate the server.

The default mechanism for gathering batch request information is to read it directly from the NQS database. To customize the method by which data is collected and stored, use the -a, -f, -p, and -q options.

If you use the -a option, batch request information is parsed by using the qstat -af | awk -f njs_qstat.awk command.

To change the path name of the qstat command, use the ccollect -q option. You may want to do this if qstat is not in the collector's PATH environment variable. You must specify a full path name. If you use this option, you also must use the -a option.

To change the awk command used to parse the qstat output, use the -p option. For example, you may want to use nawk(1). If you use this option, you also must use the -a option.

To change the file name used to parse qstat information, use the -f option. You can customize the script njs_qstat.awk to parse the data the collector gathers so that it suits your site's needs. For example, you can modify the script so that two different collectors report information to two different NLB servers, thus keeping user communities separated.

To customize NLB collectors, see “Storing Arbitrary Information in the NLB Database (Extensible Collector)” in Chapter 8.
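The option dependencies described in this section (-Q is required; -f, -p, and -q each require -a; -q takes a full path name) can be checked before a collector is started. The following Python sketch only assembles and validates an argument list; build_ccollect_args is a hypothetical helper, and it does not run ccollect.

```python
def build_ccollect_args(queue, servers=(), use_awk=False, parsefile=None,
                        parse_command=None, qstat_command=None):
    """Assemble a ccollect argument list, enforcing the documented option rules."""
    if not queue:
        raise ValueError("-Q queue is required")
    if (parsefile or parse_command or qstat_command) and not use_awk:
        raise ValueError("-f, -p, and -q must be used together with -a")
    if qstat_command and not qstat_command.startswith("/"):
        raise ValueError("-q requires a full path name")

    args = ["ccollect", "-Q", queue]
    for server in servers:
        args += ["-s", server]          # -s may be given more than once
    if use_awk:
        args.append("-a")
    options = (("-f", parsefile), ("-p", parse_command), ("-q", qstat_command))
    for flag, value in options:
        if value:
            args += [flag, value]
    return args


print(build_ccollect_args("nqebatch", servers=["cool", "ice:10001"]))
# prints ['ccollect', '-Q', 'nqebatch', '-s', 'cool', '-s', 'ice:10001']
```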

Managing the NLB Server

The nlbconfig facility lets you maintain and administer the NLB server. It provides the following capabilities:

  • Shut down an NLB server

  • Instruct an NLB server to reread its policy file (the file name is policies)

  • Display, write, and modify name map information

  • Display, write, and modify object information

  • Display, write, and modify ACL information

  • Change storage class of objects

The following sections describe the nlbconfig command.

Specifying the Server on the Command Line

To specify a server on the command line, use the following command:

nlbconfig -s server

The server is the name of the host on which the NLB server resides. The server also can be specified in the nqeinfo file or with the NLB_SERVER environment variable (as described in “Specifying the NLB Server Location”). Specifying -s server overrides the value of NLB_SERVER. If you do not specify a server name and it is not set by any of the methods listed in “Specifying the NLB Server Location”, the nlbconfig command will fail with the following message:

nlbconfig: cannot determine server location

Verifying the NLB Server Is Running

To ensure that the NLB server is running, use the following command:

nlbconfig -ping

The response you receive tells you whether the NLB server is running. If the server is running, the response tells you the name of the server that was contacted when you issued the command.

Shutting down the Server

To shut down the server, use the following command:

nlbconfig -kill

Rereading the policies File

To reread the policies file and make any new information in it available to the NLB, use the following command:

nlbconfig -pol


Note: The policies file that NLB reads when you issue an nlbconfig -pol command is NLB_SERVER_DIR/policies, where NLB_SERVER_DIR is defined in the nqeinfo file. Do not confuse this file with the policies file located in the /nqebase/etc directory, which is used only when the NLB directory is created. If you make changes to the policies file, make them to the NLB_SERVER_DIR/policies file.


Configuring Name Maps, Objects, and ACLs

The nlbconfig command uses similar options to operate on name maps, objects, and access control lists (ACLs).

You can list and dump the contents of the name map, object, and ACL files the server is currently using. When you list information, you receive basic information about the type of objects in the server. When you dump information, the current values of the objects are formatted and displayed. The following options perform these operations:

Option

Description

-mlist

Lists names, types, and storage classes in the name map file

-olist

Lists names of objects

-alist

Lists names of an ACL list

-mdump

Dumps values in name map file

-odump

Dumps values in the object file

-adump

Dumps values in the ACL list

The following options replace the contents (or part of the contents) of the name map, object, and ACL files.

Option

Description

-mput

Downloads a name map file to the server

-oput

Downloads the specified object(s) to the server

-aput

Downloads the specified ACLs to the server

The following options delete objects or ACLs in the server's files:

Option

Description

-odel

Deletes the specified object(s)

-adel

Deletes the specified ACLs

-mdel

Deletes the name map for the specified object types from the server

The following option adds attributes to objects in the server's files, leaving all the current attributes unchanged:

Option

Description

-oupdat

Updates the specified object(s) with additional attributes

The following options change the storage class of an object in the name map file (see “The Name Map File” in Chapter 8 for more information about storage classes):

Option

Description

-perm

Changes the storage class of the specified object to permanent

-pers

Changes the storage class of the specified object to persistent

-tran

Changes the storage class of the specified object to transient

Options are provided to qualify the options already described. For example, when you want to download a name map file, you use the -f option with the nlbconfig command to specify the file name of the new file. The following qualifying options are provided (for definitions of objects, object names, and attribute names, see “NLB Overview”):

Option

Description

-f file

Specifies a file name that contains objects or ACL descriptions.

-t type

Specifies an object type or ACL type.

-h host

Specifies an ACL host name; the default is "*" (denoting all names).

-n name

Specifies an object name.

-a attribute

Specifies an object attribute name or a set of ACL privileges.

-u user

Specifies an ACL user name.

The following list indicates the options that can be qualified:

Option

Qualifier

-adel

Object type, user, and host

-adump

Object type and output file

-alist

Object type and output file

-aput

Object type, object attribute, user, host, and output file

-mdel

Object type

-mdump

Object type and output file

-mlist

Object type and output file

-mput

Input file

-odel

Object type and object name

-odump

Object type, object name, object attribute, and file

-olist

Object type, object name, and file

-oput

Input file

-oupdat

Input file

-perm

Object type

-pers

Object type

-trans

Object type

The -kill, -pol, and -ping options do not use qualifiers.
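The qualifier table above lends itself to a simple lookup when nlbconfig command lines are generated by a script. The following Python sketch is an illustration only: nlbconfig_args is a hypothetical helper that builds an argument list without running the command, the storage class options are left out of the table, and the mapping from each qualifier description to its flag (-t, -n, -a, -u, -h, -f) follows the list of qualifying options given earlier.

```python
# Qualifier flags accepted by each option, transcribed from the table above.
QUALIFIERS = {
    "-adel":   {"-t", "-u", "-h"},
    "-adump":  {"-t", "-f"},
    "-alist":  {"-t", "-f"},
    "-aput":   {"-t", "-a", "-u", "-h", "-f"},
    "-mdel":   {"-t"},
    "-mdump":  {"-t", "-f"},
    "-mlist":  {"-t", "-f"},
    "-mput":   {"-f"},
    "-odel":   {"-t", "-n"},
    "-odump":  {"-t", "-n", "-a", "-f"},
    "-olist":  {"-t", "-n", "-f"},
    "-oput":   {"-f"},
    "-oupdat": {"-f"},
    "-kill":   set(),
    "-pol":    set(),
    "-ping":   set(),
}


def nlbconfig_args(operation, **qualifiers):
    """Build an nlbconfig argument list, rejecting qualifiers the table omits."""
    allowed = QUALIFIERS[operation]
    args = ["nlbconfig", operation]
    for name, value in qualifiers.items():
        flag = "-" + name
        if flag not in allowed:
            raise ValueError(f"{operation} does not accept {flag}")
        args += [flag, value]
    return args


print(nlbconfig_args("-odel", t="C_OBJ", n="gust"))
# prints ['nlbconfig', '-odel', '-t', 'C_OBJ', '-n', 'gust']
```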

Examples of Configuring Name Maps and Objects

The following example lists the name map for the server defined by the NLB_SERVER environment variable:

nlbconfig -mlist

This command produces the following output:

object type     id          storage class
-----------     ----        -------------
NLB             1024        persistent
NJS             500         permanent
C_OBJ           1           persistent
NJS_ALIVE       501         permanent
FTA_CONFIG      400         persistent
FTA_DOMAIN      401         persistent
FTA_DATA        402         persistent
FTA_ACTION      403         persistent
NLB_JOB         1025        persistent
NLB_POL         3           persistent
EVENT           601         transaction  

The following example deletes the object of type C_OBJ with the name gust on the server (the server is located by using the mechanisms described in “Specifying the NLB Server Location”):

nlbconfig -odel -t C_OBJ -n gust

Changing ACL Permissions

ACLs control permissions for reading, updating, or deleting objects in the NLB server. ACL permissions are usually changed with the nlbconfig command.

The following example adds an ACL to the ACL list that allows user mario to read (r) all objects of type C_OBJ from any host (the default) on the server defined by the NLB_SERVER environment variable:

nlbconfig -aput -t C_OBJ -u mario -a r

The master ACL controls all other ACLs; only users in this ACL can edit other ACL lists with nlbconfig. If you receive the following error message, you need to change the master ACL:

# nlbconfig -aput -t C_OBJ -u mario -a r
gust: No privilege for requested operation

To change the master ACL, edit the $NLB_SERVER_DIR/config file and restart the NLB; you cannot change it by using nlbconfig.

By default, all users have read permission for ACL information in the server database. The keyword WORLD is used to specify all users. If you changed permissions for any users, you must explicitly add WORLD read permission again.

To modify data in the server database, the NLB collector account (that is, the account from which the ccollect command is issued) must have u (update) and d (delete) permission.

All users can monitor all requests (through the NQE GUI Status window) in the NQE cluster.

To avoid having to issue a long list of commands, you can place the information in a file and then download that file to the server. The following command places the current ACL for NJS objects in a file named acl_njs:

nlbconfig -s cool -alist -f acl_njs

The acl_njs file has the following format:

      type            user            host            priv
      ----            ----            ----            ----
      NJS             luigi           *               dru
      NJS             root            *               dru
      NJS             mario           *               dru
      NJS             WORLD           *               r

All jobs in the NQE database will be seen, regardless of how you set your ACL.

The following nlbconfig command places the information in the server:

nlbconfig -aput -f acl_njs
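Before downloading an edited ACL file, it can be worth checking that the WORLD read record is still present, because that permission must be added back explicitly after other changes. The following Python sketch reads a listing in the format shown above; parse_acl_listing is a hypothetical helper, and it assumes that every record has exactly four whitespace-separated fields.

```python
def parse_acl_listing(text):
    """Return (type, user, host, privileges) tuples from an ACL listing."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the column titles, and the dashed rule under them.
        if len(fields) != 4 or fields[0] in ("type", "----"):
            continue
        records.append(tuple(fields))
    return records


listing = """
      type            user            host            priv
      ----            ----            ----            ----
      NJS             luigi           *               dru
      NJS             WORLD           *               r
"""
acl = parse_acl_listing(listing)
world_can_read = any(user == "WORLD" and "r" in priv for _, user, _, priv in acl)
print(acl)             # prints [('NJS', 'luigi', '*', 'dru'), ('NJS', 'WORLD', '*', 'r')]
print(world_can_read)  # prints True
```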