The Network Load Balancer (NLB) consists of servers, collectors, and clients. The NQE GUI Status function uses collectors that communicate with the NLB server to obtain batch request status information.
This chapter describes NLB administration. The following topics are discussed:
Overview of NLB
Overview of NLB configuration
Editing the NLB configuration file
Configuring ACLs for the NLB
Starting the NLB server
Specifying the NLB server location
Creating redundant NLB servers
Starting NLB collectors
Managing the NLB server, including shutdown
The NQE User's Guide, publication SG-2148, describes the NQE GUI Load window, which allows you to monitor machine load and system usage across the complex.
For information about the NQE database and NQE scheduler, see Chapter 9, “NQE Database”.
The Network Load Balancer (NLB) server contains the NLB database for NQE information. It stores this information in the form of objects. A policy is the mechanism by which the NLB server selects destinations based on the information in the NLB database. (Policies are described in “Defining Load-balancing Policies” in Chapter 8.)
Each object in the NLB database is uniquely identified by a combination of its type (the definition of the object) and its name. All objects of a specific type have unique names to differentiate them; these object names are case insensitive. The following example describes one object in the NLB database:
object NLB (host1)
The object type is NLB and the name of the object is host1.
Each object contains a set of attributes. Each attribute consists of a unique attribute name, which identifies it, and a corresponding attribute value. Attribute names are defined when the object type is defined and are case insensitive. The object types and attribute names are defined in the name map.
The following is an example of an object with one of its attributes:
object NLB (host1) { NLB_HARDWARE = "Sun4c" }
The NLB object named host1 contains an attribute called NLB_HARDWARE, which is set to Sun4c. For a complete list of attributes for an object type, use the following command:
nlbconfig -mdump -t object_type
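The naming rules above can be modeled as a small in-memory store keyed by object type and object name. The following Python sketch is illustrative only, not the NLB server's implementation: the ObjectStore class is hypothetical, and treating object types as case insensitive (like object and attribute names) is an assumption.

```python
from __future__ import annotations


class ObjectStore:
    """Hypothetical in-memory model of NLB objects and their attributes."""

    def __init__(self) -> None:
        # (type, name) -> {attribute_name: value}, with all keys case-folded
        self._objects: dict[tuple[str, str], dict[str, str]] = {}

    @staticmethod
    def _key(obj_type: str, name: str) -> tuple[str, str]:
        # Object names are case insensitive; folding the type too is an assumption.
        return (obj_type.casefold(), name.casefold())

    def put(self, obj_type: str, name: str, **attrs: str) -> None:
        """Create the object if needed and set (or overwrite) attributes."""
        obj = self._objects.setdefault(self._key(obj_type, name), {})
        for attr, value in attrs.items():
            obj[attr.casefold()] = value  # attribute names are case insensitive

    def get(self, obj_type: str, name: str, attr: str) -> str | None:
        """Return an attribute value, or None if the object or attribute is absent."""
        obj = self._objects.get(self._key(obj_type, name), {})
        return obj.get(attr.casefold())


store = ObjectStore()
store.put("NLB", "host1", NLB_HARDWARE="Sun4c")
print(store.get("nlb", "HOST1", "nlb_hardware"))  # Sun4c
```

Because every lookup folds case, NLB (host1) and nlb (HOST1) refer to the same object, matching the rule that names must be unique within a type regardless of case.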
Although the NLB database can store any defined object, within NQE only the following object types are defined and used:
Object type | Description
NLB | Data sent by the collector processes containing information about the NQE server that may be used for destination selection. Attributes of this object include such things as CPU load.
NJS and NJS_ALIVE | Data sent by collector processes describing NQS requests on the NQE server. This data is used by the NQE GUI Status function. NJS_ALIVE records the last time the NLB heard from a collector; the NQE GUI Load function uses it to indicate that an NQE server is no longer talking to the NLB. It may also be used for destination selection.
FTA_CONFIG and FTA_DOMAIN | FTA configuration objects. For information on defining these objects, see “Modifying the NLB Database for NPPA” in Chapter 13.
NLB_JOB | Data sent to the NLB about a request, which the NLB uses to choose a destination for the request.
C_OBJ | Configuration for the NLB itself.
The NLB has the following configuration requirements:
The NLB server can be run from any account unless it is configured to use a privileged TCP/IP port number.
The server requires a working directory and free space for data storage. This directory holds configuration files used by the server; it also stores a permanent copy of the NLB database. For 750 requests, 250 Kbytes of space should be ample.
There are two configuration files, as follows:
config, which defines some constants of operation and the master ACL used to control administrative access to the server. It is read at startup time.
policies, which contains the definitions of policies. This file is also read at startup, and the server rereads it when the administrator issues the nlbconfig -pol command. This file is described in “The policies File” in Chapter 8. NQS is shipped with a policy named nqs, which selects batch queues based on which system has the highest average idle CPU load.
The location of these files is defined in the NLB_SERVER_DIR attribute of the nqeinfo file.
The NLB server database is held in memory, but it is periodically backed up to disk if it has been modified. The update interval is configurable; the default is 30 seconds. See the SKULK field in the NLB configuration file, described in the following section. The NLB database also is written out to disk on server shutdown. When the server restarts, it reads the NLB database from disk.
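The periodic backup described above follows a common "dirty flag" pattern: the database is written only when the interval has elapsed and something has changed since the last write. The Python sketch below illustrates that pattern under stated assumptions; the SkulkingDatabase class and its JSON file format are hypothetical and do not reflect the NLB's actual on-disk format.

```python
from __future__ import annotations

import json
import os
import tempfile
import time


class SkulkingDatabase:
    """Hypothetical in-memory database flushed to disk at most every `skulk` seconds."""

    def __init__(self, path: str, skulk: float = 30.0) -> None:
        self.path = path
        self.skulk = skulk              # plays the role of SKULK (default 30 s)
        self.data: dict[str, str] = {}
        self.dirty = False              # set whenever the data changes
        self.last_flush = time.monotonic()

    def update(self, key: str, value: str) -> None:
        self.data[key] = value
        self.dirty = True

    def flush(self) -> bool:
        """Write to disk only if modified; return True if a write happened.
        A server would also call this at shutdown."""
        if not self.dirty:
            return False
        with open(self.path, "w") as f:
            json.dump(self.data, f)
        self.dirty = False
        self.last_flush = time.monotonic()
        return True

    def tick(self) -> bool:
        """Call periodically; flushes once the interval has elapsed."""
        if time.monotonic() - self.last_flush >= self.skulk:
            return self.flush()
        return False


# Demo with a 0-second interval so every tick is due.
path = os.path.join(tempfile.mkdtemp(), "nlb_db.json")
db = SkulkingDatabase(path, skulk=0)
first = db.tick()    # False: nothing has been modified yet
db.update("NLB/host1", "Sun4c")
second = db.tick()   # True: modified, so it is written out
```

Skipping the write when nothing changed keeps disk activity low on an idle server, while the interval bounds how much recent data could be lost if the server stops unexpectedly.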
Access control lists (ACLs) are read and written in the same manner as the NLB server database. (For a description of ACLs, see “Configuring Name Maps, Objects, and ACLs”.)
The restart sequence, in combination with collectors, provides a high degree of consistency between NLB database contents and the information sent to the server.
The NLB configuration file sets the configurations that cannot be changed while the NLB server is running. To change these settings, edit the NLB configuration file and then restart the NLB.
The NLB configuration file, $NLB_SERVER_DIR/config (where NLB_SERVER_DIR is defined in the nqeinfo file), is an ASCII text file that has the following format:
KEY:value
The KEY field defines the parameter being controlled; value gives the setting of the parameter. The interpretation of value depends on the KEY. Any text after a # symbol is a comment; spaces or tabs can be used anywhere in the file. There are defaults for all fields.
The following is an example of a configuration file:
# NLB server configuration file - master and default ACLs
#
# OBJECTS: - defines maximum number of different object types
# SKULK:   - interval in seconds between database update
# PORT:    - TCP service name or port number to listen on
# M_ACL:   - record for master ACL (can be repeated)
# D_ACL:   - record for default ACL (can be repeated)
#
# If M_ACL is not defined then it defaults to the user who
# started the server on the same machine. The master ACL
# controls the editing of other ACLs.
#
# If D_ACL is not defined the default ACL is empty which
# means objects are not restricted. If there is an ACL for an
# individual object type it overrides the default ACL
#
OBJECTS: 32        # number of different objects
PORT: nlb          # tcp service name
SKULK: 30
OBJ: 500 permanent
OBJ: 501 permanent
#      User    Host   Privs
#      ----    ----   -----
M_ACL: root    *      all
M_ACL: fred    *      all
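Because the file uses a simple KEY:value format, it can be checked with a few lines of code before you restart the server. The following Python sketch is a hypothetical helper, not part of NQE: it strips # comments, normalizes whitespace, and keeps repeated keys (such as M_ACL, D_ACL, and OBJ) in file order. It does not apply the server's defaults.

```python
from __future__ import annotations

from collections import defaultdict

# An abridged copy of the example configuration file above.
EXAMPLE = """\
# NLB server configuration file - master and default ACLs
OBJECTS: 32   # number of different objects
PORT: nlb     # tcp service name
SKULK: 30
OBJ: 500 permanent
OBJ: 501 permanent
M_ACL: root * all
M_ACL: fred * all
"""


def parse_nlb_config(text: str) -> dict[str, list[str]]:
    """Parse KEY:value lines into {KEY: [value, ...]} in file order."""
    settings: defaultdict[str, list[str]] = defaultdict(list)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop the comment and outer blanks
        if not line:
            continue                          # blank or comment-only line
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"line {lineno}: expected KEY:value, got {raw!r}")
        # Collapse runs of spaces or tabs inside the value.
        settings[key.strip()].append(" ".join(value.split()))
    return dict(settings)


cfg = parse_nlb_config(EXAMPLE)
print(cfg["SKULK"])   # ['30']
print(cfg["M_ACL"])   # ['root * all', 'fred * all']
```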
Access to data in the NLB is controlled through access control lists (ACLs). Each object type may have its own ACL. If no ACL is defined for an object type, the default ACL (D_ACL) is used for objects of that type. If there is neither an ACL for the object type nor a default ACL, access to any object of that type is unrestricted. The ability to access the ACLs themselves is controlled by the master ACL (M_ACL).
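The selection order described above can be summarized in a short Python sketch. The effective_acl function and the (user, host, privileges) tuple layout are illustrative assumptions, not an NQE interface.

```python
from typing import Dict, List, Optional, Tuple

AclEntry = Tuple[str, str, str]  # (user, host, privileges)


def effective_acl(obj_type: str,
                  type_acls: Dict[str, List[AclEntry]],
                  default_acl: List[AclEntry]) -> Optional[List[AclEntry]]:
    """Return the ACL governing obj_type, or None when access is unrestricted."""
    if obj_type in type_acls:   # 1. an ACL for the object type itself wins
        return type_acls[obj_type]
    if default_acl:             # 2. otherwise the default ACL (D_ACL) applies
        return default_acl
    return None                 # 3. neither exists: no restriction


default = [("root", "*", "all"), ("WORLD", "*", "r")]
per_type = {"NJS": [("mario", "*", "dru"), ("WORLD", "*", "r")]}
print(effective_acl("NJS", per_type, default))   # the NJS-specific ACL
print(effective_acl("NLB", per_type, default))   # falls back to the default ACL
print(effective_acl("NLB", {}, []))              # None: unrestricted
```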
The M_ACL and the D_ACL are controlled by the config file in NLB_SERVER_DIR (as defined in the nqeinfo file) and may not be changed without restarting the NLB.
The following values are supplied with the initial installation of NQE:
#      User    Host   Privs
#      ----    ----   -----
#
M_ACL: root    *      all
M_ACL: WORLD   *      r
D_ACL: root    *      all
D_ACL: OWNER   *      all
D_ACL: WORLD   *      r
With the initial configuration, only root is able to change ACLs; everyone else, indicated by the keyword WORLD, has read-only access. Access may be granted for specific hosts; an asterisk (*) in the Host field is a wildcard that matches any host, so the entry root * all grants root access from any NQE server. If no ACL is defined for an object type, the default ACL is used: either root or the object's owner (keyword OWNER) may change the object, and everyone else may view it but not change it (read-only access). The privilege types are d (delete the object), u (update, or change, the object), and r (read the object).
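The matching rules above can be sketched as follows. This Python example is hypothetical: it assumes that all stands for the d, u, and r privileges, that OWNER matches only the object's owner, and that privileges from all matching entries are combined.

```python
from typing import List, Set, Tuple

AclEntry = Tuple[str, str, str]  # (user, host, privileges)


def expand(privs: str) -> Set[str]:
    """Turn a privilege field into a set of letters; 'all' is assumed to mean d, u, r."""
    return {"d", "u", "r"} if privs == "all" else set(privs)


def entry_matches(entry: AclEntry, user: str, host: str, owner: str) -> bool:
    e_user, e_host, _ = entry
    user_ok = (e_user == user
               or e_user == "WORLD"                       # everyone
               or (e_user == "OWNER" and user == owner))  # the object's owner
    host_ok = e_host in ("*", host)                       # * matches any host
    return user_ok and host_ok


def allowed(acl: List[AclEntry], user: str, host: str, owner: str, priv: str) -> bool:
    """True if any matching entry grants priv ('d', 'u', or 'r')."""
    return any(priv in expand(e[2]) for e in acl if entry_matches(e, user, host, owner))


# The default ACL (D_ACL) from the initial installation.
D_ACL = [("root", "*", "all"), ("OWNER", "*", "all"), ("WORLD", "*", "r")]
print(allowed(D_ACL, "fred", "cool", owner="fred", priv="u"))   # True: owner
print(allowed(D_ACL, "mario", "cool", owner="fred", priv="u"))  # False: read-only
```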
For information about editing the NLB configuration file, see “Editing the NLB Configuration File”.
For information about changing ACLs for different object types, see “Changing ACL Permissions”.
To start the NLB server, use the following command:
nlbserver
The nlbserver command automatically runs in the background; you will receive the system prompt before the command completes execution.
All programs that communicate with the NLB server (the collector, the NQE GUI Status and Load functions, and the nlbconfig and nlbpolicy clients) can use the same set of mechanisms to locate the server. These mechanisms are as follows; they are listed in order of precedence:
The command line
The NLB_SERVER environment variable
The value set in the nqeinfo file at installation
The .Xdefaults file
The syntax is the same for each of these mechanisms. For one server, the syntax is:
"hostname[:service_name]"
If you have multiple servers, specify multiple hostname[:service_name] entries separated by spaces; commas are not allowed. The syntax is as follows:
"hostname[:service_name] hostname[:service_name] hostname[:service_name]"
The hostname is the Internet host name of the server host. The optional service_name can be either a name that is mapped to a port number or an ASCII representation of the port number. The service name is the string contained in the configuration file (see “Editing the NLB Configuration File”) as a value for the keyword PORT. The default value of service_name is nlb. Use quotation marks on the command line or when setting the NLB_SERVER environment variable. The following are examples of NLB_SERVER syntax:
"cool:nlb_server"
"ice:10001"
"hot:604 wind rain gust:705"
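To illustrate the syntax, the following hypothetical Python helper splits an NLB_SERVER value into host and service pairs, applying the default service name nlb and rejecting commas. It is a sketch, not the parser NQE uses.

```python
from typing import List, Tuple

DEFAULT_SERVICE = "nlb"


def parse_nlb_server(spec: str) -> List[Tuple[str, str]]:
    """Split 'host[:service] host[:service] ...' into (host, service) pairs."""
    if "," in spec:
        raise ValueError("separate servers with spaces; commas are not allowed")
    pairs = []
    for item in spec.split():                 # any run of whitespace separates servers
        host, _, service = item.partition(":")
        pairs.append((host, service or DEFAULT_SERVICE))
    return pairs


print(parse_nlb_server("hot:604 wind rain gust:705"))
# [('hot', '604'), ('wind', 'nlb'), ('rain', 'nlb'), ('gust', '705')]
```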
If you place the environment variable NLB_SERVER in the system startup file on a host, it will be available for all NLB commands. Specifying a server on a command line will override the environment variable.
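The order of precedence for locating the server can be expressed as a first-match search. In this Python sketch, the locate_server function is hypothetical, and the nqeinfo and .Xdefaults sources are modeled as plain mappings whose key names (NLB_SERVER and nlbServer) are assumptions.

```python
from typing import Mapping, Optional


def locate_server(cli_value: Optional[str],
                  environ: Mapping[str, str],
                  nqeinfo: Mapping[str, str],
                  xdefaults: Mapping[str, str]) -> str:
    """Return the first server setting found, checking sources in precedence order."""
    candidates = (
        cli_value,                    # 1. the command line (for example, -s)
        environ.get("NLB_SERVER"),    # 2. the NLB_SERVER environment variable
        nqeinfo.get("NLB_SERVER"),    # 3. the nqeinfo file (key name assumed)
        xdefaults.get("nlbServer"),   # 4. .Xdefaults (resource name hypothetical)
    )
    for value in candidates:
        if value:
            return value
    raise LookupError("cannot determine server location")


print(locate_server("hot:604", {"NLB_SERVER": "ice:10001"}, {}, {}))  # hot:604
print(locate_server(None, {"NLB_SERVER": "ice:10001"}, {}, {}))       # ice:10001
```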
In the case of the collector, data is sent to all servers listed. For all other programs the servers are tried in turn until one responds. If a server exits, the clients will automatically switch to the next specified server.
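The difference between collectors and other clients can be sketched as two small Python functions. The send and ping callables are hypothetical stand-ins for real network calls, and the set of running servers is invented for the demo.

```python
from typing import Callable, List, Optional


def collector_update(servers: List[str], send: Callable[[str], bool]) -> List[str]:
    """Collector: send the update to every listed server; return those that accepted it."""
    return [server for server in servers if send(server)]


def client_connect(servers: List[str], ping: Callable[[str], bool]) -> Optional[str]:
    """Other clients: try servers in turn and use the first that responds."""
    for server in servers:
        if ping(server):
            return server
    return None  # no listed server responded


up = {"rain", "gust"}                    # hypothetical: hot and wind are down
servers = ["hot", "wind", "rain", "gust"]
print(collector_update(servers, lambda s: s in up))  # ['rain', 'gust']
print(client_connect(servers, lambda s: s in up))    # rain
```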
Note: If you will send data to multiple NLB servers and will use job dependency, read “Multiple NLB Servers and Job Dependency” in Chapter 12.
Running redundant NLB servers ensures that all of the NQE nodes can still communicate with an NLB server, even if a network failure occurs.
When you install NQE collector nodes, specify all of the hosts on which you will run NLB servers so that collectors can send data to all NLB nodes.
You can set the NLB_SERVER environment variable to point to several hosts. The first host specified is the first one queried for information. If the first host does not respond, NQE tries the second host, and so on. For more information on setting environment variables, see Chapter 3, “Configuring NQE variables”.
Warning: The job dependency feature does not provide synchronization of events over multiple NLB servers. If you will use job dependency and redundant NLB servers, you must read the information about job dependency in Chapter 12, “Job Dependency Administration”.
An NLB collector should be configured to run on each NQE node on which NQS is configured to run. The collector extracts various statistics from the system at regular intervals and sends them to the server.
Note: If you will send data to multiple NLB servers and will use job dependency, read “Multiple NLB Servers and Job Dependency” in Chapter 12.
A number of attributes are collected (see “File Formats” in Chapter 8). For attributes that collect dynamic statistics (such as NLB_PHYSMEM), both the current value and a rolling average are collected. The rolling average is the average value of the attribute over the last n seconds; n is specified with the ccollect -r option.
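A rolling average of this kind can be kept with a fixed-length queue of recent samples. The Python sketch below is an illustration, not the collector's code. It assumes that the -r window is rounded up to a whole number of update intervals (the ccollect description states only that it is turned into a multiple of the update interval) and uses the -i default of 15 seconds as the update interval.

```python
import math
from collections import deque
from typing import Deque, Optional


class RollingAverage:
    """Hypothetical rolling average over the last `window` seconds of samples."""

    def __init__(self, window: float, update_interval: float = 15.0) -> None:
        # Convert the window to a whole number of update intervals (rounding up
        # is an assumption). A window of 0 disables the rolling average.
        self.samples = math.ceil(window / update_interval) if window > 0 else 0
        self.values: Deque[float] = deque(maxlen=max(self.samples, 1))

    def add(self, value: float) -> Optional[float]:
        """Record a new sample; return the current average, or None if disabled."""
        if self.samples == 0:
            return None
        self.values.append(value)   # the deque drops the oldest sample when full
        return sum(self.values) / len(self.values)


avg = RollingAverage(window=45, update_interval=15)   # three samples
for load in (10.0, 20.0, 30.0, 40.0):
    current = avg.add(load)
print(current)  # 30.0, the mean of the last three samples
```

A longer window smooths out short spikes but, as the -r description notes, also delays the load balancer's reaction to real changes in load.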
You can start the collector from any directory. To start the collector, use the following command:
ccollect options
The ccollect command automatically runs in the background; you will receive the system prompt before the command completes execution.
The following options are available:
Option | Description
-a | Specifies that you want to use awk(1) to parse request (qstat) information instead of getting the information directly from the NQS database, which is the default source of information. To customize how data is parsed, use this option in conjunction with the -f, -p, and -q options.
-c | Specifies that the collector keep a permanent TCP/IP connection to the NLB server. The default is to establish a new connection for each update. Although new connections cause the server to use more CPU time, they reduce the number of connections the server must keep open.
-C filename | Specifies the full or relative path name of a file that contains custom objects that are read by the collector and then sent to the NLB server. If the path is relative, the directory search order is the current working directory, then the /nqebase/etc directory (which is $NQE_ETC, as defined in the nqeinfo file), then the $NQE_ETC/config directory. You can specify this option multiple times. You also can set this option by using the NQE_CUSTOM_FILE_LIST variable in either the nqeinfo file or the environment; if the variable exists in both places, the value in the environment takes precedence. With the variable, you can specify multiple files by separating them with either a space or a comma; the search order is the same as above. Files specified with -C are read each time the collector performs an update, and the values of their attributes are sent to the NLB server.
-d file_system | Reports the kilobytes of free disk space on file_system to the NLB server. The -d option adds the following attributes to the name_map file:
-f parsefile | Specifies the name of a file used to parse qstat information. The default is njs_qstat.awk. When you use this option, you also must use -a.
-i interval | Specifies the interval (in seconds) at which the collector gathers machine load information. The default is 15 seconds.
-I interval | Specifies the interval (in seconds) at which the collector gathers information about NQS batch requests. The default is 30 seconds.
-J | Starts a collector that collects only job status statistics.
-L | Starts a collector that collects only machine load statistics.
-o filename | Used at collector startup time only. Specifies an object definition file that contains extra attributes to be sent to the server from this collector. Specifying additional attributes allows a site to use its own attributes in displays and policies (for example, a site might write a policy that uses a weighted value based on the number of CPUs).
-p "parse_command" | Specifies a pattern-scanning command used to parse the output of the qstat command. When you use this option, you also must use -a.
-q "qstat_command" | Specifies the qstat command used to gather network job status statistics. Use this option if the qstat command is not in the collector's PATH environment variable. The value must be a full path name. When you use this option, you also must use -a.
-Q queue | (Required) Specifies the host's NLB_QUEUE_NAME attribute in the NLB database, which defines the queue name NQS uses when it sends load-balanced requests to this machine. If you are using the default NQE queue configuration, this queue is nqebatch.
-r interval | Specifies the interval (in seconds) over which the rolling averages of attributes are calculated. This value is turned into a multiple of the update interval. A value of 0 turns off the rolling average attributes. Rolling average attributes are provided to prevent load-balancing decisions from being influenced by sudden spikes in the various measurements. However, the longer the time period over which the averages are taken, the longer it will take the load balancer to note changes in a machine's current load. For more information on averaging data, see “Approaches to Designing Policies: Averaging Data” in Chapter 8.
-s server | Specifies the name of the NLB server to which to send information. To update multiple servers from this collector, either specify the -s option more than once, or specify it once with the server names enclosed in quotation marks. If you omit the option, the NLB_SERVER environment variable is used to locate the server.
The default mechanism for gathering batch request information is to read it directly from the NQS database. To customize the method by which data is collected and stored, use the -a, -f, -p, and -q options.
If you use the -a option, batch request information is parsed by using the qstat -af | awk -f njs_qstat.awk command.
To change the path name of the qstat command, use the ccollect -q option. You may want to do this if qstat is not in the collector's PATH environment variable. You must specify a full path name. If you use this option, you also must use the -a option.
To change the awk command used to parse the qstat output, use the -p option. For example, you may want to use nawk(1). If you use this option, you also must use the -a option.
To change the file name used to parse qstat information, use the -f option. You can customize the script njs_qstat.awk to parse the data the collector gathers so that it suits your site's needs. For example, you can modify the script so that two different collectors report information to two different NLB servers, thus keeping user communities separated.
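The relationship between the -a, -f, -p, and -q options can be sketched as a function that builds the parsing pipeline. This is a hypothetical Python illustration, not how ccollect is implemented: it assumes that -q supplies only the path of qstat (the -af flags are still added), that -p replaces only the awk program name with -f parsefile still passed to it, and /usr/bin/qstat is an invented example path.

```python
from typing import Optional


def build_parse_pipeline(use_awk: bool,
                         parsefile: Optional[str] = None,
                         parse_command: Optional[str] = None,
                         qstat_command: Optional[str] = None) -> Optional[str]:
    """Return the pipeline implied by -a/-f/-p/-q, or None for the default
    (reading request information directly from the NQS database)."""
    if not use_awk:
        if parsefile or parse_command or qstat_command:
            raise ValueError("-f, -p, and -q can be used only with -a")
        return None
    if qstat_command and not qstat_command.startswith("/"):
        raise ValueError("-q requires a full path name")
    qstat = qstat_command or "qstat"        # -q overrides the qstat path
    parser = parse_command or "awk"         # -p overrides the parsing program
    script = parsefile or "njs_qstat.awk"   # -f overrides the parse file
    return f"{qstat} -af | {parser} -f {script}"


print(build_parse_pipeline(True))
# qstat -af | awk -f njs_qstat.awk
print(build_parse_pipeline(True, parse_command="nawk", qstat_command="/usr/bin/qstat"))
# /usr/bin/qstat -af | nawk -f njs_qstat.awk
```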
To customize NLB collectors, see “Storing Arbitrary Information in the NLB Database (Extensible Collector)” in Chapter 8.
The nlbconfig facility lets you maintain and administer the NLB server. It provides the following capabilities:
Shut down an NLB server
Instruct an NLB server to reread its policy file (the file name is policies)
Display, write, and modify name map information
Display, write, and modify object information
Display, write, and modify ACL information
Change storage class of objects
The following sections describe the nlbconfig command.
To specify a server on the command line, use the following command:
nlbconfig -s server
The server is the name of the host on which the NLB server resides. The server also can be specified in the nqeinfo file or with the NLB_SERVER environment variable (as described in “Specifying the NLB Server Location”). Specifying -s server overrides the value of NLB_SERVER. If you do not specify a server name and it is not set by any of the methods listed in “Specifying the NLB Server Location”, the nlbconfig command will fail with the following message:
nlbconfig: cannot determine server location
To ensure that the NLB server is running, use the following command:
nlbconfig -ping
The response you receive tells you whether the NLB server is running. If the server is running, the response tells you the name of the server that was contacted when you issued the command.
To reread the policies file and make any new information in it available to the NLB, use the following command:
nlbconfig -pol
Note: The policies file that the NLB reads when you issue an nlbconfig -pol command is $NLB_SERVER_DIR/policies, where NLB_SERVER_DIR is defined in the nqeinfo file. Do not confuse this file with the policies file in the /nqebase/etc directory, which is used only when the NLB directory is created. Make any changes to the policies file in $NLB_SERVER_DIR/policies.
The nlbconfig command uses similar options to operate on name maps, objects, and access control lists (ACLs).
You can list and dump the contents of the name map, object, and ACL files the server is currently using. When you list information, you receive basic information about the type of objects in the server. When you dump information, the current values of the objects are formatted and displayed. The following options perform these operations:
Option | Description |
-mlist | Lists names, types, and storage classes in the name map file |
-olist | Lists names of objects |
-alist | Lists names of an ACL list |
-mdump | Dumps values in name map file |
-odump | Dumps values in the object file |
-adump | Dumps values in the ACL list |
The following options replace the contents (or part of the contents) of the name map, object, and ACL files.
Option | Description |
-mput | Downloads a name map file to the server |
-oput | Downloads the specified object(s) to the server |
-aput | Downloads the specified ACLs to the server |
The following options delete objects or ACLs in the server's files:
Option | Description |
-odel | Deletes the specified object(s) |
-adel | Deletes the specified ACLs |
-mdel | Deletes the name map for the specified object types from the server |
The following option adds attributes to objects in the server's files, leaving all the current attributes unchanged:
Option | Description |
-oupdat | Updates the specified object(s) with additional attributes |
The following options change the storage class of an object in the name map file (see “The Name Map File” in Chapter 8 for more information about storage classes):
Option | Description |
-perm | Changes the storage class of the specified object to permanent |
-pers | Changes the storage class of the specified object to persistent |
-tran | Changes the storage class of the specified object to transient |
Options are provided to qualify the options already described. For example, when you want to download a name map file, you use the -f option with the nlbconfig command to specify the file name of the new file. The following qualifying options are provided (for definitions of objects, object names, and attribute names, see “NLB Overview”):
Option | Description |
-f file | Specifies a file name that contains objects or ACL descriptions. |
-t type | Specifies an object type or ACL type. |
-h host | Specifies an ACL host name; the default is "*" (denoting all names). |
-n name | Specifies an object name. |
-a attribute | Specifies an object attribute name or a set of ACL privileges. |
-u user | Specifies an ACL user name. |
The following list indicates the options that can be qualified:
Option | Qualifier |
-adel | Object type, user, and host |
-adump | Object type and output file |
-alist | Object type and output file |
-aput | Object type, object attribute, user, host, and output file |
-mdel | Object type |
-mdump | Object type and output file |
-mlist | Object type and output file |
-mput | Input file |
-odel | Object type and object name |
-odump | Object type, object name, object attribute, and file |
-olist | Object type, object name, and file |
-oput | Input file |
-oupdat | Input file |
-perm | Object type |
-pers | Object type |
-trans | Object type |
The -kill, -pol, and -ping options do not use qualifiers.
The following example lists the name map for the server defined by the NLB_SERVER environment variable:
nlbconfig -mlist
This command produces the following output:
object type   id     storage class
-----------   ----   -------------
NLB           1024   persistent
NJS           500    permanent
C_OBJ         1      persistent
NJS_ALIVE     501    permanent
FTA_CONFIG    400    persistent
FTA_DOMAIN    401    persistent
FTA_DATA      402    persistent
FTA_ACTION    403    persistent
NLB_JOB       1025   persistent
NLB_POL       3      persistent
EVENT         601    transaction
The following example deletes the object of type C_OBJ named gust from the server located by the mechanisms described in “Specifying the NLB Server Location”:
nlbconfig -odel -t C_OBJ -n gust
ACLs control permissions for reading, updating, or deleting objects in the NLB server. ACL permissions are usually changed with the nlbconfig command.
The following example adds an ACL to the ACL list that allows user mario to read (r) all objects of type C_OBJ from any host (the default) on the server defined by the NLB_SERVER environment variable:
nlbconfig -aput -t C_OBJ -u mario -a r
The master ACL controls all other ACLs; only users in this ACL can edit other ACL lists with nlbconfig. If you receive the following error message, you need to change the master ACL:
# nlbconfig -aput -t C_OBJ -u mario -a r
gust: No privilege for requested operation
To change the master ACL, edit the $NLB_SERVER_DIR/config file and restart the NLB; you cannot change it by using nlbconfig.
By default, all users have read permission for ACL information in the server database. The keyword WORLD is used to specify all users. If you changed permissions for any users, you must explicitly add WORLD read permission again.
To modify data in the server database, the NLB collector account (that is, the account from which the ccollect command is issued) must have u (update) and d (delete) permission.
All users can monitor all requests (through the NQE GUI Status window) in the NQE cluster.
To avoid having to issue a long list of commands, you can place the information in a file and then download that file to the server. The following command places the current ACL for NJS objects in a file named acl_njs:
nlbconfig -s cool -alist -f acl_njs
The acl_njs file has the following format:
type   user    host   priv
----   ----    ----   ----
NJS    luigi   *      dru
NJS    root    *      dru
NJS    mario   *      dru
NJS    WORLD   *      r
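Before downloading a file like this with nlbconfig -aput, you may want to check it. The following hypothetical Python helper parses the four-column format shown above, skips the header and dashed separator lines, and rejects privilege letters other than d, u, and r (or the keyword all). It is a local sanity check, not part of nlbconfig.

```python
from typing import List, NamedTuple


class AclLine(NamedTuple):
    obj_type: str
    user: str
    host: str
    priv: str


ACL_NJS = """\
type user host priv
---- ---- ---- ----
NJS luigi * dru
NJS root * dru
NJS mario * dru
NJS WORLD * r
"""


def parse_acl_file(text: str) -> List[AclLine]:
    """Parse four-column ACL lines, skipping blank, header, and separator lines."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] == "type" or set(fields[0]) == {"-"}:
            continue
        priv = fields[3]
        if priv != "all" and not set(priv) <= set("dur"):
            raise ValueError(f"unknown privilege in {line!r}")
        entries.append(AclLine(*fields))
    return entries


for entry in parse_acl_file(ACL_NJS):
    print(entry)
```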
All jobs in the NQE database will be seen, regardless of how you set your ACL.
The following nlbconfig command places the information in the server:
nlbconfig -aput -f acl_file