This chapter describes the NQE Network Load Balancer (NLB) policies and their implementation. The following topics are discussed:
Defining load-balancing policies
The policies file
Examples of how policies are applied
Testing policies
File formats
Implementing NQS destination-selection policies
Using batch request limits in policies
Storing arbitrary information in the NLB database (extensible collector)
A load-balancing policy is the mechanism whereby the NLB server determines a list of hosts (with the most desirable first) to be used as the target for a batch request. For example, when a request is submitted to a load-balanced queue, the NLB server applies a policy to the current information it has about hosts in the complex. The server provides NQS with a list of machines sorted by the specifications of the policy.
Policies are filters and sort routines that are applied to destination selection. The information in the NLB database objects is used to determine how to filter and sort the destinations (that is, how to balance the load across the NQE). Policies consist of a set of Boolean and arithmetic expressions that use kernel statistics collected from the target machines. These expressions are defined in the policies file of the server; you can cause the server to reread the file by using the nlbconfig(8) command. Multiple policies can exist at any one time.
Some attributes are reserved by the NLB, and others can be defined by a site.
Sites have different requirements for the way they want work to be assigned to machines, and the workload in an NQE complex is often varied; therefore, no single set of rules can be applied successfully at every site. Instead, sites can define their own policies based on their unique needs.
When the system is installed and initially configured, the NLB reads the $NQE_ETC/policies file for the initial configuration and writes it to the $NLB_SERVER_DIR/policies file. After that, only the $NLB_SERVER_DIR/policies file is read. You should define any local, site-defined policies in the $NLB_SERVER_DIR/policies file.
Note: The policies file that NLB reads when you issue an nlbconfig -pol command is located by default in $NLB_SERVER_DIR/policies. Do not confuse this file with the policies file located in the /nqebase/etc directory, which is used only on NQE start-up (after an install) if $NLB_SERVER_DIR/policies does not exist. If you make changes to the policies file, make them to the file in /usr/spool/nqe/nlbdir.
The template for a policy definition is as follows:
```
# name is the mechanism for selecting an individual policy
policy: name

# A host must satisfy all specified constraints to be
# selected as a destination by the policy. There can be
# zero, one or many constraints.
constraint: Boolean expression
constraint: Boolean expression

# Hosts which meet all constraints are then sorted by an
# equation. The operation is applied to the attributes of
# each host in turn; the host producing the largest result
# is returned first.
sort: arithmetic operation

# Optional maximum number of results; prevents NQS from
# attempting to send the request to too many destinations.
count: integer
```
Multiple policies can exist in the file. The policy named NQS is the default; it is used if the client does not specify a policy by name or if it specifies a policy that does not exist.
The following Boolean operators are supported in all expressions in a policy:
Operator | Description |
---|---|
|| | Logical OR |
&& | Logical AND |
^ | Exclusive or (a or b but not a and b) |
! | Logical not |
== | Equal to |
!= | Not equal to |
>= | Greater than or equal to |
<= | Less than or equal to |
> | Greater than |
< | Less than |
=> | Implies. For example, a => b means that if a is true, b must also be true for the expression to be true. If a is false, the value of b is not examined and the expression is true. |
Operator precedence is the same as in the C language. The implies operator (=>) has the lowest precedence; that is, the => operation will be performed after all other operators.
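The semantics of the implies operator can be sketched in a few lines of Python (an illustrative model only; the real evaluation happens inside the NLB server):

```python
def implies(a, b):
    """a => b: if a is true, b must also be true; if a is false,
    the expression is true without examining b."""
    return bool(b) if a else True

# A constraint such as
#   [NLB_HARDWARE == "sun"] => NLB_PHYSMEM >= 32768
# passes vacuously for non-Sun hosts:
host = {"NLB_HARDWARE": "CRAY", "NLB_PHYSMEM": 8192}
ok = implies(host["NLB_HARDWARE"] == "sun", host["NLB_PHYSMEM"] >= 32768)
print(ok)  # True: the memory requirement does not apply to this host
```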
The following functions also are available to you when defining load-balancing policies:
Function | Description |
---|---|
[variable == "regex"] | Tests that the string variable matches the regular expression regex. This is a regular expression as understood by grep and ed, not the shell file-name expansion form. Regular expressions are case insensitive. |
exists(attrib) | Tests for the existence of an attribute. The expression is true if the attribute exists and false if it does not. |
age | The number of seconds that have elapsed since this host's attributes were last updated. Used to filter out hosts that are not responding. |
All expressions support (), /, *, +, and - operators. The attributes of the hosts are used as the variables in the policy expressions. NQS attributes specified by using the NQE GUI or by using the -la option of the qsub or cqsub command can also be used in policies.
Attributes provided at startup are described in Table 8-2. Additional static attributes can be defined and set using the nlbconfig command; these attributes can then be used in policies.
Note: The extensible collector allows you to add your own dynamic attributes. For additional information about the extensible collector, see “Storing Arbitrary Information in the NLB Database (Extensible Collector)”.
The following section describes how a policy for NQS load balancing is applied. For information about how to specify a policy name, see “Commands for Configuring Destination-selection Queues” in Chapter 5.
After receiving a request in a load-balanced pipe queue, NQS will send a query to the NLB server. This query includes the name of the policy to be used.
The policy name is used to find the correct policy to apply. If the name is not that of a known policy, the default policy is applied. The default policy is the one named NQS.
NQS is shipped with a policy named nqs, which selects batch queues based on which system has the highest average idle CPU load.
The first constraint is applied to the list of hosts that have sent data to the server through the ccollect process (described in “Starting NLB Collectors” in Chapter 7). It is applied to the objects of object type NLB. For a list of these objects, use the following command:
```
nlbconfig -olist -t NLB
```
A constraint is expected to be a Boolean expression, and a true result means that the host is kept as a valid target. A false result means the host is discarded.
Each subsequent constraint is applied to the hosts that met the previous constraints. This process continues until no more constraints are left, or until no hosts meet a constraint. If there are no acceptable hosts, NLB returns a null list, and NQS retries the request later. If there are no constraints, all hosts are assumed to be acceptable targets for the policy.
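The winnowing process described above can be modelled in a short sketch (illustrative only; constraints are shown as Python functions, whereas the real server evaluates the expressions in the policies file):

```python
def apply_policy(hosts, constraints, sort_key, count=None):
    """Filter hosts through each constraint in turn, then sort the
    survivors with the largest sort result first (a simplified model
    of NLB destination selection)."""
    for constraint in constraints:
        hosts = [h for h in hosts if constraint(h)]
        if not hosts:
            return []  # NLB returns a null list; NQS retries the request later
    hosts.sort(key=sort_key, reverse=True)
    return hosts[:count] if count else hosts

hosts = [
    {"name": "gust", "NLB_QUEUE_NAME": "nqebatch", "age": 30, "NLB_A_IDLECPU": 40.0},
    {"name": "hot",  "NLB_QUEUE_NAME": "nqebatch", "age": 900, "NLB_A_IDLECPU": 90.0},
]
best = apply_policy(
    hosts,
    constraints=[lambda h: "NLB_QUEUE_NAME" in h and h["age"] < 300],
    sort_key=lambda h: h["NLB_A_IDLECPU"],
)
print([h["name"] for h in best])  # ['gust']: 'hot' has not reported recently
```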
Constraints can filter out machines that physically cannot run a request, as in the following example:
```
# Policy for SPARC-specific requests
policy: SPARC
constraint: [ NLB_HARDWARE == "SPARC" ]
```
For the nqs policy, any machines that do not have a queue name associated with them are filtered out. It is also important to filter out machines that have not sent data recently. The time period used to evaluate this depends on the time interval between samples sent from the collectors. The following example sets the time period to 300 seconds:
```
constraint: exists(NLB_QUEUE_NAME) && age < 300
```
Constraints can also remove machines that are currently not good choices. For example, the following policy selects only machines that have more than 32768 Kbytes of free memory (32 Mbytes). Memory is measured in units of Kbytes:
```
policy: bigjob
constraint: NLB_FREEMEM > 32768
```
After the list of hosts has been filtered, the sort expression/operation is used to order them. The expression/operation is applied to each host in turn, and the result for that host is stored in the NLB database as the NLB_POLICY attribute. The hosts are then sorted on this number. The host with the largest NLB_POLICY value is regarded as the best choice.
A policy can optionally specify a count of the maximum number of targets that should be returned. This is needed in large networks to reduce the number of different systems to which NQS might attempt to send a batch request.
Policies are based on statistical data gathered from machines by NLB collectors. This data reflects the state of the machine in the recent past, but its current state may be different.
The data available to the server is a dynamic snapshot of the machine state; therefore, the server's picture of the state of machines in the network is always a little out-of-date.
This section describes a special collection of attributes, which, by convention, are named NLB_A_ followed by the regular attribute name, such as NLB_A_IDLECPU.
Because an instantaneous snapshot of machine load could cause an incorrect view of its true state over time, the collectors provide two sets of measurements: the instantaneous data (which is of most use in the NQE GUI Load window displays) and averaged data for use in policies. The averaged data tends to smooth out the short-term peaks and troughs in the instantaneous data and to give a more accurate view of the overall load, as can be seen in Figure 8-1 and Figure 8-2.
The averaged data (stored in attributes beginning with NLB_A_) are calculated by adding the instantaneous measures taken over the last n seconds and dividing by the number of samples taken (the ccollect -r option controls n).
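The averaging the collectors perform can be modelled as a simple rolling mean (a sketch; the sample period n is controlled with the ccollect -r option):

```python
from collections import deque

class Averager:
    """Rolling average over the most recent samples, mimicking how the
    NLB_A_* attributes are derived from instantaneous measurements."""
    def __init__(self, nsamples):
        self.samples = deque(maxlen=nsamples)

    def add(self, value):
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)

idle = Averager(nsamples=3)
for instantaneous in (90.0, 10.0, 20.0):
    avg = idle.add(instantaneous)
print(avg)  # 40.0: the short-term spike at 90% idle is smoothed out
```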
You can tune the averaging of data in the policy to reflect the type of work being run on a machine. For example, you can use longer sampling periods on machines that run long requests. However, the longer the period you use, the longer it takes the measures to catch up with reality; for example, even after a large request finishes, it takes time for the load it produced to work its way out of the averaged data.
A network usually will contain a mix of machines: different types of machines and perhaps machines of the same type but differing capacity (such as CRAY T90 and CRAY J90 systems).
The measures sent by the collectors reflect what is actually happening on a particular machine, but they do not indicate how that activity compares to the maximum capacity of that machine, or how that machine compares to others in the network. To measure these factors, you must introduce weighting or normalization factors into policies.
For example, if you are concerned with the amount of idle CPU time available on a machine (which is reported as a percentage), you could write a policy that sorts on this factor, as follows:
```
policy: idlecpu
sort: NLB_IDLECPU
```
In this case, a small workstation may be selected as the best target for work in preference to a large Cray Research system, even though 10% idle time available on the Cray Research system could well represent much more computing power than 50% idle time on the workstation.
By adding weighting factors to machines, you can balance out this difference. If you define a new attribute called NLB_SITE_CPUNORM for each host (as in the example in “Example of Writing a Load-balancing Policy”), and assign values representing the relative CPU powers of the different machines (machine A is 100 while machine B is 10), you can factor this into the policy, as follows:
```
policy: idlecpu
sort: NLB_IDLECPU * NLB_SITE_CPUNORM
```
Of course, the relative powers of different machines are related to the nature of the request to be executed. A highly vectorized application probably requires a larger difference in weightings between a workstation and a Cray Research system than would a scalar application.
You may want to apply different policies at different times of the day or the week. For instance, you may want to allow requests to run on file servers at night when there is little NFS activity. Although there is no direct support for this in the NLB, you could use a cron job to load a new set of policies into the server at defined times. This job would replace the policies file and cause the server to reread it, as in the following example:
```
cp policies policies.day
cp policies.night policies
nlbconfig -pol
```
The following example would switch the policies files back again:
```
cp policies policies.night
cp policies.day policies
nlbconfig -pol
```
For this example to work, root must be in the master access control list (ACL) for the server defined in its configuration file, because this ACL controls who can update the policies file.
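A crontab fragment implementing the day/night switch might look as follows (the times and the spool path are illustrative; the job must run as root so that the ACL check passes):

```shell
# Switch to the night policies at 18:00 and back to the day set at 07:00.
# $NLB_SERVER_DIR is assumed to be /usr/spool/nqe/nlbdir.
0 18 * * * cd /usr/spool/nqe/nlbdir && cp policies policies.day && cp policies.night policies && nlbconfig -pol
0 7  * * * cd /usr/spool/nqe/nlbdir && cp policies policies.night && cp policies.day policies && nlbconfig -pol
```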
The following example selects all hosts that have the attribute NLB_QUEUE_NAME and for which the data has been updated within the last minute (AGE < 60). The hosts are then sorted by weighted idle CPU time (as described in “Other Factors: Weighting Machines”).
```
policy: AGE
constraint: exists(NLB_QUEUE_NAME) && AGE < 60
sort: NLB_A_IDLECPU * NLB_SITE_CPUNORM
```
The following example selects all machines with an average system load of 10%/CPU (or less); NLB_A_SYSCPU is the average system CPU (percentage) and NLB_NCPUS is the number of CPUs:
```
policy: default
constraint: [NLB_HARDWARE == "CRAY"] => (NLB_A_SYSCPU / NLB_NCPUS) < 10.0
constraint: [NLB_HARDWARE == "sun"] => NLB_PHYSMEM >= 32768
sort: NLB_NCPUS
```
The following example defines a new attribute used to normalize CPU loads. This is useful for a site with machines that have widely different CPU power. First a new object attribute is created, then values are assigned to the attribute, and finally the policy is added to the server NLB database so that data associated with it can be stored. All this can occur without any part of the system being restarted.
1. Define an attribute.
For a new attribute to be usable, it must be placed in the server's name_map file.
To extract the name map from the server, use the following command:
```
nlbconfig -mdump -f name_map
```
This command produces a file containing all of the name maps known to the server. To define an attribute, you must edit this file to contain a name, description, and type for the new attribute, as in the following example:
```
map NLB=1024 {
    NLB_SITE_CPUNORM "CPU Normalization Factor" integer(1000);
    (existing attributes)
    .
    .
}
```
The new name (NLB_SITE_CPUNORM) and the new ID (1000) must not match any existing ones. The attribute must appear in the list of attributes for the NLB object (the top line of the example), and all existing attributes must be included in the file. If you do not include all of the existing attributes, they will be deleted when the new name_map file is read by the server.
2. Add the attribute to the name map by using the following command:
```
nlbconfig -mput -f name_map
```
You have to do this only once; the server stores this data in its NLB database until you explicitly overwrite it again.
3. Add a value for the new attribute to the host data by creating an object definition that sets the value of the attribute for each NLB object, as follows:
```
object NLB (bighost) {
    NLB_SITE_CPUNORM = 1000;
}
object NLB (smallhost) {
    NLB_SITE_CPUNORM = 1;
}
```
This file is then loaded into the NLB database by using the following command:
```
nlbconfig -oupdat -f obj_defs
```
The new attribute is now available for use in policies (in step 4).
4. Add a policy that uses the newly defined attribute.
After values for an attribute have been placed in the NLB database (step 3), you can write policies based on these values. The following policy could be placed in the policies file to select hosts based on CPU normalization:
```
policy: idlecpu
sort: NLB_IDLECPU * NLB_SITE_CPUNORM
```
5. Issue the following command to cause the server to reread its policy file:
```
nlbconfig -pol
```
This policy is now available for use by NQS.
Note: Ensure an NQS load-balancing queue is defined to use the new policy.
The load-balancing policies that work best for your site depend on the workload at your site. To help you develop policies, the nlbpolicy program is provided as a direct interface to the NLB. You can, for example, experiment with a policy without submitting requests to NQS. The syntax of the nlbpolicy command is as follows:
```
nlbpolicy [-a attribute = value] [-c count] [-d] [-h host] [-p policy] [-s server]
```
The nlbpolicy command sends the server a policy query. The -a option specifies a list of attribute names, each with an optional value, to find hosts that have the specified attribute set (or set to the specified value). You may specify more than one attribute and corresponding value. The value must be a decimal integer or float, in normal or scientific notation. Unit symbols that are acceptable to NQS commands, such as 10kw, must be expanded to a value in bytes (for example, 10kw would be 10 x 8192 = 81,920).
The -c option controls the maximum number of hosts returned. The -p option selects a policy by name. The -h option (which can be repeated) specifies an acceptable target host to return.
Without the -d option, the program prints out the returned hosts ordered by the policy sort equation and the result of the equation for that host. With the -d option, all the information about each host is printed as an object data file.
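The unit expansion mentioned above can be done with a small helper (a sketch: kw here means Cray kilowords, 1024 words of 8 bytes = 8192 bytes, and the table of symbols is an illustrative subset, not the full NQS set):

```python
# Multipliers, in bytes, for some common NQS unit symbols (illustrative).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "w": 8, "kw": 8192, "mw": 8192 * 1024}

def expand(value):
    """Expand a value such as '10kw' to bytes for use with nlbpolicy -a."""
    value = value.lower()
    digits = value.rstrip("abcdefghijklmnopqrstuvwxyz")
    suffix = value[len(digits):] or "b"
    return int(float(digits) * UNITS[suffix])

print(expand("10kw"))  # 81920
```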
The following policy, called AGE, returns a list of hosts ordered by the percentage of idle CPU time multiplied by a weighting factor:
```
policy: AGE
sort: NLB_A_IDLECPU * NLB_SITE_CPUNORM
```
To test the policy, use the following command:
```
nlbpolicy -p AGE
```
The following output is returned:
```
cool    5.000000
cloudy  0.701754
gust    0.239234
hot     0.028249
```
The following command uses the same policy, but the -d option dumps all of the objects for the host, and the -c option limits the number of hosts to 2:
```
nlbpolicy -p AGE -d -c2
```
This command prints out all of the attributes of the selected hosts (objects of object type NLB).
The following two sections describe the NLB name map and the object data file formats.
The NLB server supports mappings between the internal values used to represent objects and attributes, and the human-readable names and descriptions. The mapping is defined by configuration data held within the server. This data can be downloaded from the server and converted to a text form by nlbconfig. The nlbconfig program also can take the text form of the name map and load it back into the server, allowing you to add new definitions to the map. This text form of the name map is known as a name_map file.
The name_map file contains a sequence of definitions for different object types and their associated attributes. Each object has a name and an ID. Each attribute of an object has a name, a text description, and an ID formed from a type and a number. The attribute ID and name must be unique from all other attributes of that object.
A name map for an object of object type NLB (information about an individual host) would be as follows; column one is the attribute name, column two is the attribute description, and column three is the attribute type ID:
```
map NLB = 1024 {
    NLB_OSNAME       "OS name"                      string(1);
    NLB_RELEASE      "Release"                      string(2);
    NLB_VERSION      "Version"                      string(3);
    NLB_HARDWARE     "Hardware"                     string(4);
    NLB_QUEUE_NAME   "Queue name"                   string(5);
    NLB_TMPNAME      "Temp file path"               string(6);
    NLB_BSUSPEND     "Auto suspend state"           string(7);
    NLB_IDLECPU      "Idle cpu"                     float(1);
    NLB_RUNQLEN      "Run queue"                    float(2);
    NLB_SWAPPING     "Swap activity"                float(3);
    NLB_POLICY       "Policy Weighting"             float(4);
    NLB_SYSCPU       "System cpu"                   float(5);
    NLB_RUNQOCC      "Run queue occupancy"          float(6);
    NLB_A_IDLECPU    "Ave Idle cpu"                 float(11);
    NLB_A_RUNQLEN    "Ave Run queue"                float(12);
    NLB_A_SWAPPING   "Ave Swap activity"            float(13);
    NLB_A_SYSCPU     "Ave System cpu"               float(14);
    NLB_PHYSMEM      "Memory"                       integer(1);
    NLB_FREEMEM      "Free memory"                  integer(2);
    NLB_CLOCK        "Clock speed"                  integer(3);
    NLB_NCPUS        "Num CPUs"                     integer(4);
    NLB_PSWCH        "Context switches"             integer(5);
    NLB_NUMPROC      "Process count"                integer(6);
    NLB_FREETMP      "Free temp space"              integer(7);
    NLB_KEYIDLE      "Keyboard idle time"           integer(8);
    NLB_SWAPSIZE     "Swap Device size"             integer(9);
    NLB_SWAPFREE     "Free Swap space"              integer(10);
    NLB_A_FREEMEM    "Ave Free memory"              integer(11);
    NLB_A_PSWCH      "Ave Context switches"         integer(12);
    NLB_A_NUMPROC    "Ave Process count"            integer(13);
    NLB_A_FREETMP    "Ave Free temp space"          integer(14);
    NLB_A_SWAPFREE   "Ave Free Swap space"          integer(15);
    NLB_IOTOTAL      "Total user I/O rate"          integer(16);
    NLB_A_IOTOTAL    "Ave Total user I/O Rate"      integer(17);
    NLB_IOREQS       "I/O Requests per sec"         integer(18);
    NLB_A_IOREQS     "Ave I/O Requests per sec"     integer(19);
    NLB_SYSCALLS     "System calls per second"      integer(20);
    NLB_A_SYSCALLS   "Ave system calls per second"  integer(21);
    NLB_TPDAEMONUP   "Tape Daemon running"          integer(22);
    NLB_MPP_UP       "Cray T3D available"           integer(23);
    NLB_NQS_UP       "NQS available flag"           integer(24);
    NLB_NQS_VALTYPE  "NQS validation type"          integer(25);
    NLB_TIMESTAMP    "Update time"                  time(4096);
    NLB_HOST         "Originating host"             string(4098);
}
```
You can change the attribute names (such as NLB_OSNAME) and the attribute descriptions. However, you cannot change the attribute type ID. The type ID numbers are built into ccollect(8); because of this, specific attributes always represent the same piece of information.
Note: It is recommended that you do not change attribute names because they are used in X resource definitions for the NQE GUI Load window displays and in policy definitions.
Table 8-1 lists the allowed attribute types:
Type | Contents |
---|---|
float | Floating-point value |
integer | Integer value |
string | Null-terminated ASCII string |
time | Time stamp (the number of seconds since 1 January 1970) |
You can add new attributes and store values to the NLB database at any time by using the nlbconfig command. However, the data cannot be used anywhere until the attribute has been defined in the name_map file, even if objects in the NLB database possess the attribute. These site-defined values can then be used in policies and the X resource definitions for the machine load displays, which are displayed by using the NQE GUI Load window. For an example of writing a load-balancing policy, see “Example of Writing a Load-balancing Policy”.
A number of object and attribute names are reserved. They are hardcoded into the NLB collectors. A default name mapping file is supplied for these objects and attributes. Attribute numbers below 8192 are reserved for Cray Research. Table 8-2 lists the reserved host object attributes.
There are averaged versions of all integer and floating-point attributes except for NLB_POLICY, NLB_KEYIDLE, NLB_PHYSMEM, NLB_CLOCK, NLB_NCPUS, and NLB_SWAPSIZE. The averaged attributes follow the naming convention of A_ inserted after the NLB_ prefix (for example, NLB_A_IDLECPU is the averaged version of NLB_IDLECPU).
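The naming convention can be expressed as a one-line transformation (a hypothetical helper; the actual mapping is fixed in the name map shown earlier):

```python
# Attributes that have no averaged counterpart.
NO_AVERAGE = {"NLB_POLICY", "NLB_KEYIDLE", "NLB_PHYSMEM",
              "NLB_CLOCK", "NLB_NCPUS", "NLB_SWAPSIZE"}

def averaged_name(attr):
    """Return the averaged counterpart of an instantaneous attribute,
    or None for attributes that have no averaged version."""
    if attr in NO_AVERAGE:
        return None
    return attr.replace("NLB_", "NLB_A_", 1)

print(averaged_name("NLB_IDLECPU"))  # NLB_A_IDLECPU
```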
Table 8-2. Reserved Host Object Attributes
Attribute | Description | Type |
---|---|---|
NLB_BSUSPEND | Status of interactive activity on a server running csuspend(8). | String |
NLB_CLOCK | Clock speed. | Integer (picoseconds) |
NLB_FREEMEM | Amount of memory currently available. | Integer (Kbytes) |
NLB_FREETMP | Amount of free temporary file space. | Integer (Mbytes) |
NLB_HARDWARE | Hardware type as shown by the uname(1) command. | String |
NLB_IDLECPU | Percentage of idle CPU time. | Float (0 to 100) |
NLB_IOTOTAL | Amount of I/O being performed by user processes (read and write system calls). | Integer (Kbyte/s) |
NLB_IOREQS | Number of read and write system calls per second. | Integer |
NLB_KEYIDLE | Amount of time since console key pressed or mouse moved. Only valid on workstations. | Integer (minutes) |
NLB_MPP_UP | CRAY T3D MPP system available. | True/false |
NLB_NCPUS | Number of configured CPUs. | Integer |
NLB_NUMPROC | Number of processes in system at the last sample point. | Integer |
NLB_OSNAME | Operating system name as shown by the uname(1) command. | String |
NLB_PHYSMEM | Total amount of physical memory available to user processes. | Integer (Kbytes) |
NLB_POLICY | Result of last policy sort equation applied to host. | Float |
NLB_PSWCH | Context switches per second. | Integer |
NLB_QUEUE_NAME | Queue name for load-balanced NQS requests, defined by the ccollect option -Q. | String |
NLB_RELEASE | Operating system release as shown by the uname(1) command. | String |
NLB_RUNQOCC | Percentage of the time that the run queue was occupied (that is, processes were waiting for the CPU). A low number indicates that the system is underused. | Float (0 to 100) |
NLB_RUNQLEN | Number of runnable processes. | Float |
NLB_SWAPFREE | Amount of unused swap space. | Integer (Mbytes) |
NLB_SWAPPING | Rate of data movement to and from swap devices. | Float (Kbyte/s) |
NLB_SWAPSIZE | Amount of swap space configured. | Integer (Mbytes) |
NLB_SYSCALLS | Number of system calls executed per second. | Integer |
NLB_SYSCPU | Percentage system CPU time. | Float (0 to 100) |
NLB_TMPNAME | Path name of file system used for the NLB_FREETMP attribute, defined by the ccollect option -d. | String |
NLB_TPDAEMONUP | UNICOS tape daemon is present. | True/false |
NLB_VERSION | Operating system version as shown by the uname(1) command. | String |
Table 8-3 lists the availability of the attribute data across platforms. The following conventions are used in the table:
Y | Indicates data is available |
- | Indicates no kernel data is available |
The NLB_TPDAEMONUP and NLB_MPP_UP attributes are collected only from UNICOS and UNICOS/mk systems.
Table 8-3. Attribute Implementation Across Platforms
Attribute | AIX | Digital UNIX | HP-UX | IRIX | Solaris | UNICOS | UNICOS/mk |
---|---|---|---|---|---|---|---|
NLB_OSNAME | Y | Y | Y | Y | Y | Y | Y |
NLB_RELEASE | Y | Y | Y | Y | Y | Y | Y |
NLB_VERSION | Y | Y | Y | Y | Y | Y | Y |
NLB_HARDWARE | Y | Y | Y | Y | Y | Y | Y |
NLB_QUEUE_NAME | Y | Y | Y | Y | Y | Y | Y |
NLB_TMPNAME | Y | Y | Y | Y | Y | Y | Y |
NLB_IDLECPU | Y | Y | Y | Y | Y | Y | Y |
NLB_RUNQLEN | Y | Y | Y | Y | Y | Y | Y |
NLB_SWAPPING | Y | Y | Y | Y | Y | Y | - |
NLB_SYSCPU | Y | Y | Y | Y | Y | Y | Y |
NLB_RUNQOCC | Y | - | - | Y | Y | Y | Y |
NLB_PHYSMEM | Y | Y | Y | Y | Y | Y | Y |
NLB_FREEMEM | Y | Y | Y | Y | Y | Y | - |
NLB_CLOCK | - | - | - | Y | Y | Y | Y |
NLB_NCPUS | - | Y | Y | Y | Y | Y | Y |
NLB_PSWCH | Y | Y | Y | Y | Y | Y | Y |
NLB_NUMPROC | Y | Y | Y | Y | Y | Y | Y |
NLB_FREETMP | Y | Y | Y | Y | Y | Y | Y |
NLB_KEYIDLE | - | Y | - | Y | Y | Y | Y |
NLB_SWAPSIZE | Y | Y | Y | Y | Y | Y | - |
NLB_SWAPFREE | Y | Y | Y | Y | Y | Y | - |
NLB_IOTOTAL | Y | - | - | Y | Y | Y | Y |
NLB_IOREQS | Y | - | Y | Y | Y | Y | Y |
NLB_SYSCALLS | Y | Y | Y | Y | Y | Y | Y |
NLB_TPDAEMONUP | Y | Y | Y | Y | Y | Y | Y |
NLB_MPP_UP | Y | Y | Y | Y | Y | Y | Y |
The NLB also stores attributes for running requests. There is an attribute for each field in the cqstatl -f or qstat -f display (over 200 attributes are included). To obtain a list of these attributes, use the following command:
```
nlbconfig -mdump -t NJS | more
```
The output from this command lists attribute names, descriptions, and types. An abbreviated example follows:
```
# nlbconfig -mdump -t NJS | more
###############################################
# creation date: Wed Mar 22 15:08:13 1995
# created by: ljgm
# object timestamp: Sat Mar 18 11:56:52 1995
########
map NJS = 500 {
    NJS_NQSID       "NQS identifier"            string(1);
    NJS_JOBSTAT     "Job status"                string(2);
    NJS_NUMPROC     "Processes in job"          integer(3);
    NJS_RUNUSER     "Running user"              string(4);
    NJS_QUENAM      "Queue name"                string(5);
    NJS_CPUREQLIM   "Per-request CPU limit"     integer(6);
    NJS_CPUREQUSE   "Per-request CPU use"       integer(7);
    NJS_MEMREQLIM   "Per-request memory limit"  float(8);
    NJS_MEMREQUSE   "Per-request memory use"    float(9);
    NJS_PFSREQLIM   "Per-request PFS limit"     float(10);
    NJS_PFSREQUSE   "Per-request PFS use"       float(11);
    NJS_ORGHOST     "Originating host"          string(12);
    NJS_ORIGUSER    "Originating user"          string(13);
    NJS_QUEENT      "Position in queue"         integer(14);
    ...
```
An object data file may be used to assign values to attributes for each object. Objects can be represented in a text format. Functions and commands are provided so administrators can create their own objects and insert them into the NLB database.
An object data file consists of a sequence of objects. Each object has a type, a name, and a sequence of assignment statements that define attribute values. The type names for objects and names for attributes are taken from the name map in the NLB database. The following example defines two attributes on three different hosts:
```
object NLB (cloudy) {
    NLB_SITE_CPUNORM = 10;
    NLB_APPLICATIONS = "UniChem dbase";
}
object NLB (gust) {
    NLB_SITE_CPUNORM = 12;
    NLB_APPLICATIONS = "UniChem TurboKiva";
}
object NLB (hot) {
    NLB_SITE_CPUNORM = 24;
    NLB_APPLICATIONS = "";
}
```
The legal values for an attribute depend on its type as defined in the name map, as follows:
Type | Contents |
---|---|
float | Floating-point value. Only fixed-point representation is supported (such as 35000.0 and not 3.5e4). |
integer | Integer value. This can be a signed integer. A leading 0 implies octal. |
string | A string surrounded by quotation marks (") or a single word containing only alphanumeric characters. |
time | Cannot be specified as input. |
First, update the name_map file with the new attributes NLB_APPLICATIONS and NLB_SITE_CPUNORM and load it into the nlbserver. Note that, because of the way the server code works, any attribute used in a policy must belong to either the NLB object or the NLB_JOB object. If the policy file uses an attribute that is not in one of these two objects, the nlbserver will not read the policies file successfully.
name_map file:
```
map NLB = 1024 {
    NLB_APPLICATIONS "Resident Applications" string(999);
}
```
Second, set the attributes on the appropriate objects. For this example, the values are set with the object data file shown below, which is loaded into the nlbserver with the following command:
```
nlbconfig -t nlb -oupdat filename
```
```
object NLB (gust) {
    NLB_APPLICATIONS = "Unichem TurboKiva Nastran";
}
object NLB (wind) {
    NLB_APPLICATIONS = "Nastran";
}
object NLB (rain) {
    NLB_APPLICATIONS = "Unichem";
}
object NLB (frost) {
    NLB_APPLICATIONS = "TurboKiva";
}
```
Third, write some policies that use the attribute in question.
Policy file:
```
policy: first
constraint: [NLB_APPLICATIONS == "Nastran"]
sort: (NLB_A_IDLECPU + 5) / 10000

policy: second
constraint: [NLB_APPLICATIONS == "Unichem"] || [NLB_APPLICATIONS == "TurboKiva"]
sort: (NLB_A_IDLECPU + 5) / 10000
```
Finally, see the results of the policies when they are invoked.
Output from nlbpolicy:
```
nlbpolicy -p first
gust    0.0092525
wind    0.003213

nlbpolicy -p second
gust    0.009225
rain    0.004950
frost   0.000500
```
It is relatively simple to set a policy that selects target NQS systems with the lightest CPU load.
To use this destination selection mechanism, batch requests must be submitted to a pipe queue that is configured to perform destination selection. By defining both standard NQS pipe queues and destination-selection pipe queues, and by setting an appropriate default batch queue, sites can define several workable environments that allow integration of destination selection into their environment.
The examples in “Example of Load Balancing Using the Default Policy”, and “Example of Load Balancing by Selection”, show how sites might configure NQS destination-selection queues to determine how batch requests are routed.
To configure an NQS destination selection environment, define a pipe queue as a destination selection pipe queue on each system that will route requests using NLB (as described in “Example of Load Balancing Using the Default Policy”). No destinations should be defined for this pipe queue. The queue must be marked as a destination selection pipe queue (CRI_DS). (For additional information, see “Commands for Configuring Destination-selection Queues” in Chapter 5.)
The collectors on your NQE nodes have been reporting their availability. When you submit a request to the new queue, the NLB selects one of these servers as the destination queue for your job.
If the qmgr default batch queue is set to a destination-selection pipe queue, all requests that do not designate a specific queue are routed using NLB. If the default queue is not a destination-selection queue, you can use the General Options Queue Name selection of the Configure menu on the NQE GUI Submit window, use the -q queues option, or designate the queue in the script file to select a destination-selection queue.
If you are using file validation, each user must create a .rhosts or .nqshosts file on each system to which a request might be routed by the NLB. If you are using password validation, the user's password must be the same on all systems to which the NLB could route the request.
The following example could be used as a policy definition in the NLB policies file:
# Policy for load-balancing between Cray systems
# Note that NLB_HARDWARE can be "CRAY J90", "CRAY Y-MP",
# "CRAY T3E", etc. and still match [ NLB_HARDWARE == "cray" ]
# ( AGE < 300 ) for systems that have sent data in the
# last 5 minutes
# (NLB_PHYSMEM - NLB_A_FREEMEM....) for a ratio of memory used
# (real and swap) to real memory of less than 4
# (NLB_NCPUS <= 2.... for less than 20% system CPU per CPU for
# hosts with 1 or 2 CPUs
# (NLB_NCPUS > 2.... for less than 3.5% system CPU per CPU for
# hosts with more than 2 CPUs
#
# Sorted by ratio of number of CPUs to length of the run queue
#
policy: RUNQUEUE
constraint: [NLB_HARDWARE == "cray"] && AGE < 300
constraint: (NLB_PHYSMEM - NLB_A_FREEMEM + (NLB_SWAPSIZE - NLB_SWAPFREE) * 1024) / NLB_PHYSMEM < 4
constraint: NLB_NCPUS <= 2 => NLB_A_SYSCPU / NLB_NCPUS < 20
constraint: (NLB_NCPUS > 2) => (NLB_A_SYSCPU / NLB_NCPUS < 3.5)
sort: NLB_NCPUS / NLB_A_RUNQLEN
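The way a policy's constraints filter hosts and its sort expression orders them can be sketched in Python. The host records and values below are hypothetical, and the real NLB evaluates constraints in its own expression language; this sketch only illustrates the filter-then-sort behavior of a policy such as RUNQUEUE:

```python
# Sketch of how a policy filters and sorts hosts (hypothetical data;
# the real NLB evaluates constraints in its own expression language).

hosts = [
    {"name": "cool", "NLB_NCPUS": 4, "NLB_A_RUNQLEN": 1.0, "NLB_A_SYSCPU": 8.0,  "AGE": 120},
    {"name": "gust", "NLB_NCPUS": 2, "NLB_A_RUNQLEN": 1.0, "NLB_A_SYSCPU": 30.0, "AGE": 60},
    {"name": "hot",  "NLB_NCPUS": 8, "NLB_A_RUNQLEN": 1.0, "NLB_A_SYSCPU": 2.0,  "AGE": 600},
]

def constraints(h):
    # AGE < 300: the host reported data within the last 5 minutes
    if not h["AGE"] < 300:
        return False
    # NLB_NCPUS <= 2 => NLB_A_SYSCPU / NLB_NCPUS < 20
    if h["NLB_NCPUS"] <= 2 and not h["NLB_A_SYSCPU"] / h["NLB_NCPUS"] < 20:
        return False
    # (NLB_NCPUS > 2) => (NLB_A_SYSCPU / NLB_NCPUS < 3.5)
    if h["NLB_NCPUS"] > 2 and not h["NLB_A_SYSCPU"] / h["NLB_NCPUS"] < 3.5:
        return False
    return True

def sort_key(h):
    # sort: NLB_NCPUS / NLB_A_RUNQLEN -- higher values are more desirable
    return h["NLB_NCPUS"] / h["NLB_A_RUNQLEN"]

# Filter out hosts that fail any constraint, then sort best-first.
candidates = sorted((h for h in hosts if constraints(h)), key=sort_key, reverse=True)
print([h["name"] for h in candidates])
```

Here host hot is eliminated because it has not reported within 5 minutes, and the remaining hosts are ordered by the ratio of CPUs to run-queue length, most desirable first.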
Destination selection can be used to limit requests when host resources reach a threshold. For example, the following policy omits any host with system CPU usage greater than 10%. This can keep a busy system from receiving more requests.
policy: AGE
constraint: AGE < 300
constraint: NLB_A_SYSCPU < 10
sort: NLB_A_IDLECPU * NLB_NCPUS
NQS destination-selection queues used for load balancing specify the policy for that queue. For this reason, you may want to have multiple load-balancing queues for requests with disparate resource requirements. For example, CPU-bound requests could be routed to a machine with idle CPU but little free memory. The following policies, MEMORY and CPU, serve this purpose. The first limits hosts to those with more than 32 Mbytes of memory and a low memory demand, calculated as the memory oversubscription ratio. Hosts are then sorted so that larger-memory machines are chosen first.
The CPU policy uses additional NLB attributes that would be added by the site. NLB_SYSCPU_LIMIT would be defined as an integer value that could be different for each host. NLB_RUNQLEN_LIMIT also could be different for each host. The CPU policy limits the system CPU time and the run queue length. Hosts are sorted so that those with idle CPU time are first, and those with more CPUs are preferred. If there is no idle CPU time, the hosts are sorted by the number of CPUs.
policy: MEMORY
constraint: NLB_PHYSMEM > 32768
constraint: (NLB_PHYSMEM - NLB_A_FREEMEM + (NLB_SWAPSIZE - NLB_SWAPFREE) * 1024) / NLB_PHYSMEM < 3
sort: NLB_PHYSMEM

policy: CPU
constraint: NLB_A_SYSCPU / NLB_NCPUS < NLB_SYSCPU_LIMIT
constraint: NLB_A_RUNQLEN / NLB_NCPUS < NLB_RUNQLEN_LIMIT
sort: NLB_A_IDLECPU * NLB_NCPUS + NLB_NCPUS
This section discusses NQS behavior when processing a batch request submitted to a destination selection queue.
When a batch request is submitted to a destination selection queue, NQS obtains a list of destinations from the NLB server. The list of destinations is tried one at a time until the request is accepted by one of the destinations or until the list is exhausted. If no destination accepts the request, NQS waits and tries again, based on the interval set by the qmgr command set default destination_retry time (the default is 5 minutes). NQS continues to try to deliver the request to a destination until the request is accepted or the request has waited the maximum allowable time, which is defined by the qmgr command set default destination_retry wait (the default is 72 hours). If no destination accepts the request during the maximum time, the request is deleted and the submitting user is notified by mail.
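The delivery loop just described can be sketched as follows. The helper names (get_destinations, try_destination) are hypothetical stand-ins for NQS internals, and the intervals correspond to the qmgr destination_retry settings (defaults of 5 minutes and 72 hours):

```python
import time

def deliver(request, get_destinations, try_destination,
            retry_time=300, retry_wait=72 * 3600,
            clock=time.monotonic, sleep=time.sleep):
    """Sketch of NQS delivery to an NLB-selected destination list.

    get_destinations() returns hosts sorted most-desirable first;
    try_destination(host, request) returns True if the host accepts.
    """
    deadline = clock() + retry_wait          # set default destination_retry wait
    while clock() < deadline:
        for host in get_destinations():      # try each destination in turn
            if try_destination(host, request):
                return host                  # a destination accepted the request
        sleep(retry_time)                    # set default destination_retry time
    return None                              # request deleted; user notified by mail

# Example: the first host refuses, the second accepts.
dest = deliver("req1",
               get_destinations=lambda: ["hot", "gust"],
               try_destination=lambda h, r: h == "gust",
               retry_time=0, retry_wait=1, sleep=lambda s: None)
print(dest)   # prints: gust
```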
One of the following situations also may occur:
If a destination selection pipe queue is defined incorrectly, the destination selection function is not invoked and the request is rejected. The user receives mail indicating that the request was submitted to a queue with no destinations.
If no NLB server is available at any of the hosts defined by $NLB_SERVER, a message is written to the NQS log file and no attempt is made to route the request at that time. NQS continues to attempt to deliver the request until it is accepted or until NQS has waited the maximum allowable time.
If the list of destinations is empty, a message is generated in the NQS log file. NQS continues to attempt to deliver the request until it is accepted or until the request has waited the maximum allowable time indicated by the qmgr command set default destination_retry wait.
If no host in the list of destinations will accept the request, a message is generated in the NQS log file. NQS continues to attempt to deliver the request until it is accepted or until the request has waited the maximum allowable time indicated by the qmgr command set default destination_retry wait.
If you are using file validation, each user submitting requests to a load-balanced pipe queue must have a .rhosts or .nqshosts file in their home directory for each batch system that could run work. These files are required to route requests to execution hosts and return results to the originating host. If you are using password validation, the user's password must be the same on all NQE servers; for example, if you are using NIS.
Requests that are routed using destination selection are not guaranteed to run, because the local NQS systems may apply local limits to their queues. For information about using batch request limits in policies, see “Using Batch Request Limits in Policies”.
The following qmgr commands create the pipe queue nlbqueue and define it as the default queue for batch job requests:
% qmgr
Qmgr: create pipe_queue nlbqueue server=(pipeclient CRI_DS) priority=33
Qmgr: set default batch_request queue nlbqueue
Qmgr: quit
All requests that use the default queue are routed to a destination host by the destination selection algorithm. The NLB policy used for load balancing defaults to nqs, and the NLB server host defaults to the value of $NLB_SERVER.
In this scenario, a site has multiple NQS systems. The systems are similar, but there is production work that must run on specific systems. There is also work that could run on any available system. The default is that requests are submitted to a standard pipe queue. Users are encouraged to choose a destination selection queue when they submit work that can run anywhere in the complex.
This situation can be accommodated by defining two pipe queues, as shown in Figure 8-4. One is a standard pipe queue, called standard. The second, called nlbqueue, is configured to perform destination selection.
The following qmgr commands create these pipe queues and set the default queue to standard so that requests are routed by standard NQS routing. The pipe queue nlbqueue explicitly defines a policy name (CPU), an NLB host (cool), and an NLB port (nlb).
Qmgr: create pipe_queue standard server=(pipeclient) \
      destination=([email protected],[email protected]) priority=30
Qmgr: create pipe_queue nlbqueue \
      server=(pipeclient CRI_DS CPU cool:nlb) priority=30
Qmgr: set default batch_request queue standard
Requests that are submitted to the queue nlbqueue are routed to the most lightly loaded system.
Users can set various per-process and per-request limits on batch requests when they submit the request. You can use these limits to define policies that route a request based on the size of the limits. NQS lets you specify attributes on a request by using the General Options Attributes selection of the Configure menu on the NQE GUI Submit window or by using the cqsub -la option. These attributes can be used within NLB policies by defining the same attributes in the NLB name map.
For example, using the cqsub -la option, the user specifies an NQS attribute as follows:
cqsub -la nastran
When it receives the request, NQS prepends the characters job_ to the attribute name and checks whether the resulting attribute has been defined in the NLB name map. JOB_ attributes are attributes of object type NLB_JOB. In this example, the JOB_NASTRAN attribute is defined in the name map, and it has the value 1 when the policy is applied.
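This name-mapping step can be sketched as follows; the name-map contents shown are hypothetical, and the real lookup happens inside NQS against the NLB name map:

```python
# Hypothetical name map: attribute names defined for object type NLB_JOB.
name_map_nlb_job = {"job_nastran", "job_oracle", "job_unichem"}

def request_attributes(la_options):
    """Map cqsub -la names to JOB_ attributes with value 1, as NQS does."""
    attrs = {}
    for name in la_options:
        key = "job_" + name              # NQS prepends "job_" to the name
        if key in name_map_nlb_job:      # only name-map attributes are used
            attrs[key.upper()] = 1       # present => value 1 when the policy runs
    return attrs

print(request_attributes(["nastran"]))   # prints: {'JOB_NASTRAN': 1}
```

An attribute that is not defined in the name map is simply ignored, so the corresponding policy constraints are not triggered.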
For example, you could define a policy that restricts large memory requests to large memory machines, or requests with small time limits could use less strict constraints when selecting a destination.
Using nlbconfig, you can define absolute request size limits for hosts and have policies enforce these limits. For example, a policy could constrain requests that require more than a specified amount of memory, disk space, or CPU time from being sent to specific machines.
The data sent by the destination selection module of the NQS pipeclient includes the attributes shown in Table 8-4. The attributes are present only when the associated command-line option was used in the request. All attributes, with the exception of tape drives, are floating-point numbers, and all values are normalized so that measurements are in bytes (or seconds for CPU limits). Tape drive attributes are integers.
The string attribute JOB_REQNAME that contains the request name is always present. You can use this attribute to make specific decisions for special request names.
Note: The following limits can also be set by selecting the General Options Job Limits submenus of the Configure menu on the NQE GUI Submit window.
Table 8-4. Request Limit Attributes
| Option | Description | Attribute name |
|---|---|---|
| -lc | Specifies the per-process core file size limit. | JOB_PPCORESIZE |
| -ld | Specifies the per-process data segment size limit. | JOB_PPDATASIZE |
| -lf | Specifies the per-process permanent file space limits. | JOB_PPPFILESIZE |
| -lF | Specifies the per-request permanent file space limits. | JOB_PRPFILESPACE |
| -lm | Specifies the per-process memory size limits. | JOB_PPMEMSIZE |
| -lM | Specifies the per-request memory size limits. | JOB_PRMEMSIZE |
| -lQ | Specifies the per-request SDS limits. | JOB_PRQFILESPACE |
| -ls | Specifies the per-process stack size limit. | JOB_PPSTACKSIZE |
| -lt | Specifies the per-process CPU time limits. | JOB_PPCPUTIME |
| -lT | Specifies the per-request CPU time limits. | JOB_PRCPUTIME |
| -lw | Specifies the per-process working set size limit. | JOB_PPWORKSET |
| -lU type | Specifies the per-request tape drive limits for type, which can be a, b, c, d, e, f, g, or h. | JOB_TAPES_A to JOB_TAPES_H |
If you have three different application packages, NASTRAN, Oracle, and UniChem, which are used by requests but not available on all machines, you would set up a policy as described in this section. To let users specify on the command line which application packages are required, you can perform the following steps:
Define attribute names for hosts and requests to represent these packages. Download a copy of the existing name map by using the following command:
nlbconfig -mdump -f name_map
Edit the name_map file as shown in the following example. The attributes for the request must be defined for object type NLB_JOB with ID 1025, and their names must start with job_. When you edit the file, do not delete the existing names from the name map; any names omitted from the file are deleted when the server reads the new name_map file. The host_ attributes let you specify whether each host has an application available. Notice in step 2 that two of the three machines have UniChem, two have Oracle, and only one has NASTRAN. If a job requires an application, you want to submit it to a machine that has that application.
map NLB_JOB = 1025
{
    (All names from the existing name map must be included)
    job_nastran "Job requires nastran" integer(8192);
    job_oracle  "Job requires oracle"  integer(8193);
    job_unichem "Job requires unichem" integer(8194);
}

map NLB = 1024
{
    (All existing names from the name map must be included)
    host_nastran "Host supports nastran" integer(8192);
    host_oracle  "Host supports oracle"  integer(8193);
    host_unichem "Host supports unichem" integer(8194);
}
To add these names to the name map, use the following command:
nlbconfig -mput -f name_map
Define the host_ attributes by creating an object data file and downloading this into the server.
object NLB (cool) {
    host_nastran = 1;
    host_unichem = 1;
}
object NLB (gust) {
    host_oracle = 1;
    host_unichem = 1;
}
object NLB (hot) {
    host_oracle = 1;
}
To load this file into the server, use the following command:
nlbconfig -oupdat -f obj_defs
Add constraints to the policy.
You must define constraints for the policy that select hosts based on the job_ attributes. These constraints have no effect unless the user specifies an attribute by using the -la option of the cqsub command or the General Options Job Limits submenus of the Configure menu on the NQE GUI Submit window.
Use the implies operator (=>), as follows:
constraint: exists(job_nastran) => exists(host_nastran)
constraint: exists(job_oracle)  => exists(host_oracle)
constraint: exists(job_unichem) => exists(host_unichem)
Using the implies operator (=>) in these constraints means that if a job_ attribute is specified by the user, the equivalent host attribute also must exist. If the job_ attribute is not present, the equivalent host attribute is not required either.
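In Boolean terms, A => B is equivalent to (not A) or B: the constraint fails only when the left side holds and the right side does not. A minimal sketch of this truth table:

```python
def implies(a, b):
    # A => B is false only when A holds and B does not
    return (not a) or b

# job_ attribute absent: the constraint passes regardless of the host
print(implies(False, False))   # True
# job_ attribute present: the host must supply the matching host_ attribute
print(implies(True, False))    # False
print(implies(True, True))     # True
```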
Such job_ attributes can be used for any purpose. They are not restricted to the presence or absence of applications on hosts.
You can test a policy by using the nlbpolicy -p option. Its syntax is as follows:
nlbpolicy -p policyname
Optionally, you can use the -a option to specify any attribute described in Table 8-4, or you can create your own attributes and add them to the name map. Valid attributes are defined in the NLB for object type NLB_JOB, and their names must begin with the characters JOB_. Values can be decimal integers or floats, in normal or scientific notation; negative, hexadecimal, and octal numbers are not accepted.
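The stated value rules can be sketched as a small validator. This is an illustration of the rules only, not the NLB's actual parser:

```python
import re

# Hypothetical validator mirroring the stated rules: values may be decimal
# integers or floats, in normal or scientific notation; negative,
# hexadecimal, and octal values are rejected.
_VALUE = re.compile(r"^(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?$")

def valid_job_value(text):
    if text.startswith(("0x", "0X")) or re.match(r"^0\d+$", text):
        return False                     # hexadecimal or octal notation
    return bool(_VALUE.match(text))

print(valid_job_value("12.5e3"))   # True
print(valid_job_value("-4"))       # False
```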
The extensible collector is a mechanism built into NQE that allows you to store arbitrary information in the NLB database. This information is periodically sent to the NLB along with the data that is normally stored and updated by NQE. Once the customized data is in the NLB database, it can be used in policies or displays just as any other data in the NLB database. NQE collects and stores the customized data, but you define, generate, and update it.
The customized data that is stored in the NLB database is added as new attributes to either the NLB object or the NLB_JOB object; only attributes in these two objects can be used in policies and displays. If a policy references an attribute from any other object, the policy file is not read and no policies will be available.
As a background for the rest of this description, you should be familiar with the following terms:
Name map or name map file
Object or NLB object
Attribute
Object data file
nlbconfig command
nqeinfo file, the site NQE configuration file generated at installation time (see Chapter 3, “Configuring NQE variables”)
The following steps are necessary to get customized objects into NQE:
Define the new attributes and put them in either the NLB object or the NLB_JOB object in the name map file so that they are available for use in a load-balancing policy.
To add attributes to an object, the current name_map file should be retrieved from the NLB server by using the nlbconfig command. Then add the new attribute definitions to the current name_map file. (The current name_map file retrieved from the NLB server is an ASCII file, so any additions are done using an editor.) Finally, using the nlbconfig command, download the updated name_map file to the NLB server.
Generate an ASCII file of the following form:
object NLB ("host name") {
    NEW_ATTRIBUTE_1 = 12345;
    NEW_ATTRIBUTE_2 = "This is a text string";
}
The attribute values may be added to an object without specifying all attributes of the object. This ASCII file is referred to as a custom object file.
Write a program or a shell script to populate a file with instances of the newly defined attributes. The format of this file is that of an object data file. This program or script will periodically update this object data file with new values for the attributes that have been defined.
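A minimal sketch of such an update program follows, assuming hypothetical attribute names and an output path of your choosing. It writes the object data file format shown earlier and renames the file into place so the collector never reads a partially written file:

```python
import os
import socket
import time

def write_object_file(path, attrs, host=None):
    """Write an NLB object data file containing the given attribute values.

    Writes to a temporary file and renames it so a reader never sees a
    half-written file.
    """
    host = host or socket.gethostname()
    lines = ["object NLB (%s) {" % host]
    for name, value in attrs.items():
        if isinstance(value, str):
            lines.append('    %s = "%s";' % (name, value))
        else:
            lines.append("    %s = %d;" % (name, value))
    lines.append("}")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write("\n".join(lines) + "\n")
    os.replace(tmp, path)    # atomic rename into place

# Hypothetical update loop: refresh the object data file once per minute.
# while True:
#     write_object_file("/usr/spool/nqe/custom_obj", {"NEW_ATTRIBUTE_1": 12345})
#     time.sleep(60)
```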
Instruct the NQE collector program to read from the new object data file. This is done by using either the -C option of the ccollect program or by using the variable NQE_CUSTOM_COLLECTOR_LIST in either the nqeinfo file or as an environment variable. To specify multiple input files, use the -C option for each file.
Note: If the environment variable method is used, it must be defined before the collector is started.
The custom object files specified to the collector are read once each interval and forwarded to the NLB server. Once a program that generates the custom objects exists, the collector can also start it automatically: it checks the NQE_CUSTOM_COLLECTOR_LIST variable for a list of programs to start each time the collector itself is started, detects when any of those programs terminates, and restarts it.
The following example shows the syntax required to specify a list of custom collectors that the NLB collector will start. The NLB collector looks for this list in the NQE_CUSTOM_COLLECTOR_LIST environment variable or looks in the nqeinfo file for a variable of the same name. This example assumes the nqeinfo file is being used:
NQE_CUSTOM_COLLECTOR_LIST="prog1, arg1, arg2: prog2: prog3, arg1"
The program names are separated by a colon, and the arguments are separated by a comma. The program names are either full path names, or they are interpreted as being relative to the directory from which the NQE collector was started, which is NQE_BIN if the nqeinit script is used.
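These parsing rules can be sketched as follows (a hypothetical helper; the actual collector performs its own parsing):

```python
def parse_collector_list(value):
    """Split an NQE_CUSTOM_COLLECTOR_LIST value into (program, args) tuples.

    Programs are separated by colons; a program's arguments by commas.
    """
    entries = []
    for part in value.split(":"):
        fields = [f.strip() for f in part.split(",") if f.strip()]
        if fields:
            entries.append((fields[0], fields[1:]))
    return entries

print(parse_collector_list("prog1, arg1, arg2: prog2: prog3, arg1"))
# prints: [('prog1', ['arg1', 'arg2']), ('prog2', []), ('prog3', ['arg1'])]
```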
An additional feature, called customized collector startup, can help automate the generation of the extended object data files. This feature allows the administrator to specify one or more programs that will be automatically started by the NLB collector. The program can be either an executable file or a script file. Each program can have multiple command-line arguments. The NLB collector detects when any of the customized collectors has terminated and will restart it during its next cycle.
The NLB database information is supplied by the ccollect program. ccollect relies on the sar command for retrieving system performance data. The sar command on UNICOS/mk systems currently does not provide system performance data for memory or swapping usage statistics. As a result, the NQE GUI Load window for memory demand displays a fixed value of 96% when used for UNICOS/mk systems.
In addition, the following NLB attributes are not meaningful for UNICOS/mk systems: NLB_A_SWAPPING, NLB_SWAPPING, NLB_SWAPSIZE, NLB_SWAPFREE, NLB_FREEMEM, and NLB_A_FREEMEM.