This chapter discusses concepts and terms important to NQE use and administration.
Batch requests are shell scripts that are executed independently of any interactive terminal connection. The operating system treats this shell script as the standard input file (stdin). The only difference between this and a conventional shell script is that you can include NQS options (preceded by a special prefix) in the batch request file as comments before the first executable shell command.
When a request is submitted from the local or remote host, the standard output (stdout) and standard error (stderr) files are returned to the directory from which the request was submitted or returned to a user-specified alternative path. Optionally, users can request that stdout, stderr, and the job log file be merged (a job log file contains messages of NQS activity about a request).
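For example, assuming the conventional NQS output options (-o and -e name the stdout and stderr destination files, and -eo merges stderr into stdout), a user might direct request output as follows:

cqsub -o myjob.out -e myjob.err jobfile
cqsub -eo -o myjob.out jobfile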
Batch requests can have the attributes that are described in the following sections.
The nice value is the execution priority of the processes that form the batch request. A user can explicitly set the nice value (through the Job Limits selection of the Configure menu on the NQE GUI Submit window or as a cqsub or qsub command option) or NQS can set the nice value implicitly, based on the queue in which the request resides. Increasingly negative nice values increase the relative execution priority of a process.
|Warning: Do not specify a negative nice increment for a queue because it may cause serious CPU scheduling problems, such as timeouts in system daemons or interference with kernel scheduling.|
A batch request can specify the full path name of a shell to interpret its shell script. If a user does not specify a shell, the user's login shell is used.
When NQS initiates a request, it spawns a second shell.
The second shell invoked is always /bin/sh unless another shell is specified by the user. The two-shell method allows stdin read-ahead by commands such as remsh and cat.
To define a shell for the first shell invoked, users can use the General Options selection of the Configure menu on the NQE GUI Submit window or use the cqsub -s or qsub -s command.
Users can define a shell other than /bin/sh to interpret their batch request by including the following line as the first line of the file (before any #QSUB directives):
#! shell_path (such as /bin/csh)
You can include shell options on this line.
NQS sets the QSUB_SHELL environment variable to the shell the user is in when executing a request. NQS sets the SHELL environment variable to the initial shell executed by NQS.
The initial shell is not necessarily the same shell as the second shell. The second shell will always be /bin/sh, unless the user includes #! shell_path as the first line of the batch request.
The following example uses the cqsub command and shows how to verify the shells that an NQS request is using. User jjj uses csh as a login shell. The batch request file job1 contains only one line, as follows:
ps -ef | fgrep jjj
The request is submitted by using the following command:
cqsub -s /bin/csh job1
The following output shows that the request was executed under /bin/sh because jjj did not modify the first line of the file as required:
jjj 11874 11873  2 15:18:17 ?     0:00 fgrep jjj
jjj 11355 11354  0 14:54:37 pts/0 0:00 -csh
jjj 11873 11871  2 15:18:17 ?     0:00 /bin/sh /nqebase/nqeversion/poe/database/spool/scripts/++++2++cd4X+++
jjj 11871 11870 22 15:18:16 ?     0:00 -csh
jjj 11875 11874  9 15:18:17 ?     0:00 ps -ef
To run the request under csh, you must modify the batch request file as follows:
#!/bin/csh
ps -ef | fgrep jjj
The cqsub -s /bin/csh job1 command produces the following output:
jjj 11969 11967 19 15:23:06 ?     0:00 /bin/csh /nqebase/nqeversion/pendulum/database/spool/scripts/++++3++cd4X+++
jjj 11355 11354  0 14:54:37 pts/0 0:00 -csh
jjj 11967 11966  5 15:23:05 ?     0:00 -csh
jjj 11974 11969  9 15:23:08 ?     0:00 ps -ef
If you want to use a one-shell invocation method, you can set the NQSCHGINVOKE environment variable to true or yes. The environment variable is set on a per-request basis; you must export environment variables by using the General Options selection of the Configure menu on the NQE GUI Submit window or by using the qsub -x option. If NQSCHGINVOKE is set to true, and you do not request a shell (by using the General Options selection of the Configure menu on the NQE GUI Submit window or by using cqsub -s or qsub -s), NQS invokes the request owner's UDB shell on UNICOS and UNICOS/mk systems or their login shell on UNIX systems.
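For example, assuming a POSIX shell on the client, a user could request the one-shell invocation method as follows (the qsub -x option exports the environment variables of the submitting session with the request):

NQSCHGINVOKE=true
export NQSCHGINVOKE
qsub -x job1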
To set a fixed shell path and invocation method, edit the nqeinfo file to define the NQE_NQS_SHELLPATH and NQE_NQS_SHELLINVOCATION variables.
The default values are a null string for the shell path (NQE_NQS_SHELLPATH) and 2 for the invocation method (NQE_NQS_SHELLINVOCATION).
The allowed value for NQE_NQS_SHELLPATH is any valid shell path name. The default null string for NQE_NQS_SHELLPATH uses the user's login shell. If you specify a shell path name for NQE_NQS_SHELLPATH, the shell you specify is used for batch request processing.
The value specified by using the General Options selection of the Configure menu on the NQE GUI Submit window or specified by using the cqsub -s or qsub -s option, if used, overrides the system-level shell path configured in the nqeinfo file.
The allowed value for NQE_NQS_SHELLINVOCATION is 1 or 2. A value of 1 for NQE_NQS_SHELLINVOCATION uses a one-shell invocation method for all NQS requests. A value of 2 uses the default two-shell invocation method.
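For example, the following nqeinfo file settings (shown as a sketch; the file's exact syntax may vary by release) would force /bin/ksh and the one-shell invocation method for all NQS requests:

NQE_NQS_SHELLPATH=/bin/ksh
NQE_NQS_SHELLINVOCATION=1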
The value of any NQSCHGINVOKE environment variable received with a request overrides the system-level invocation method configured in the nqeinfo file.
During NQS startup, the values of these two variables are verified. If a stat() system call of a non-null shell path name fails, or if the value of the invocation method is not either 1 or 2, the nqsdaemon aborts its startup. Messages describing the problem are written to the NQS console file and/or the log file.
|Note: Alternate user validation applies only for NQS requests submitted using the cqsub command.|
By default, an NQE job is submitted for execution under the user name executing the cqsub command. By using the cqsub -u command, or by setting the User Name field in the NQE GUI Submit display, a user can submit a request for execution under a different user name.
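For example, user jjj could submit the batch request file job1 for execution under the (hypothetical) user name ooo as follows:

cqsub -u ooo job1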
The user account on the client workstation is referred to as the originating client user account, and the user account on the NQS server is referred to as the target user account.
The nqeinfo file variable NQE_NQS_NQCACCT controls which user account is validated on the NQS_SERVER when a user submits a job for execution under a different user name. The value of this variable on the NQS_SERVER host determines which user name is used for validation on the NQS_SERVER.
The value ORIGIN means that the originating client user account is validated on the NQS_SERVER. This means that the client user must have an account on the NQS_SERVER with the same name (but not necessarily the same UID) as the originating client user account.
The value TARGET means that the target user account is validated on the NQS_SERVER. It is not necessary for the client user to have an account on the NQS_SERVER with the same name as the client user account.
If the NQE_NQS_NQCACCT variable is not set on the NQS_SERVER, it defaults to the value ORIGIN.
If NQE_NQS_NQCACCT is set to ORIGIN, and password validation is set on the NQS_SERVER, then the client user account and the target user account on the NQS_SERVER must have the same password, or password validation will fail.
The intraqueue priority determines the order in which requests are initiated. After a request is accepted into a batch queue, the NQS scheduler calculates the intraqueue priority each time it scans the requests in the queue. The order in which queues are considered does not change; only the order in which requests are selected from a queue changes. The eligible request with the highest intraqueue priority is selected next for initiation. (Eligible means the request meets all quotas such as run limit, user limit, and group limit). For additional information, see “Defining Job Scheduling Parameters” in Chapter 5.
A request or the individual processes of a request can impose a set of limits on the use of a resource. Typical limits are maximum file size, CPU time, and maximum memory size. A user can explicitly set up resource limits by using NQE GUI options or cqsub command options, or NQS can set the resource limits implicitly, based on the batch queue in which the request executes.
The limits supported by NQS are described in “Defining Per-request and Per-process Limits” in Chapter 5.
The state of a request can be found by using the NQE GUI Status window or the cqstatl -f or qstat -f command. The state is displayed after Status: or at the top right of the detailed request display. (To access a detailed request display through the NQE GUI Status window, double-click on a request line.)
An abbreviated form of the status also is displayed on the NQE GUI Status window under the Job Status column or under the ST column of the request summary display of the cqstatl -a or qstat -a command.
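For example, the following commands display a detailed status for a single request and a summary of all requests, respectively (the request identifier 1234.hot is hypothetical):

cqstatl -f 1234.hot
cqstatl -a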
The current state of the request shown on the cqstatl or qstat display can be composed of a major and a minor status value. Major status values are as follows:
NQE Database request
Routing (pipe queues only)
Running (batch queues only)
Waiting for a date and/or time specified by the -a option to cqsub, waiting for a license, or waiting for a network connection
Minor status values are described on the cqstatl(1) and qstat(1) man pages.
A batch request submitted to the NQE database typically progresses through the following states; the first four characters of a state are displayed in the ST column of a database request summary (shown by using the cqstatl -a or qstat -a command):
The request is in the NQE database.
The request is in the NQE database awaiting scheduling.
The request is in the NQE database and has been scheduled by the NQE scheduler.
A copy of the request has been submitted from the NQE database for processing.
The copy of the request submitted from the NQE database for processing has completed processing.
The copy of the request submitted from the NQE database for processing has terminated.
A batch request submitted to a local pipe queue for routing to a batch queue (defined in the next section) typically progresses through the following states at a local pipe queue:
Awaiting routing to another queue
Being moved to another queue
A request progresses through the following states in a batch queue:
Being moved from another queue
Waiting for a specified execution time
Left the queue but not yet deleted
An executing request can also have the following states:
Held by NQS operator action; resources are not released.
Processing temporarily suspended by an operator; resources are released.
The ordering of requests within a queue does not always determine the order in which the request is processed; the NQS request scheduler determines the processing order, depending on the various limits that the system administrator or operator imposes.
A queue is a set of batch requests. NQS assigns requests to a particular queue, where they wait until they are selected for execution. NQS also assigns an execution priority to each request. NQS executes each request according to its priority and routes any output to the specified destination. The following sections describe the two types of queues: batch and pipe.
NQE is shipped with one pipe queue and one batch queue configured on each NQS server. The default pipe queue is nqenlb. The default batch queue is nqebatch. This configuration uses the nqenlb queue to send requests to the NLB, which then sends requests to nqebatch on the most appropriate system, based on an NLB policy. The default NLB policy, called nqs, sends batch requests to the system with the most available CPU cycles. If requests are submitted to the queue nqebatch, the request runs on the local NQS server.
Batch queues accept and process batch requests. Each batch queue has an associated set of request and process resource limits. If a request is to be accepted in a particular batch queue, no resource limit specified by the request may exceed the corresponding limit specified by the target batch queue. If a batch request fails to specify a resource limit that is enforced, NQS sets the limit for the request to the corresponding default limit of the queue. The qmgr commands define the queue default values.
|Note: For UNICOS and UNICOS/mk systems, NQS assigns the limit for the request to either the default limit of the queue or the user database (UDB) limit, whichever is more restrictive.|
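For example, assuming the conventional NQS limit options (-lT sets the per-request CPU time limit and -lM the per-request memory limit), the following submission is accepted into a batch queue only if the requested 100 seconds of CPU time and 10 Mwords of memory do not exceed the queue's corresponding limits:

cqsub -lT 100 -lM 10mw job1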
If the qmgr manager lowers the request and process resource limits of a queue, all requests already in the queue whose quotas exceed the new limits remain in the queue through a "grandfather" clause.
Every batch queue also has a set of associated queue limits, which are described in “Queue Attributes”.
Pipe queues handle the routing and delivery of requests from one queue to another. They serve as pipelines, transporting requests to other local and remote queue destinations.
Pipe queues do not have associated quota limits (although they can have attributes). Pipe queues have a set of associated queue destinations, on both local and remote machines, to which they route requests. These destinations can be batch queues or other pipe queues.
NLB pipe queues used for load balancing do not have destinations associated with them.
Each request in a pipe queue is routed to one of the queue's destinations. Destinations are considered in the order in which the administrator configures them. If a request cannot be accepted by a destination queue (for example, if its resource requirements exceed those allowed for membership in the queue), the next destination is considered. If no destination accepts the request, the request is deleted, and the user receives a mail message.
If a destination is inaccessible (for example, if the network connection is broken or if the loadonly limit for a queue has been reached), NQS tries to requeue the request at specified intervals (defined by the qmgr command set default destination_retry wait) for a specified length of time (defined by the qmgr command set default destination_retry time). If these attempts fail, NQS abandons the attempted request and the user receives a mail message.
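For example, the retry interval and the total retry duration might be configured in qmgr as follows (the values are illustrative; see the qmgr(8) man page for the units used):

set default destination_retry wait 5
set default destination_retry time 72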
Each pipe queue has an associated program spawned to handle every request initiated from the queue for routing and delivery. In the context of the client/server network connection, this program is referred to as the pipeclient process.
Queue, queue complex, and global limits (defined beginning with “Queue Attributes”) limit the number of requests that can run concurrently. You can use qmgr commands to set a finite value that limits the maximum number of requests that can run at one time. Most limits can be set to unlimited. When a queue is created, these limits are defined as unspecified; unspecified means that the queue has the default value for that limit.
You do not have to set every limit. A comprehensive set of limits is provided so that you can configure the workload to meet your site's needs.
This section provides summary information on the request submission process, controlling requests, and monitoring requests.
By default, requests are sent to NQS. You may change the default and have requests sent to the NQE database, or your users may select the destination of their individual requests. For additional information about configuring the NQE database, see Chapter 9, “NQE Database”.
When a user submits a batch request to NQS, the request is sent to nqsdaemon. The nqsdaemon checks the request qualifiers (if there are any) against the defined queue limits and attributes. Based on these checks, nqsdaemon accepts or rejects the request. If the queue is a pipe queue, when nqsdaemon determines that a request should be routed, it spawns the pipeclient program to route the request from the pipe queue to another pipe or batch queue. Figure 2-1 shows this process.
|Note: The concept of local submission does not apply for requests submitted to the NQE database.|
A user can submit a request to NQS or to the NQE database. The following sections describe the process of submitting a request to each destination.
When a user submits a request to NQS using the NQE GUI Submit window or the NQE cqsub command, no queues are involved on the local host. The request is sent to the netdaemon process on the server machine as specified by the NQS_SERVER variable. The netdaemon creates a child process (netserver) to handle the request. The child process tries to queue the request with nqsdaemon. netserver ensures that the user has an account on the target host and that the user name on the local machine is authorized to run requests on the target machine.
The target nqsdaemon then checks the request qualifiers against the defined queue limits and attributes and, based on these checks, accepts or rejects the request. Figure 2-2 shows this process.
When a user submits a request to the NQE database by using the NQE GUI Submit window or the NQE cqsub command, the NQE scheduler determines when and where the request will execute. The NQE database routes a copy of the request to the lightweight server (LWS) on the node running the NQS server. The LWS performs user validation, submits a copy of the request to the NQS batch queue for processing, and obtains the exit status of completed requests.
Figure 2-3 shows request processing for client submission to the NQE database.
If the request is sent from one NQS system to a remote NQS system, pipeclient sends the request to the remote netdaemon. The rest of the processing is the same as it is for client submission to NQS. Figure 2-4 shows this process.
|Note: The concept of remote submission does not apply for requests submitted to the NQE database.|
When nqsdaemon determines that a request is eligible to run, it spawns a shepherd process. The shepherd process sets up the environment for the request and runs it. It sends mail to the user if it cannot execute the request. The shepherd process waits for the request to complete and returns output file(s) to the specified or default file destination.
NQS users can control or delete their own requests by using the NQE GUI Status window Actions menu options or by using the cqdel or the qdel command. NQS operators and managers can control requests with the following qmgr commands (see “Manipulating Requests” in Chapter 6, for a description of these commands and examples of their use):
schedule request (first, next, now, system)
Managers and operators can also delete requests from any user by using the cqdel or qdel command.
Queue attributes define the properties of a queue. A queue attribute can be assigned by using the qmgr set attribute command. For a description of how to set attributes, use the qmgr help set attribute command.
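For example, from within qmgr, an administrator might display the syntax of the attribute-setting command and then assign an attribute with commands of roughly the following form (the queue name testq is hypothetical; use help set attribute for the exact syntax):

help set attribute
set attribute pipeonly testq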
The following sections describe possible attributes for NQS queues. For additional information about defining queue properties, see “Defining NQS Queues” in Chapter 5.
The queue attributes described in the following sections apply to both batch and pipe queues.
Queue access can be either restricted or unrestricted. If access is restricted, only requests submitted by users defined in the access set for the queue are accepted. If access is unrestricted, any request can enter the queue.
The access set is composed of individual users or groups and is defined by the system administrator.
A batch queue or a pipe queue can be declared pipeonly, meaning that it can accept requests only from a pipe queue. This declaration is useful when the pipeclient process selects the final queue, based on the network queue configuration. For example, if all requests are submitted to a pipe queue, the pipe queue can retry submissions when no batch queue is currently accepting requests. Similarly, you might want requests of a certain size to go to specific queues; if all requests are submitted to pipe queues, the pipe queues can route each request to the proper queue without users having to know which queue to use.
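For example, assuming the qmgr set pipeonly command (with set nopipeonly as its inverse), a batch queue named big_jobs could be restricted to accept requests only from pipe queues as follows:

set pipeonly big_jobs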
The interqueue priority dictates the order in which queues of each type (batch or pipe) are scanned in a search for the next request to process. The queue with the highest priority is scanned first.
The difference between interqueue and intraqueue priority is important. The interqueue priority determines the order in which queues are searched for requests to process; the intraqueue priority determines the order in which requests within a queue are considered for initiation when the queue is searched.
The queue attributes described in the following sections apply only to batch queues.
A batch queue can be declared loadonly, meaning that it can accept only a limited number of requests. The number of requests that can be queued is restricted to the run limit of that queue. Therefore, if no other limits exist, a loadonly queue accepts only the number of requests that it can run. Specifically, the algorithm is as follows:

number queued + number running + number waiting + number arriving <= run limit

You may not submit requests directly to a loadonly queue; requests may come only from a pipe queue. In other words, a loadonly queue is a pipeonly queue with the additional restriction of accepting only a limited number of requests.
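For example, assuming the qmgr set loadonly command (with set noloadonly as its inverse), a batch queue named small_batch could be declared loadonly as follows:

set loadonly small_batch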
This algorithm can be used for load sharing among multiple machines or across multiple queues on a single machine. In the example in Figure 2-5, requests are submitted to pipe queues on the hosts hot and ice; they are routed from these queues to the central pipe queue, which queries all three batch queues on both hot and ice. The central pipe queue would try to route a small request to the first batch queue on hot, then try the corresponding queue on ice, and continue down the list until a place in a queue became available. Without loadonly queues, the first queue might accept the request, and the request might remain queued.
The queue group limit associated with each batch queue is the maximum number of requests allowed to run in the queue at any one time that were submitted by all users who are members of one group.
The queue memory limit associated with a batch queue is the maximum amount of memory that can be requested by all running requests in the queue at one time.
The queue user limit associated with a batch queue is the maximum number of requests allowed to be run in the queue at any time by one user.
For Cray MPP systems, each batch queue has an associated MPP PE limit. This is the maximum number of MPP PEs that can be requested by all running jobs in the queue at any one time.
At any time, the state of a queue is defined by two properties:
The queue's ability to accept requests. The states associated with this property are as follows:
The NQS daemon (nqsdaemon) is not running at the local host.
The queue is not currently accepting requests.
The queue is currently accepting requests.
The queue's ability to route (pipe queue) or initiate (batch queue) requests. The states associated with this property are as follows:
Queued requests are allowed to run, although none are currently running.
Queued requests are allowed to run, and some are currently running.
NQS is not running on the host in which the queue resides.
Queued requests are not allowed to run, and none are currently running.
New queued requests are not allowed to run, although currently executing requests are running to completion.
A queue complex is a set of local batch queues. Each complex has a set of associated attributes, which provide for control of the total number of concurrently running requests in member queues. This, in turn, provides a level of control between queue limits and global limits (see “Queue Attributes”, for information about queue attributes and “Global Limits”, for information about global limits). The following queue complex limits can be set:
MPP processing element (PE) limits (CRAY T3D systems), MPP application processing elements (CRAY T3E systems), or number of processors (IRIX systems)
Quick-file limits (UNICOS systems with SSDs only)
Several queue complexes can be created at a given host, and a batch queue can be a member of several complexes.
|Note: A batch request is considered for running only when all limits of all complexes of which the request is a member have been met.|
“Defining Queue Complexes” in Chapter 5, describes how to set queue complex limits.
Global limits restrict the total workload executing concurrently under NQS control. While queue limits restrict requests in queues and complex limits restrict requests in a complex, global limits restrict the activity in the entire NQS system. The following global limits may be set:
Tape drive limits
MPP processing element (PE) limits (CRAY T3D systems), MPP application processing elements (CRAY T3E systems), or number of processors (IRIX systems)
Quick-file limits (UNICOS systems with SSDs only)
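For example, global limits are set with qmgr commands of roughly the following form (the limit names and values are illustrative; "Defining Global Limits" in Chapter 5 gives the exact commands):

set global batch_limit = 20
set global tape_limit = 4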
“Defining Global Limits” in Chapter 5, describes how to set global limits.
Load balancing is the process of allocating work (such as NQS batch requests) in order to spread the work more evenly among the available NQE nodes in the NQE cluster. Load balancing can minimize instances in which one machine has idle time while another remains saturated with work.
Load balancing can be done either by using the Cray Network Load Balancer (NLB) or by using the NQE database and its associated scheduler. The two methods are described in the following sections.
The following sections describe the concepts used in the Cray Network Load Balancer (NLB).
The NLB server provides a generic, network-accessible data storage mechanism that can be used for a variety of purposes, including load balancing, NQS request status, and network configuration. Because the data can be used for various purposes, the server holds data as generic network data objects; it has no knowledge of the meaning of the data it stores. The programs that load and interrogate the database, known as collectors and clients, define the meaning of the data in the server.
NLB servers can be replicated to ensure that if connection to one is lost, a duplicate can be reached by collectors and clients. The server is a passive process; it does not actively solicit data from other processes. The NLB collectors periodically generate new information and send it to all configured NLB servers. The various NLB clients query NLB servers one by one in the configured order until they find one that is running or until they exhaust the list.
Some object types require access control to prevent unauthorized entry and extraction of data. Such objects are controlled with access control lists (ACLs) used in conjunction with authentication data sent with each message to the NLB server. ACLs are also groups of objects and can be edited and viewed using nlbconfig. The master ACL controls access to all other ACLs; it is defined in the NLB configuration file.
The server associates an ACL with each group of objects it maintains. Records in the ACL consist of a user name, host name, and a set of privileges (read, update, or delete). There are two special ACL user names: OWNER and WORLD. If an object has an owner attribute associated with it, the OWNER ACL controls the particular user's permissions. An ACL record with the same name and host as the user making a request controls that user's access rights, and the WORLD record controls all other users.
The read privilege lets users extract an object from the server; the update privilege lets users add new objects of that type or modify existing objects; and the delete privilege lets users remove objects from the server.
When an object is sent to the server by an authenticated collector, a special OWNER attribute may be present. This attribute is used to check permissions to read the object on subsequent queries.
Destination selection is the process of determining the best host to which to send work. A destination selection policy (also called a load-balancing policy) is the mechanism whereby the NLB server determines which host to recommend as the target for work (such as a batch request). When a batch request is submitted to a load-balanced queue within NQS, the NLB server applies a policy to the current information it has on the nodes in the NQE cluster. It then provides NQS with a list of machines, ordered according to the specifications of the policy.
An NQS pipe queue can be configured to obtain its destinations by applying a policy rather than having the destinations defined with qmgr commands. If you define an NLB pipe queue, you do not define any destinations for it, because the NLB selects destinations for you.
A load-balancing policy is a set of equations used to select hosts and sort them into an order based on the data held in the NLB.
NLB collectors periodically collect data about the machines on which they are running and forward this data to the NLB server. Some of the data on machine load and batch request status is gathered dynamically from the operating system. Other data can be read from a file by the collector or written directly into the NLB server using the nlbconfig program. These collectors reside in the program ccollect. (The ccollect program will reflect the port number.) For more information about NLB collectors, see Chapter 7, “NLB Administration”.
The NQE database provides the following:
A mechanism for the client user to submit requests to a central storage database.
A scheduler that analyses the requests in the NQE database and chooses when and where a request will run.
The ability to submit a request to the NQS server that the NQE scheduler has selected.
Figure 2-6 shows the basic concept of the NQE database in the NQE:
The components of the NQE database are described in the following sections.
The NQE database server (mSQL) serves connections from clients, in the network or locally, to access data in the database. The mSQL database has the following features:
Simple, single-threaded network database server
Relational database structure with queries issued in a simple form of SQL
The monitor is responsible for monitoring the state of the database. Specifically, it determines which NQE database component processes are connected and alive and, optionally, purges old requests (called tasks when they are within the database) from the database.
The scheduler analyses data in the database, making scheduling decisions. Specifically, it does the following:
Tracks the lightweight servers (LWSs) currently available for use in destination selection.
Receives new requests, verifies them for validity, and accepts or rejects them.
Executes a regular scheduling pass to find NQS servers to run pending requests.
Administrator-defined functions may be written in Tcl to perform site-specific scheduling (see Chapter 10, “Writing an NQE Scheduler”).
The LWS performs the following tasks:
Obtains request information from the database for requests assigned to it by the scheduler.
Checks authorization files locally.
Changes context and submits requests to NQS.
Obtains the exit status of completed requests from NQS and updates the NQE database with this information.
The following sections describe terms commonly used with the File Transfer Agent (FTA). FTA provides file transfer between remote systems on a network.
A file transfer service is a program that provides FTA with access to a network system that uses a specific file transfer protocol.
A unique FTA domain_name is assigned to each of the file transfer services that are available to FTA. A domain name is subsequently used to identify a specific transfer service.
FTA defines specific types of users, as follows:
Admin users: A user or a member of a group who is listed in the FTA configuration file as an administrator.
Status users: A user or member of a group who is listed in the FTA configuration file as a status user.
Network peer-to-peer authorization (NPPA) lets users transfer files without sending a password across the network. It requires FTA on the local system and support for network peer-to-peer authorization on the remote system. It can be used to authorize both batch and interactive file transfers.
A file transfer request is an object containing information that describes the file transfer operations to be performed by FTA on behalf of a user. Each request is a separate file, which is located in the FTA queue directory.