When you install NQE, a default configuration is provided for you. You may not need to make many changes to the initial installation and configuration described in the installation instructions. However, if you do need to change your NQE configuration, this chapter describes how to configure NQS in the network and on the local host. All of these actions are performed by using the qmgr utility. An NQS manager can also perform the daily operations of the NQS system as described in Chapter 6, “Operating NQS”.
This chapter discusses the following topics:
Configuration guidelines
Overview of manager actions
Defining the NQS hosts in the network
Defining managers and operators
Configuring the scope of the user status display
Defining NQS queues
Defining a default queue
Restricting user access to queues
Defining queue complexes
Defining global limits
Defining per-request and per-process limits
Defining job scheduling parameters
Defining the political scheduling daemon threshold
Unified Resource Manager (URM) for UNICOS systems
Miser scheduler for IRIX systems
NQS periodic checkpointing
Setting the log file
Setting the debug level
Establishing NQS accounting procedures
Setting the validation strategy for remote access
Defining other system parameters (such as the default time a request can wait in a pipe queue and the name of the user who sends NQS mail)
Using a script file as input to the qmgr utility, and generating a script file for the currently defined configuration
Output validation
NQS and multilevel security on UNICOS systems and on UNICOS/mk systems
NQS user exits for UNICOS and UNICOS/mk systems
You can use the qmgr(8) subsystem to configure the local NQS host to meet site-specific requirements. However, many of the NQS parameters are already configured with a chosen default value and may not require modification.
Usually, NQS configuration is done only once, when NQS is started for the first time. This configuration is preserved through subsequent shutdowns and startups. You can change the configuration at any NQS startup by providing a new configuration file as input to the qstart -i command or the nqeinit -i command (see the qstart(8) and nqeinit(8) man pages for more information).
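For example, a site that keeps its qmgr configuration commands in a script could supply that script at startup as follows (the file path shown is only an illustration, not a required location):

qstart -i /nqebase/etc/nqs_config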
You should consider the following guidelines when configuring NQS:
Abbreviated forms of commands can be used with qmgr; however, you should use the full form in standard configuration scripts to avoid confusion.
If you do not use load balancing, you can provide better control by defining batch queues as pipeonly and creating pipe queues that have batch queue destinations.
Avoid configuring pipe queues that have other local pipe queues as destinations, which can create a loop. No check exists for an occurrence of this in NQS.
When a request is forwarded from a nonload-balancing pipe queue, the destinations are considered in the order you configured them by using the qmgr add destination command. A request is accepted into the first batch queue with per-process and per-request limits that are not exceeded by the request. Therefore, always set the order for pipe queue destinations and batch queue limits so that a batch request is accepted by the most appropriate queue.
The scheduler weighting factors have significant value only when they are compared with each other. See “Intraqueue Priority” in Chapter 2, for a description of intraqueue priority and “Defining Job Scheduling Parameters”, for a description and examples of the set sched_factor commands.
When defining checkpoint_directory, consider whether a specific device has sufficient space available to save the checkpoint images of all currently executing batch requests.
The absolute_pathname that you select as the checkpoint directory must be an absolute path name to a valid directory that has only owner read, write, and search permissions (NQS sets the permission bits to 0700). The directory named by absolute_pathname must also be retained across UNICOS system restarts (you should not use the /tmp directory) and should be dedicated to NQS checkpoint files. The checkpoint files that are created are used to restart the NQS jobs when NQS restarts.
An NQS manager is responsible for the configuration of an NQS system. An NQS manager also can perform all of the operator actions; see Chapter 6, “Operating NQS”, for a description of operator actions.
The actions for managing a configuration include using the qmgr utility to do the following:
Define the NQS systems in the network
Define the users who can be NQS managers or operators
Create NQS queues
Restrict user access (if desired) to NQS queues, and set the type of user validation to authenticate the users' identities
Define other system parameters such as the name of the default NQS queue, the name of the log file, and the retry limits to use when an attempt to send a request on to another system fails
After the configuration is initially defined, an NQS manager can change it by using the qmgr utility. Changes can be made to any of the parameters when the NQS system is running without stopping and restarting the system.
Note: Except for defining the list of NQS systems in the network, the nqsdaemon must be started before you can define or change any other configuration items; see “Starting NQS” in Chapter 6, for a description of how to start nqsdaemon.
Each NQS system in the network must know about the other NQS systems in the network. If NQS will not be communicating with any remote hosts, you need only the local host defined. Typically, the qmgr commands used to define hosts appear in the NQS configuration file.
Because each NQS server in a TCP/IP network can have multiple host names and Internet addresses (usually one for each network to which the server is connected or for each interface the server has), it may not be sufficient to uniquely identify the server by host name and Internet address. Therefore, NQS uses machine IDs (mids) to define each server in the network.
When NQS establishes a connection between two hosts, it verifies that the name provided by the TCP/IP protocol maps to the same name and corresponding mid on both hosts.
The hostname or alias associated with the mid is used to obtain the network address from the /etc/hosts file. When NQS detects a connection from another machine, it is given the peer network address, which is then located in the /etc/hosts file, yielding the network path corresponding to an alias or hostname in the network database. Some systems may store host name information in NIS, NIS+, or DNS. The network path is then used to verify the mid of the peer. If the mids and names do not match, the connection is refused.
Table 5-1 describes the elements in each host definition.
Table 5-1. Information for defining machines
Item | Description
---|---
mid | A machine ID number in the NQS database that has a maximum value of 2^31 - 1 and is unique for each machine in the network. If you do not supply a mid, NQS creates one for you from the lower 31 bits of the TCP/IP address of the host.
hostname | A unique name for each machine in the network; the name specified must correspond to the machine's entry in the /etc/hosts file (see hosts(5) for more information).
agents | Agents available to spool output on the machine. The supported agents are described in “Configuring Remote Output Agents”.
alias | A local machine name or another network interface, consisting of a maximum of 255 characters. You must define all network interfaces or requests will be rejected.
If you are configuring more than one NQS server, you must add NQS mids to each NQS database. NQE clients do not need mids.
The mid and host name must be the same in the database of each machine in the network. Aliases must be unique to each machine in the local configuration, but may differ from server to server.
Note: After the NQS database is generated, the mid for the local machine should not be changed. If a change is required, the local NQS database must be re-created and all requests are lost.
The agents element defines how the standard error and standard output files of completed requests will be returned to the submitting user. For more information, see “Configuring Remote Output Agents”.
A machine alias is known only to the local server. However, a server can be known by more than one alias. The alias list may include host names associated with any interface on the local host. An alias is required for each possible route between hosts.
You can set the output agent for each NQS host defined in the NQS database. The output agent at each host defines how the standard error and standard output files of completed requests are returned to the submitting user.
Note: Output agents are not configured for clients. Client output return uses the File Transfer Agent (FTA), then RCP. NQS always uses FTA with the default domain, which is inet. For information about FTA and network peer-to-peer authorization (NPPA) of users, see Chapter 13, “FTA Administration”.
When you specify nqs (the default output agent), NQS tries to connect directly to the remote NQS host to transfer the files. If any errors occur during the processing, the output is returned to the user's home directory on the execution host. If, for some reason, this is not possible (for example, if the owner does not have write permission), NQS places the files in $NQE_NQS_SPOOL/private/root/failed. NQE_NQS_SPOOL is defined in the nqeinfo file.
When you specify fta, NQS passes control of the output file transfer to FTA by creating an FTA file transfer request to move the output file to the remote host. When you specify fta as the output agent, NQS validation is used; for detailed information, see Chapter 13, “FTA Administration”, and the nqsfts(8) man page. If any errors occur during the processing of the output file transfer, FTA will save the transfer request and try to transfer it again, or return the output to the user's home directory on the execution host. If the error is transient (for example, the network failed during the file transfer), FTA retries the transfer request.
You can use the following qmgr commands to add or delete the output agents, respectively. You must include either the machine ID (mid) or the host name:
add output_agent agent mid|host
delete output_agent agent mid|host
The order in which you add output agents is important. The order in which you add the agents becomes the order in which the agents are used to attempt transfer.
To display current configuration, use either of the following qmgr commands:
show mid
show mid hostname|mid
In the resulting display, the output agents are shown in the column called AGENTS.
The following list shows the output agents that you can specify and their descriptions:
Agent | Description
---|---
<NONE> | NQS immediately places the output files in the user's home directory on the host that executed the request. The user receives a mail message when the files are returned. To set the agent to <NONE>, use the delete output_agent command to remove all output agents configured for the host.
fta | NQS creates an FTA file transfer request to transfer the output file to the remote host.
fta nqs | NQS tries to create an FTA file transfer request to handle the output file transfer. If FTA fails, NQS tries to connect to the NQS system on the remote host and directly transfer the request's output file.
nqs | NQS tries to connect to the NQS system on the remote host and directly transfer the request's output file. This option directs NQS to handle the output file transfer directly. This is the default agent.
nqs fta | NQS tries to connect to the NQS system on the remote host and directly transfer the request's output file. If NQS fails, NQS tries to create an FTA file transfer request to handle the output file transfer.
The nqs output agent is configured automatically unless you configure the output agent differently.
Note: In the case that all transfer agents fail to return output, the output is placed in the user's home directory on the host that executed the request. This is the same as configuring with output agent <NONE>.
An output agent should always be configured for the local host to avoid output being inadvertently returned to the user's home directory on the host of execution; this is usually done by specifying the nqs agent.
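For example, to ensure that output for requests executed on the host rain (the example server used later in this chapter) is handled by NQS directly, you could enter the following qmgr command; this is a minimal sketch, since the nqs agent is normally configured automatically:

add output_agent nqs rain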
See Chapter 13, “FTA Administration”, and the fta(8), nqsfts(8), and fta.conf(5) man pages for details about how to configure FTA. NQS uses the FTA domain name nqs-rqs to spool output over TCP/IP.
You can use the following qmgr command to add a new entry to the mid database (see Table 5-1, for an explanation of mid, hostname, agent, and alias). You do not have to specify a mid; if you do not, qmgr creates one for you from the lower 31 bits of the TCP/IP address for the host. An optional list of aliases may be supplied on the same line. You do not enter the commands on client systems or issue add mid commands for client machines.
add mid [mid] hostname [alias alias ...]
The following command adds an alias name to the list of aliases for the system with the specified mid or hostname:
add name [=] alias mid|hostname |
The following command adds an output agent to the list of output agents for the system with the specified mid or hostname.
add output_agent [=] agent mid|hostname |
See “Configuring Remote Output Agents”, for more information on output agents.
A similar series of delete commands can be used to delete an entry completely from the list, delete an alias name from an entry, and delete an output agent from an entry. You can specify either a mid or a hostname:
delete mid mid|hostname
delete name [=] alias
delete output_agent [=] agent mid|hostname
To display information about a specific mid or hostname, use the following qmgr command:
show mid [mid|hostname] |
To display information about all known mids, use the following qmgr command:
show mid |
See “Configuring Remote Output Agents”, for an example display.
Figure 5-1 shows an example of a network that consists of four machines, each with two paths. In the example, hn refers to host name, and na refers to network address.
In this example, the /etc/hosts file contains the following:
128.162.11.1  rain    rain-ip
128.162.10.1  rain-ec
128.162.11.2  squall  squall-ip
128.162.10.2  squall-ec
128.162.11.3  gale    gale-ip
128.162.10.3  gale-ec
128.162.11.4  mist    mist-ip
128.162.10.4  mist-ec
Table 5-2 shows the contents of the NQS network database for the previous example:
Table 5-2. Machine IDs, names, and output agents
Machine ID | Principal name | Agents | Aliases |
---|---|---|---|
10619649 | rain | nqs fta | rain-ip, rain-ec |
10619650 | squall | nqs fta | squall-ip, squall-ec |
10619651 | gale | nqs | gale-ip, gale-ec |
10619652 | mist | nqs | mist-ip, mist-ec |
The qmgr commands used to create the database in the preceding example are as follows (you may supply your own mid value):
add mid rain rain-ip rain-ec
add output_agent fta rain
add mid squall squall-ip squall-ec
add output_agent fta squall
add mid gale gale-ip gale-ec
add mid mist mist-ip mist-ec
To create the database on each NQS server, you would enter these commands on each.
You can define two special classes of users: managers and operators. NQS qmgr managers have full access to all qmgr commands to monitor, control, and configure NQS. Operators are allowed to monitor and control all queues and requests. The only qmgr commands available to users without special privileges are show, help, exit, and quit.
When NQS is initially installed, root must run qmgr first to define other users as either an NQS manager or an operator. root cannot be deleted from the list of NQS managers. On a UNICOS system that is running MLS or on a UNICOS/mk system that is running Cray ML-Safe and that has a PAL assigned to the qmgr program, people in the default qmgr PAL categories can also define other users as either an NQS manager or operator.
Managers can execute all qmgr commands; operators can execute only a subset of the qmgr commands that allow them to monitor and control submitted batch requests when they are in local NQS queues. An operator cannot define another user as an operator or as a manager. Only a manager can add and delete managers or operators.
The following qmgr commands add a user name to the list of managers or list of operators, respectively, for the local NQS system:
add managers username:m
add managers username:o
The suffix indicates that the user is a manager (:m) or an operator (:o).
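For example, the following commands (the user names are illustrative) give jjones full manager privileges and make operator1 an operator:

add managers jjones:m
add managers operator1:o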
If you want to set the list of NQS managers or NQS operators to a single user name (in addition to root), you can use one of the following two commands; any users previously defined as managers or operators (except root) are removed from the list.
set managers username:m
set managers username:o
To delete a user name from the list of managers or operators, use one of the following commands.
delete managers username:m
delete managers username:o
To display the currently defined managers and operators, use the following qmgr command:
show managers |
See “Displaying a List of Managers and Operators” in Chapter 6, for an example display.
NQE allows the administrator to expand the default scope of the information displayed to non-NQS managers by the qstat(1) or cqstatl(1) commands. By default, these commands display to a non-NQS manager only information about the jobs that the user has submitted. To allow users to display the status of all jobs residing at that NQS node when they execute the qstat(1) or cqstatl(1) command, use the nqeconfig(1) command to define the nqeinfo file variable NQE_NQS_QSTAT_SHOWALL to be 1.
Note: NQE must be stopped and restarted for this change to take effect.
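For example, after you run nqeconfig(1), the nqeinfo file would contain a line similar to the following (the setting shown simply enables the expanded display):

NQE_NQS_QSTAT_SHOWALL=1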
NQS has both batch and pipe queues. Batch queues accept and process batch requests; pipe queues receive batch requests and then route those requests to another destination. “Queues” in Chapter 2, provides background information about queues and how they work.
NQE is shipped with one pipe queue and one batch queue configured on each NQS server. The default pipe queue is nqenlb. The default batch queue is nqebatch. This configuration uses the nqenlb queue to send requests to the NLB, which then sends requests to nqebatch on the most appropriate system, based on an NLB policy. The default NLB policy, called nqs, sends batch requests to the system with the most available CPU cycles. If a request is submitted directly to the queue nqebatch, it runs on the local NQS server.
To create batch or pipe queues, use the following qmgr commands:
create batch_queue characteristics
create pipe_queue characteristics
Table 5-3 shows the characteristics you can define for an NQS queue.
Table 5-3. Definable queue characteristics
Characteristic | Description | Queue type |
---|---|---|
queue_name | Each NQS queue must have a name that is unique among the other NQS queues defined at this host. This is the name users use when submitting requests. The queue name can consist of a maximum of 15 characters, and it can consist of any printable character, except @, =, (, ), ], and the space character. The name cannot begin with a number (0 through 9). | Batch and Pipe |
destination | Each NQS pipe queue must have one or more destinations, which can be either another NQS pipe or batch queue. | Pipe |
| The destination queue can be in one of the following forms: | |
| local_queuename | |
| local_queuename@local_hostname |
| remote_queuename@remote_hostname |
| remote_queuename@[remote_mid] |
| Any hostname must be defined in the NQS configuration machine ID (mid) database. For a remote queue, the last of the preceding forms lets you specify the mid rather than the host name (for example, remote_queuename@[23]). The mid must be enclosed within brackets. |
run_limit | By default, a queue can process only one request at a time (although any number may be waiting in the queue). You can increase this limit by defining a run limit for a queue. This limit sets the maximum number of requests that can be processed in the queue at any one time. | Batch and Pipe |
interqueue_priority | Each queue has an interqueue priority. This defines the order in which queues are searched to find the next request to process. The queue with the highest priority is scanned first. The priority is an unsigned integer in the range from 0 (lowest priority) through 63 (highest priority). Requests also have an intraqueue priority that a user can specify when submitting a request. This is different from interqueue priority. The request intraqueue priority determines the order in which requests are considered for initiation when a queue is searched. | Batch and Pipe |
loadonly qualifier | By default, an NQS batch queue can receive requests either directly from a user submitting the request or from a pipe queue. A batch queue can be declared loadonly, meaning that it can accept only a limited number of requests and requests must be routed through a local or remote pipe queue. The number of requests that can be queued is restricted to the runlimit of that queue (the runlimit is the number of requests that can be processed). Therefore, if there are no other limits to consider, a loadonly queue accepts only the number of requests that it can run. See “Batch Queue Attributes” in Chapter 2, for a description of how loadonly queues can be used. | Batch |
pipeonly qualifier | If a queue has the characteristic pipeonly, a user cannot submit a request directly to the queue. The queue can receive only requests sent from a local NQS pipe queue. | Batch and Pipe |
server | NQS pipe queues can be associated with a server from which they will receive requests. This server is an image the pipe queue runs to do work. For example, you must define a server when you create pipe queues for load balancing. If you specify pipeclient, NQS looks for the binary in the NQE_bin directory as configured in the /etc/nqeinfo file. If the pipeclient is not in that directory, then you must specify the absolute location. It is recommended that you use pipeclient. A pipe queue can be defined as a destination-selection (load-balancing) queue, as described in “Commands for Configuring Destination-selection Queues”. When you define a destination-selection queue, you must include, as part of the server definition, the characters CRI_DS in uppercase, the load-balancing policy name (the default is nqs), and the TCP/IP host and service name on which the NLB server(s) are running (the default is the NLB_SERVER defined in the nqeinfo file). | Pipe
These are the basic characteristics of an NQS queue. You also can define other controls on the use of a queue, as follows:
You can define a list of users who have permission to use a queue. Requests belonging to a user who is not on this list cannot be placed in the queue. See “Restricting User Access to Queues”.
You can specify global system defaults for queues. See “Defining Global Limits”.
You can specify limits on the number and size of the requests in a queue, as described in “Setting Queue Limits”.
To specify limits on the requests running in a queue, use the following qmgr command:
set queue option = limit queuename |
An NQS operator or manager can execute this command. The option can be one of the following:
Option | Description | |
group_limit | Sets the maximum number of requests allowed to run in the queue at one time from users in one group. | |
memory_limit | Sets the maximum amount of memory that can be requested by all running requests in the queue at one time. | |
run_limit | Sets the maximum number of batch requests that can run in a queue concurrently. You also can set this limit by using the run_limit parameter of the create batch_queue command. |
user_limit | Sets the maximum number of requests allowed to be run in the queue at one time by one user. | |
mpp_pe_limit | Sets the maximum number of MPP processing elements (PEs) that are available to all batch requests running concurrently in a queue. | |
quickfile_limit | Sets the quickfile space limit for a queue. The limit determines the maximum quickfile space that can be allocated to all batch requests running concurrently in a queue. The queue must exist already and be a batch type queue. |
Some queue limits are enforced only for certain machines. For more information on these limits, see Table 5-4.
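For example, the following commands (the limit values are illustrative) allow at most five requests to run concurrently in the default nqebatch queue, no more than two of which may belong to any one user:

set queue run_limit = 5 nqebatch
set queue user_limit = 2 nqebatch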
The NQS daemon must be started before you can create or change the characteristics of a queue because nqsdaemon is the process that manipulates queues.
The destinations for an NQS pipe queue are usually NQS batch queues, although they could be any type of queue, such as a pipe queue on a specific remote machine.
Before a queue can receive requests, it must be enabled by using the qmgr command enable queue (see “Enabling and Disabling Queues” in Chapter 6). Enabling a queue means that requests can be placed in the queue. Before a queue can process its requests, the queue must be started using the qmgr command start queue (see “Starting and Stopping Queues” in Chapter 6).
Avoid configuring pipe queues that have other local pipe queues as destinations, which can create a loop. NQS has no check for this situation.
The first time a queue is defined, it must be created by using one of the following qmgr commands:
create batch_queue queuename \
  priority = queue-priority [loadonly] [pipeonly] \
  [run_limit = run-limit]

create pipe_queue queuename \
  priority = queue-priority [pipeonly] \
  [destination = (destinations)] \
  [run_limit = run-limit] \
  [server = (pipeclient_and_options)]
See Table 5-3, for a description of these parameters. The queuename argument and the priority parameter are required. A pipe queue (unless it is a destination-selection (load-balancing) queue) also must have at least one destination defined by using this command, the add destination command, or the set destination command.
The destinations argument can be one or more destinations; destinations must be separated with a comma.
Note: See “Commands for Configuring Destination-selection Queues”, for information on configuring a destination-selection queue.
To add destinations to those already defined for a pipe queue, use the following command:
add destination [=] destinations queuename \
  [position]
The destinations argument is one or more destinations to be added to the list of destinations for queuename. If you specify more than one destination, they must be enclosed in parentheses and separated with a comma. If you omit the position argument, the destinations are added at the end of the existing destinations defined for queuename.
The position argument can be one of the following:
Position | Description | |
first | Places the specified destinations at the start of the existing list | |
before old-des | Places the specified destinations in the list immediately before the existing destination old-des | |
after old-des | Places the specified destinations in the list immediately after the existing destination old-des | |
last | Places the specified destinations at the end of the existing list |
Destinations are not used in destination-selection (load-balancing) queues.
Note: A pipe queue's destination list should not contain any elements that are disabled batch queues. In the event that this does occur, any jobs that are submitted to the pipe queue remain stuck in the pipe queue if they cannot be sent to any of the other destinations in the list. Because the disabled batch queue exists, NQS waits for the queue to become enabled or for it to be deleted before it moves the job from the pipe queue. To ensure that jobs are not sent to a particular destination in the pipe queue's list, remove the destination from the list rather than disabling the batch queue.
If you want to change the destinations for a queue, you can use the following command:
set destination [=] destinations queuename |
If you include more than one destination, separate them with a comma.
This command discards all the previous destinations.
To delete one or more destinations from the list of those defined for a pipe queue, use the following command:
delete destination [=] destinations queuename |
Separate the destinations with a comma.
Any requests in the queue that are currently being transferred to a destination that is deleted by this command are allowed to transfer successfully.
Requests can be submitted to a queue that has no destinations, but the requests are not routed until a destination is defined for the queue.
To change the interqueue priority for an existing queue, use the following command:
set priority = priority queuename |
To change the run limit for an existing NQS queue, use the following command:
set queue run_limit = run-limit queuename |
Load balancing is described in Chapter 8, “Implementing NLB Policies”. The pipe queue nqenlb is defined by default to accept requests that use load balancing. The following qmgr command creates a destination-selection (load-balancing) pipe queue named nlbqueue:
create pipe_queue nlbqueue priority=priority \
  server=(pipeclient CRI_DS [policyname] [hostname:port])
The arguments unique to creating a destination selection queue are as follows:
The name of the executable for the pipe client server (pipeclient).
The characters CRI_DS in uppercase, which mark this as a destination-selection pipe queue.
The load-balancing policy name (such as nqs). This policy must be defined in the NLB policies file. If this field is omitted, the policy name defaults to nqs.
An optional list of 1 to 8 host:port pairs that define the TCP/IP host and service name where the NLB server(s) are running. If this field is omitted, the NLB database server name defaults to the hosts configured in the NLB. The NLB_SERVER is defined in the nqeinfo file. There should be one pair for each NLB server from which you want to gather host information. Only one is necessary, but if servers are replicated for redundancy in case of network failure, all servers should be included. In the example, hostname is the host name where the server resides and port is a service name (from the /etc/services file) or a port number on which the server is listening.
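For example, the following command (the priority value, host name, and port are illustrative) creates the destination-selection queue nlbqueue, using the default nqs policy and an NLB server listening on port 604 of the host nlbhost:

create pipe_queue nlbqueue priority=30 \
  server=(pipeclient CRI_DS nqs nlbhost:604)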
After the qmgr create command is issued, users can submit requests for destination selection to the queue nlbqueue.
If you want this new queue to be the default for requests, use the following qmgr command:
set default batch_request queue nlbqueue |
You must use the cqstatl or qstat command (instead of the qmgr utility) to display details about an NQS queue; for example:
cqstatl -f queues |
See “Displaying All the Characteristics Defined for an NQS Queue” in Chapter 6, for an example display.
In the following example, an NQS pipe queue is required that routes requests to a remote NQS system called host1. A maximum of two requests can be processed in the NQS pipe queue at any one time. The following qmgr commands are required:
% qmgr
Qmgr: create pipe_queue standard priority=60 server=(pipeclient)
Qmgr: set queue run_limit=2 standard
Qmgr: add destination st_little@host1 standard
Qmgr: add destination st_medium@host1 standard
Qmgr: add destination st_big@host1 standard
Qmgr: quit
%
This queue could also be defined in one command, as follows:
% qmgr
Qmgr: create pipe_queue standard priority=60 server=(pipeclient) destination=(st_little@host1,st_medium@host1,st_big@host1) run_limit=2
The order of the destinations is important because this is the order in which NQS tries to route a request. In this example, if you want to move st_little to the beginning of the destination list, you could type in the following two qmgr commands:
delete destination st_little@host1 standard
add destination st_little@host1 standard first
To redefine all the destinations for the queue, use the following qmgr command:
set destination=(st_little@host1, st_medium@host1, st_big@host1) standard
After a queue has been stopped and disabled, delete it by using the following qmgr command:
delete queue queuename |
If the queuename argument is also the default queue, you should define a new default queue (see “Defining a Default Queue”).
To define an NQS queue to be the default queue, use the following qmgr command:
set default batch_request queue queuename |
The queuename argument can be the name of any NQS batch queue that you have already created.
To change the definition of the default queue so that there is no longer a default queue, use either of the following qmgr commands:
set no_default batch_request queue
set default batch_request queue none
To display the current default queue, use the following qmgr command:
show parameters |
The setting is shown next to Default batch_request queue. See “Displaying System Parameters” in Chapter 6, for an example display.
When you initially create an NQS queue, any user can submit a request to it. To restrict user access to a queue, create a list of the users and a list of the groups whose members can access the queue. These lists are held in the NQS configuration database, and an NQS manager can edit them by using qmgr commands.
For a user to have access to a queue, one of the following requirements must be true:
Access to the queue is unrestricted (this is true when a queue is first created or when the qmgr command set unrestricted_access was issued for the queue).
The user belongs to a group that was added to the list of groups that can access the queue by using the qmgr command add groups.
The user was added to the list of users who can access the queue by using the qmgr command add users. root does not have automatic access to all queues.
If the access to the queue is unrestricted, you cannot use the add groups or add users command. You must first set the queue to have no access by using the qmgr command set no_access.
To restrict the access to the queue, use qmgr commands, as follows:
Define the queue as having no access by any user, using the following qmgr command:
set no_access queuename |
Add individual users, or user groups, to the list of users or groups that can submit requests to the queue, as follows:
add users = (user-names) queuename
add groups = (group-names) queuename
The user-names argument is one or more user names; group-names is one or more group names. If you specify more than one user name or group name, you must enclose all the names in parentheses and separate them with a space or a comma. You can use numeric UNIX user IDs and group IDs (which must be enclosed in brackets []) as an alternative to user names and group names.
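For example, the following command (the user name and numeric ID are illustrative) allows the user jjones and the user with user ID 1205 to submit requests to the queue standard:

add users = (jjones, [1205]) standard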
If you later want to allow any user to access the queue, enter the following qmgr command:
set unrestricted_access queuename |
To delete users or groups from the list of those allowed to access a queue, use the following qmgr commands:
delete users = (user-names) queuename
delete groups = (group-names) queuename
To display the current restrictions on user and group access to a queue, use the cqstatl or the qstat command instead of a qmgr utility; for example:
cqstatl -f queues |
The access restrictions appear on the display under the heading <ACCESS>.
See “Displaying All the Characteristics Defined for an NQS Queue” in Chapter 6, for an example display.
In the following example, you want to restrict access to an NQS queue called standard. Unless the access permissions to a queue have been changed, any user can access it. You first issue the set no_access command to restrict all users, and then add those groups you want to have access. To restrict access to only those users belonging to a UNIX user group called research, enter the following two qmgr commands:
set no_access standard
add groups research standard
To create a queue complex (a set of batch queues), use the following qmgr command:
create complex = (queuename(s)) complexname |
To add or remove queues in an existing complex, use the following qmgr commands:
add queues = (queuename(s)) complexname
remove queues = (queuename(s)) complexname
Note: The difference between the commands remove queues and delete queues is important. The remove queues command removes a queue from the queue complex, but the queue still exists. The delete queues command deletes the queue completely.
After a complex has been created and the appropriate queues added to it, associate complex limits with it by using the following qmgr command:
set complex option=limit complexname |
The option provides for control of the total number of concurrently running requests in the queues on the local host. option may be one of the following:
Option | Description | |
group_limit | Sets the maximum number of requests that all users in a group can run concurrently in all queues in the complex | |
memory_limit | Sets the maximum amount of memory for all requests running concurrently in all queues in the complex | |
mpp_pe_limit | Sets the maximum number of MPP processing elements (PEs) that can be requested by all jobs running concurrently in all queues in the complex |
quickfile_limit | (UNICOS systems with SSD solid state storage devices) Sets the maximum size of secondary data segments that can be requested by all jobs running concurrently in all queues in the complex |
run_limit | Sets the maximum number of requests allowed to run concurrently in all queues in the complex | |
user_limit | Sets the maximum number of requests that any one user is allowed to run concurrently in all queues in the complex |
Note: A batch request is considered for running only when all limits of all complexes of which the request is a member have been met.
NQS managers and operators can set queue complex limits. Some complex limits are enforced only on certain machines. For more information on what limits are enforced, see Table 5-4.
The following example creates a queue complex named weather that contains the bqueue11, bqueue12, and bqueue13 queues:
create complex=(bqueue11,bqueue12,bqueue13) weather
The following example limits all users in one group to a maximum of 20 requests in the queue complex weather:
set complex group_limit=20 weather |
To set limits on the total workload executing concurrently under NQS control on the local host, use the following qmgr command:
set global option=limit |
Queue limits restrict requests in queues and complex limits restrict requests in a complex. Global limits restrict the activity in the entire NQS system. NQS managers and operators can set global limits.
The option can be one of the following:
Option | Description | |
batch_limit | Sets the maximum number of batch requests allowed to run concurrently at the host. | |
group_limit | Sets the maximum number of batch requests all users in a group can run at the host. | |
memory_limit | Sets the maximum amount of memory that can be allocated to all batch requests running concurrently at the host. | |
mpp_pe_limit | The maximum number of MPP PEs that can be requested by all batch requests running concurrently under NQS control. | |
pipe_limit | Sets the maximum number of pipe requests allowed to run concurrently at the host. | |
quickfile_limit | (UNICOS systems with SSDs) The maximum amount of secondary data segment space that can be requested by all batch requests running concurrently under NQS control. | |
tape_drive_limit | Sets the maximum number of tape drives per device group that can be requested by all batch requests running concurrently at the host. NQS supports eight tape groups (see the qmgr(8) man page). | |
user_limit | Sets the maximum number of batch requests that any one user is allowed to run concurrently at the host. |
Some global limits are enforced only on certain machines. For more information on what limits are enforced, see Table 5-4.
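For example, the following commands (the values are illustrative) produce the batch, user, and pipe limits shown in the example display in the next section:

set global batch_limit=30
set global user_limit=20
set global pipe_limit=20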
To display the current global limits, use the following qmgr command:
show global_parameters |
The resulting display resembles the following:
Global batch group run-limit:        20
Global batch run-limit:              30
Global batch user run-limit:         20
Global MPP Processor Element limit:  unspecified
Global memory limit:                 unlimited
Global pipe limit:                   20
Global quick-file limit:             unlimited
Global tape-drive a limit:           unspecified
Global tape-drive b limit:           unspecified
Global tape-drive c limit:           unspecified
Global tape-drive d limit:           unspecified
Global tape-drive e limit:           unspecified
Global tape-drive f limit:           unspecified
Global tape-drive g limit:           unspecified
Global tape-drive h limit:           unspecified
You also can see these limits in the show parameters display. See “Displaying System Parameters” in Chapter 6, for an example. Some parameters are not enforced if the operating system does not support the feature, such as MPP processor elements.
To set per-process and per-request limits on batch queues, use the following qmgr commands:
set per_process option = limit queuename
set per_request option = limit queuename
Per-request limits apply to the request as a whole; they limit the sum of the resources used by all processes started by the request. Per-process limits apply to individual processes started by a request (including the parent shell and each command executed). For example, the per-request memory size limit is a limit on the sum total of memory used by all processes started by the request; the per-process memory size limit is the maximum amount of memory that each process can use.
Per-process and per-request limits on a queue are compared with the limits set on a request. If the request's limits are lower, it can enter the queue. If the limits are higher, the request cannot enter the queue. If requests have improperly set limits and enter queues with lower limits than the request actually needs, the request may abort when it attempts to exceed limits imposed by the queue.
Table 5-4, shows the limits that can be enforced on NQS requests. This table uses the following notation:
Symbol | Meaning |
Y | Limit is supported and enforced on this platform |
N | Limit is not supported or enforced on this platform |
PP | Limit is a per-process limit |
PR | Limit is a per-request limit |
Note: The limits included in Table 5-4 can be set by using the qsub command option listed in the table or by using the NQE GUI Job Limits submenu of the Configure menu on the Submit window. For detailed information about the command options, see the qsub(1) man page. For detailed information about using the NQE GUI Job Limits submenu of the Configure menu, see the NQE User's Guide, publication SG-2148.
Table 5-4. Platform support of NQS limits
The platform columns indicate run-time enforcement on the executing NQS system.

Description | Limit type | qsub qualifier | UNICOS/mk and UNICOS | Solaris | AIX | DEC | HP-UX | IRIX
---|---|---|---|---|---|---|---|---
Core file size | PP | -lc | Y | Y | Y | Y | N | Y |
Data segment size | PP | -ld | N | Y | Y | Y | N | Y |
Permanent file size | PP | -lf | Y | Y | Y | Y | N | Y |
| PR | -lF | Y | N | N | N | N | N |
Memory size | PP | -lm | Y | N | N | N | N | Y |
| PR | -lM | Y | N | N | N | N | Y[b] |
MPP memory size | PP | -l p_mpp_m | Y | N | N | N | N | N
| PR | -l mpp_m | Y | N | N | N | N | N
Nice value | PP | -ln | Y | Y | Y | Y | Y | Y |
Quick file size (not on UNICOS/mk) | PP | -lq | Y | N | N | N | N | N |
| PR | -lQ | Y | N | N | N | N | N |
Stack segment size | PP | -ls | N | Y | Y | Y | N | Y |
CPU time | PP | -lt | Y | Y | Y | Y | N | Y |
| PR | -lT | Y | Y[a] | Y[a] | Y[a] | N | Y[b] |
Tape drive limit | PR | -lU | Y | N | N | N | N | N |
Temporary file size | PP | -lv | N | N | N | N | N | N |
| PR | -lV | Y | N | N | N | N | N |
Working set size | PP | -lw | N | Y | Y | Y | N | Y |
MPP processing elements (T3D systems) or MPP application processing elements (T3E systems) or Number of processors (IRIX systems) | PR | -l mpp_p | Y | N | N | N | N | Y[b] |
MPP residency time (T3D systems) or CPU time for application PEs (T3E systems) | PP | -l p_mpp_t | Y | N | N | N | N | N |
| PR | -l mpp_t | Y | N | N | N | N | N |
Shared memory size | PR | -l shm_limit | Y | N | N | N | N | N |
Shared memory segment size | PR | -l shm_segments | Y | N | N | N | N | N |
[a] The per-request limit is enforced as if it were per-process. [b] Optionally enabled. See “Enabling per Request Limits on Memory Size, CPU Time, and Number of Processors on IRIX Systems”. |
The following options to the qmgr commands set per_process and set per_request set per-process and per-request limits on NQS queues.
Option | Description | |
corefile_limit | Sets the maximum size (in bytes) of a core file that may be created by a process in the specified batch queue. | |
cpu_limit | Sets the per-process or per-request maximum CPU time limit (in seconds) for the specified batch queue. On CRAY T3E systems, this CPU time limit applies to command PEs (commands and single-PE applications); for CRAY T3E application PEs (multi-PE applications), maximum CPU time is set by using the set per_process mpp_time_limit and set per_request mpp_time_limit commands. The value can be specified as seconds, as minutes:seconds, or as hours:minutes:seconds. |
data_limit | Sets the maximum size (in bytes) for the data segment of a process in the specified batch queue. This defines how far a program may extend its break with the sbrk () system call. | |
memory_limit | Sets the per-process or per-request maximum memory size limit for the specified batch queue. | |
mpp_memory_limit | Sets the per-process or per-request MPP application PE memory size requirements for the specified batch queue. | |
mpp_pe_limit | Sets the MPP processing element (PE) limit for a batch queue against which the per-request MPP PE limit for a request is compared. | |
mpp_time_limit | Sets the CRAY T3D MPP per-process or per-request wall-clock residency time limit or sets the CRAY T3E MPP per-process or per-request CPU time limit for application PEs (multi-PE applications) for the subsequent acceptance of a batch request into the named batch queue. | |
permfile_limit | Sets the per-process or per-request maximum permanent file size limit (in bytes) for the specified batch queue. The per-process request limit is the maximum size of files created by all processes on any type of directory. The per-request limit is the maximum amount of disk space used by all processes that make up an NQS request. | |
quickfile_limit | Sets the per-request maximum quickfile space for a batch queue against which the per-request quickfile space limit for a request is compared. | |
shm_limit | Sets the maximum per-request shared memory size limit for a batch queue against which the per-request shared memory size limit for a request is compared. | |
shm_segment | Sets the maximum per-request shared memory segment limit for a batch queue against which the per-request shared memory segment limit for a request is compared. | |
stack_limit | Sets the per-process maximum size (in bytes) for a stack segment in the specified batch queue. This defines how far a program's stack segment may be extended automatically by the system. | |
tape_limit | Sets the per-request maximum tape devices of the specified group for the specified batch queue. | |
tempfile_limit | Sets the per-request maximum temporary file space for an NQS batch queue; not enforced on Cray NQS but available for compatibility with other versions of NQS. | |
working_set_limit | Sets the maximum size (in bytes) of the working set of a process in the specified batch queue. This imposes a limit on the amount of physical memory to be given to a process; if memory is limited, the system may take memory from processes that are exceeding their declared working set size. |
To check the per-process and per-request limits assigned to a queue, use the cqstatl -f or qstat -f command; for example:
cqstatl -f queuename |
The limits appear under the characters <REQUEST LIMITS>. There is one column for per-process limits and one for per-request limits. See the next section for a partial example, or “Displaying All the Characteristics Defined for an NQS Queue” in Chapter 6, for a full example display.
The following qmgr commands set per-process limits for CPU time, permanent file space, and memory:
set per_process cpu_limit = 10:0 bqueue15
set per_process memory_limit = 256Mw bqueue15
set per_process permfile_limit = 1Mb bqueue15
When you set the CPU time limit to 10 minutes, as in this example, that value appears in the cqstatl or qstat command display as 600 seconds.
The following qmgr commands set per-request limits for CPU time and permanent file size:
set per_request cpu_limit = 10:20 bqueue15
set per_request permfile_limit = 1Mb bqueue15
These commands result in the following CPU Time Limit and Memory Size fields in the cqstatl output:
<RESOURCES>       PROCESS LIMIT    REQUEST LIMIT
 CPU Time Limit   <600sec>         <620sec>
 Memory Size      <256mw>          <256mw>
...
Some parameters are not enforced if the operating system does not support the feature; see Table 5-4, for a list of limit enforcements.
Request limits are set to the lower of the user-requested limit or the queue limit you set when you created the queue. If a user does not specify a limit, the request inherits the limit set on the queue.
For example, if a user specifies a per-request memory limit, NQS tries to send it to a queue that allows the specified limit. If the same user does not specify a per-process memory limit, the request inherits the per-process memory limit of the queue to which it is sent (even though it was not specified on the command line).
Per-request limits on memory size, CPU time, and number of processors must be enabled on IRIX systems; these limits are not turned on by default. To enable checking of these limits, add NQE_NQS_JL_INTERVAL="seconds" to the nqeinfo(5) file. The "seconds" value specifies the number of seconds between limit-checking operations. Values of 15 to 120 seconds are recommended if limits are enabled. Low interval values produce more accurate checking; high interval values generate less overhead.
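For example, a site that wants these limits checked every 30 seconds (an illustrative value) would use the nqeconfig command to add the following line to the nqeinfo file:

NQE_NQS_JL_INTERVAL="30"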
To set the weighting factors that will determine the intraqueue priority for runnable requests, use the following qmgr commands:
set sched_factor share
set sched_factor cpu
set sched_factor memory
set sched_factor time
set sched_factor mpp_cpu
set sched_factor mpp_pe
set sched_factor user_priority
All requests in a specific queue are ranked according to the following criteria; the associated weighting factors determine the relative importance of each criterion:
Factor | Description | |
share | Fair share (UNICOS and UNICOS/mk systems only) | |
cpu | Per-request CPU limit requested | |
memory | Per-request memory limit requested | |
time | Time the request has been waiting in the queue for initiation | |
mpp_cpu | Per-request MPP application PE CPU limit requested | |
mpp_pe | Per-request number of MPP application PEs requested | |
user_priority | Per-request user specified priority requested |
Each value must be a positive integer less than 1,000,000, including 0. To turn off a weighting factor, specify 0.
The values of the weighting factors are significant only in relation to each other. As the value of one factor increases relative to the others, it has more impact in choosing a request from the suitable requests. If all weighting factors are 0, the request is selected from a specific queue on a first-in, first-out (FIFO) basis.
For all supported platforms except UNICOS/mk, the NQS request scheduler uses the following algorithm to select a request for initiation from all of the eligible requests in a specific queue:
intraqueue_priority =
    ((time_in_queue / max_time_in_queue) * normalized_time_wt) +
    ((1.0 - (cpu_time / max_cpu_time)) * normalized_cpu_wt) +
    ((1.0 - (memory_sz / max_memory_sz)) * normalized_mem_wt) +
    ((1.0 - (share_pri / max_share_pri)) * normalized_share_wt) +
    ((1.0 - (mpp_cpu_time / max_mpp_cpu_time)) * normalized_mpp_cpu_wt) +
    ((1.0 - (mpp_pe / max_mpp_pe)) * normalized_mpp_pe_wt) +
    ((user_priority / max_user_priority) * normalized_user_pri_wt)
For UNICOS/mk systems, the NQS request scheduler uses the following algorithm to select a request for initiation from all of the eligible requests in a specific queue:
intraqueue_priority =
    ((time_in_queue / max_time_in_queue) * rel_time_wt) +
    ((1.0 - (cpu_time / max_cpu_time)) * rel_cpu_wt) +
    ((1.0 - (memory_sz / max_memory_sz)) * rel_mem_wt) +
    ((share_pri / max_share_pri) * rel_share_wt) +
    ((1.0 - (mpp_cpu_time / max_mpp_cpu_time)) * rel_mpp_cpu_wt) +
    ((1.0 - (mpp_pe / max_mpp_pe)) * rel_mpp_pe_wt) +
    ((user_priority / max_user_priority) * rel_user_pri_wt)
Each of the job-scheduling resources is scaled from 0 to 1 relative to the other queued jobs. The scheduling weighting factors are also normalized. The normalized share_wt, time_wt, cpu_wt, and mem_wt parameters are determined by the qmgr command set sched_factor. They determine the amount of influence of the fair-share usage, time-in-queue, total requested CPU time, and total requested memory values, respectively, on the computation of the intraqueue priority. The algorithm is designed so that, when all parameters are set to the same value, all of the weighting factors will have an equal effect. The share_wt value is valid only for UNICOS and UNICOS/mk systems.
The cpu_requested and the mem_requested values are the per-request CPU and memory limits from the batch request. These values and the time-in-queue value are from the batch request structure. The share_priority value is the fair-share usage value and is valid only for UNICOS and UNICOS/mk systems. On UNICOS systems, the fair-share usage value is obtained by accessing the lnode information through the limits(2) system call if the user is active at that time; otherwise, the UDB is used to obtain the value. On UNICOS/mk systems, the share_priority is obtained from the political scheduling daemon.
On UNICOS/mk or IRIX systems, the number of requested application PEs (mpp_pe) and the application PE CPU time (mpp_cpu) weighting factors allow you to further tune the NQS job initiation (intraqueue) priority formula.
If a user specifies a priority by using the General Options selection of the Configure menu on the NQE GUI Submit window, the priority is displayed by the cqstatl or qstat command while the request resides in a pipe queue, and it continues to be in the request queue if the request is routed to another machine. The value specified by the user is not used in the calculation of the intraqueue priority; it is still valid on NQE systems because it allows compatibility with public-domain versions of NQS.
For UNICOS systems, the qsub -p priority and cqsub -p priority commands specify the user-requested intraqueue priority for all systems or the Unified Resource Manager (URM) priority increment for UNICOS systems running URM. The priority is an integer between 0 and 63. If you do not use this option, the default is 1.
If you are running URM, the priority is passed to URM during request registration. URM adds this value as an increment value to the priority that it calculates for the request.
To change the intraqueue priority of queued job requests, use the qmgr schedule request commands. Scheduling a request changes the position of a request within a queue. The request must still meet the global, complex, queue, and user limits criteria before it is initiated, except when it is specified in the schedule request now command. Requests in queues that have a higher interqueue priority usually are initiated before requests in lower-priority queues, regardless of whether they have been scheduled.
user_priority is the scheduling weighting factor for user specified priority.
To get the previously supported behavior, in which the user specified priority from the qsub -p command alone determines job initiation scheduling, set the user_priority weighting factor to nonzero and set all of the other weighting factors to 0. Then, only the user specified priority is used in determining a job's intraqueue priority for job initiation. The qmgr modify request command can still be used to modify a job's user priority.
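For example, the following settings (a minimal sketch; the nonzero value is illustrative) cause job initiation order within a queue to be determined only by the user specified priority:

set sched_factor user_priority = 100
set sched_factor share = 0
set sched_factor cpu = 0
set sched_factor memory = 0
set sched_factor time = 0
set sched_factor mpp_cpu = 0
set sched_factor mpp_pe = 0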
The user specified priority is displayed in request full status displays of the qstat(1), cqstatl(1), and the NQE GUI. The column name displays as follows:
User Priority/URM Priority Increment: 22 |
The qsub -p submission option is used for both user priority and URM priority. If NQS is configured for URM job scheduling, then any -p submission option is handled as an URM minirank priority increment by the nqsdaemon. Otherwise, the -p submission option is handled as a user specified priority, which is used with the NQS job initiation formula when calculating intraqueue priority for the job. Either NQS or URM is making job ordering decisions, but not both at the same time.
Perhaps the most effective way to use intraqueue priority is to set the scheduling factors so a request is kept waiting in relation to how long it will take to run.
For example, if there are two requests with a disparate need for CPU time, they can greatly affect each other. If request big needs 600 seconds of CPU time and request small needs only 10 seconds, letting big run before small is significant for the user who submits small. If big will take 600 seconds to run, the effect on big of letting small run first is about 2% of big's expected run time. However, if big runs before small, the effect on small is about 6000% of small's expected run time. Thus, it makes more sense to give small the higher priority.
However, it is also significant how long a request has been waiting to run. If big has been set aside in favor of smaller requests for several days, it too suffers a disproportionate waiting time.
For UNICOS and UNICOS/mk systems, you can set a job-scheduling factor that seeks to schedule according to the considerations just described. You can schedule by the requested CPU time and the time the request has been waiting in the queue. The following settings for the set sched_factor commands equally weight the amount of time the request has been waiting and the amount of CPU time requested (the qmgr command set sched_factor share is valid only on UNICOS systems):
set sched_factor share = 0
set sched_factor cpu = 100
set sched_factor memory = 0
set sched_factor time = 100
set sched_factor mpp cpu = 0
set sched_factor mpp pe = 0
set sched_factor user specified priority = 0
The value of 100 is not significant; what is significant is that CPU time and time waiting in the queue are equal.
In this scenario, the more CPU time the request needs, the lower its scheduling priority. After it waits longer than its requested CPU time, it has a higher scheduling priority. This stops a large request from being excluded indefinitely by many small requests.
The following example shows the qmgr display from a show parameters command for a machine that has set a strategy that equally weights requested CPU time and time waiting in a queue:
Job Initiation Weighting Factors:
Fair Share is not enabled.
Fair Share Priority = 0
Requested CPU Time = 1
Requested Memory = 0
Time-in-Queue = 1
Requested MPP CPU Time = 0
Requested MPP PEs = 0
User Specified Priority = 0
NQS uses various user, queue, complex and global configurable limits for CRAY T3E application PEs when deciding whether a particular job should be initiated. Since this information is not directly related to the actual availability of PEs on a CRAY T3E system, NQS can initiate jobs which then wait in a Global Resource Manager (GRM) queue, waiting for sufficient application PEs to become available.
You can configure NQS to query the CRAY T3E political scheduling daemon (psched) to receive the current size of the maximum contiguous block of application PEs. A threshold to be used before contacting the psched daemon is configurable within the nqeinfo(5) file. This new variable is NQE_PSCHED_THRESHOLD. The valid values for this variable are as follows:
-1 (default value)
0 through 100
This value represents a percentage amount. The default of -1 disables the checking of the application PE size. This variable is not in the default nqeinfo file. You can use the nqeconfig(8) command to add this variable to the nqeinfo file.
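For example, assuming the nqeinfo file uses the same variable=value form as its other entries, enabling the check with an 80 percent threshold would add a line such as the following (use the nqeconfig(8) command rather than editing the file directly):
NQE_PSCHED_THRESHOLD=80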
The threshold percentage is applied against the number of application PEs that NQS thinks are available, using NQS requested PE counts of running jobs and the NQS global PE limit. If a job requests that threshold amount or a number of application PEs greater than the threshold amount, then the nqsdaemon(8) calls the psched daemon to get the current maximum contiguous application PE size. If the job's requested application PEs are at or below this size, the job is initiated. Otherwise, the job is not initiated and the job's status display substatus is set to mp. The mp substatus indicates that the job was not initiated because there were not enough application PEs available.
Here is an example given this scenario:
The NQE_PSCHED_THRESHOLD variable is added to the nqeinfo(5) file and is set to 80.
The NQS global application PE limit is 240.
Currently running jobs submitted through NQS have requested 200 application PEs.
NQS assumes that 40 application PEs are still available.
The application PE space is fragmented and only 37 application PEs are available in a contiguous block.
The next job selected to be initiated requests 36 application PEs. The 80 percent threshold is applied against the 40 PEs that NQS thinks are available. The job is asking for application PEs above the threshold of 32 PEs, so the psched daemon is contacted. The returned application PE contiguous job size of 37 allows this job to be initiated.
If the job had requested 38 application PEs, the job would not be initiated. A status display for this job would show a substatus of mp, indicating that insufficient application PEs were available to run the job.
If the nqsdaemon is unable to get application PE size information from the psched daemon for any reason, the job is not held back but is initiated. Messages are written to the NQS logfile describing the error.
The UNICOS Unified Resource Manager (URM) is a job scheduler that provides a high-level method of controlling the allocation of system resources to run jobs. When you enable URM scheduling within NQS, NQS registers certain jobs with URM. Which jobs are registered depends upon which job-scheduling type the set job scheduling command has specified. The jobs registered with URM are in a qmgr scheduling pool and show, in the qstat display, a major status of Q and a substatus of us.
URM advises NQS when to initiate jobs that have been placed in the scheduling pool. When URM scheduling is in effect, the intraqueue priority has no meaning and is displayed as ---. All qmgr schedule commands, except qmgr schedule now, have no effect when URM is recommending which jobs to initiate. The qmgr schedule now command initiates a job immediately without consulting URM as to whether the current machine load can tolerate the submission.
For more information about URM, see UNICOS Resource Administration, publication SG-2302.
The IRIX 6.5 release introduces a new scheduler called the Miser scheduler. The Miser scheduler (also referred to as Miser) is a predictive scheduler. As a predictive scheduler, Miser evaluates the number of CPUs and amount of memory a batch job will require. Using this information, Miser consults a schedule that it maintains to track the utilization of CPU and memory resources on the system. Miser determines the time when a batch job can be started and have its requirement for CPU and memory usage met. The batch job is then scheduled to begin execution based on this time.
Miser is beneficial because it reserves the resources that a batch job requests. As a result, when the batch job begins to execute, it does not need to compete with other processes for CPU and memory resources. A disadvantage of using Miser is that a job may have to wait for its reservation period before it can begin executing. Additionally, batch jobs must provide the Miser scheduler with a list of resources that they will require. Currently, those resources are the number of CPUs, the amount of memory required, and the length of execution time required.
For a request to be successfully submitted to the Miser scheduler on an NQE execution host, the following criteria must be met:
The job scheduling class must equal Miser Normal. For more information on Miser Normal, see “Setting the Job Scheduling Type”.
The request specifies a Miser resource reservation option. For more information on the Miser resource reservation option, see the NQE User's Guide, publication SG-2148.
The destination batch queue has been directed to forward requests to the Miser scheduler. For more information on directing batch queues to forward requests to the Miser scheduler, see “Directing Batch Queues to Forward Requests to the Miser Scheduler”.
Miser is running on the execution host.
The Miser queue exists and is accessible on the execution host.
The request can be scheduled to begin execution before the scheduling window expires. The schedule for a request will depend upon the resources requested (CPU, memory, and time). For more information, see “Directing Batch Queues to Forward Requests to the Miser Scheduler”.
A new job scheduling type has been created to allow NQS to inter-operate with the Miser scheduler on IRIX systems. The new scheduling type is Miser Normal. The Miser Normal scheduling type is derived from the Nqs Normal scheduling type. Under Miser Normal scheduling, the order of the batch jobs in the NQS queues is determined by the queue ordering algorithms that operate under Nqs Normal scheduling.
The difference between Miser Normal and Nqs Normal scheduling is realized when a job is being analyzed to determine if it should be executed. During this analysis, the request is scanned to determine if Miser resource options have been specified. If Miser options are specified, the NQS batch queue is directed to forward requests to the Miser scheduler and NQS queries the Miser scheduler. If no Miser options are specified, the request is started as a normal NQS request (the request will be processed by the operating system default scheduler).
When querying the Miser scheduler, NQS tries to determine if the request can be started by Miser, given the resources requested, before the defined scheduling window expires. If the request cannot be scheduled, NQS continues to hold the request in the batch queue.
For more information on Miser Normal scheduling and for the syntax to set the job scheduling type, see the qmgr(8) man page.
To configure NQS to operate with the IRIX Miser scheduler you must define the batch queue to indicate that it is capable of forwarding requests to a Miser scheduling queue. You can use the qmgr(8) command to set the batch queue to forward requests to the Miser scheduling queue as follows:
qmgr> set miser [batch queue] [miser queue] [H:M.S|INT[s|m|h]]
The options for the qmgr(8) command are as follows:
batch queue | Name of the NQS batch queue
miser queue | Name of the Miser scheduler resource queue
H:M.S|INT[s|m|h] | Time for the scheduling window, defined in hours:minutes.seconds or as an integer value with a suffix of s for seconds, m for minutes, or h for hours.
For the qmgr command to execute successfully, the batch queue must exist, Miser must be running, and the Miser resource queue must exist.
The scheduling window determines how long a request can wait in the Miser scheduler resource queue before it begins executing; that is, before its resource reservation period arrives. This scheduling window can be used to control the job mix that is forwarded to the Miser resource queue. For example, three NQS batch queues might be defined as follows:
One queue is defined for jobs that require 32 or more CPUs.
A second queue may be defined for jobs that require between 8 and 32 CPUs.
A third queue may be defined for jobs that require less than 8 CPUs.
All of these batch queues can be designated to send requests to the same Miser resource queue.
You can define the scheduling windows so that the batch queues that forward requests that require larger amounts of resources are given a larger scheduling window. You can also define scheduling windows so that batch queues that require fewer resources are given smaller scheduling windows. Defining the scheduling windows in this manner results in a configuration where requests that are lightweight are throttled back so that they do not interfere with the scheduling of requests that require more resources.
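For instance, a sketch of such a configuration using the set miser syntax shown above (the queue names and window lengths are illustrative) might map the three batch queues just described to one Miser resource queue, giving the larger-resource queues longer scheduling windows:
set miser batbig chemistry 60m
set miser batmed chemistry 30m
set miser batsml chemistry 10m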
An example of the set miser command is as follows:
qmgr> set miser batnam1 chemistry 20m |
This example shows that the NQS batch queue batnam1 will forward requests to the Miser resource queue chemistry. The scheduling window specified is 20 minutes. When NQS attempts to execute a request in queue batnam1, it will query the Miser scheduler to determine when the request will be given the resources it has requested using the qsub -X command. If the query indicates that the request will begin executing before the scheduling window expires, the request is submitted to Miser. Otherwise, NQS will continue to queue the request.
To remove the relationship between an NQS batch queue and a Miser resource queue, substitute the keyword none for the Miser resource queue name. The value for the scheduling window is irrelevant in this case. The following example shows how you can remove the relationship between the NQS batch queue batnam1 and the Miser resource queue chemistry:
qmgr> set miser batnam1 none 0 |
On UNICOS, UNICOS/mk, and IRIX systems, NQS periodic checkpointing provides transparent automatic checkpointing of NQS requests. The checkpoint files restart NQS requests from the time the checkpoint file was created. To checkpoint the requests at intervals that are site-controlled, use the qmgr set periodic_checkpoint commands. For more information, see the qmgr(8) man page.
Periodic checkpointing initiates a checkpoint for all NQS requests based on the amount of CPU time that is used or the amount of wall-clock time that has elapsed. An NQS administrator can tailor the following periodic checkpoint options:
Enable and disable periodic checkpointing for all NQS requests
Enable and disable periodic checkpoint intervals based on the CPU time
Enable and disable periodic checkpoint intervals based on the wall-clock time
Exclude short running requests
Exclude large memory requests
Exclude large secondary data segments (SDS) requests (UNICOS systems only)
Set an interval for periodic checkpoints based on the CPU and wall-clock time
Set the maximum number of concurrent periodic checkpoints
Define an interval to check eligible requests
Users can enable and disable periodic checkpointing during the life of a job by using the qalter command in jobs. For more information, see the qalter(1) man page.
The periodic checkpoint/restart file cannot be a network file system (NFS) file, but a process with open NFS files can be checkpointed and restarted.
If a process is checkpointed with an open NFS file, and the file is on a file system that was manually mounted, that file system also must be manually mounted when the process is restarted; otherwise, the restart will fail.
You can set your periodic checkpoint environment to one of the following modes:
Compatibility mode (default)
User-initiated mode
CPU time mode
Wall-clock mode
CPU and wall-clock mode
This section describes how to set each periodic checkpoint mode of operation by using the NQS qmgr(8) and qalter(1) commands.
To set your periodic checkpoint environment to compatibility mode, keep the periodic checkpointing turned off. This is the default environment for periodic checkpoint. To disable periodic checkpointing, use the following command:
set periodic_checkpoint status off |
If the qalter(1) command is used in a request, the settings are retained while periodic checkpointing is disabled and used if periodic checkpointing is enabled later. The jobs are checkpointed at NQS shutdown.
To set your periodic checkpoint environment to user-initiated mode, you must enable periodic checkpointing with the CPU and wall-clock intervals disabled. This mode periodically checkpoints a request that contains qalter(1) commands to enable checkpointing for the request. The following example shows how you can use periodic checkpoint commands to set the user-initiated mode:
set periodic_checkpoint status on
set periodic_checkpoint time_status off
set periodic_checkpoint cpu_status off
set periodic_checkpoint cpu_interval = 15
set periodic_checkpoint time_interval = 90
set periodic_checkpoint max_mem_limit = 64MW
set periodic_checkpoint max_sds_limit = 48MW
set periodic_checkpoint min_cpu_limit = 600
set periodic_checkpoint scan_interval = 60
set periodic_checkpoint concurrent_checkpoints = 1
In this example, individual requests can be periodically checkpointed. The NQS system status is enabled, but the wall-clock and CPU time status are disabled. NQS lets requests that were modified by the qalter(1) command be enabled for periodic checkpointing. If a qalter command enables the CPU-based periodic checkpointing but does not define an interval, the request receives the default of 15 minutes. If a qalter command enables the wall-clock based periodic checkpointing but does not define an interval, the request receives the system default value of 90 minutes.
If a request has a limit of more than 64 Mwords of memory, more than 48 Mwords of SDS space, or has a CPU time limit of less than 600 seconds, it is excluded from periodic checkpointing. NQS scans requests every 60 seconds to determine whether a running request is eligible for periodic checkpointing. Only one periodic checkpoint request can be run at a time. The jobs are checkpointed at NQS shutdown.
To set your periodic checkpoint environment to CPU time mode, you must enable periodic checkpointing to create the checkpoint files after a specific amount of CPU time has been used by the request.
On IRIX systems, job limit checking must be enabled for CPU time mode periodic checkpointing.
The following example shows how you can use the periodic checkpoint commands to set the CPU time mode:
set periodic_checkpoint status on
set periodic_checkpoint time_status off
set periodic_checkpoint cpu_status on
set periodic_checkpoint cpu_interval = 30
set periodic_checkpoint time_interval = 60
set periodic_checkpoint max_mem_limit = 32MW
set periodic_checkpoint max_sds_limit = 32MW
set periodic_checkpoint min_cpu_limit = 900
set periodic_checkpoint scan_interval = 60
set periodic_checkpoint concurrent_checkpoints = 1
In this example, NQS system-wide periodic checkpointing is enabled, and CPU-time-based checkpointing is enabled. The wall-clock-time checkpointing is disabled. NQS checkpoints requests every 30 CPU minutes.
If a request has a limit of more than 32 Mwords of memory or more than 32 Mwords of SDS space, or has a CPU time limit of less than 900 seconds, it is excluded from periodic checkpointing. NQS scans requests every 60 seconds to determine whether a running request is eligible for periodic checkpointing. No more than one periodic checkpoint request can be running. The jobs are checkpointed at NQS shutdown.
To set your periodic checkpoint environment to wall-clock mode, you must enable periodic checkpointing to create the checkpoint files after a specified amount of wall-clock time has elapsed for a request.
The following example shows how you can use the periodic checkpointing commands to set the wall-clock mode:
set periodic_checkpoint status on
set periodic_checkpoint time_status on
set periodic_checkpoint cpu_status off
set periodic_checkpoint cpu_interval = 30
set periodic_checkpoint time_interval = 60
set periodic_checkpoint max_mem_limit = 64MW
set periodic_checkpoint max_sds_limit = 64MW
set periodic_checkpoint min_cpu_limit = 900
set periodic_checkpoint scan_interval = 120
set periodic_checkpoint concurrent_checkpoints = 2
In this example, NQS system-wide periodic checkpointing is enabled, and checkpointing based on wall-clock time is enabled. The CPU-time-based checkpointing is disabled. NQS checkpoints requests every 60 wall-clock minutes.
If a request has a limit of more than 64 Mwords of memory or more than 64 Mwords of SDS space, or has a CPU time limit of less than 900 seconds, it is excluded from periodic checkpointing. NQS scans requests every 120 seconds to determine whether a running request is eligible for periodic checkpointing. Two periodic checkpoint requests can run simultaneously. The jobs are checkpointed at NQS shutdown.
To set your periodic checkpoint environment to CPU and wall-clock mode, you must enable periodic checkpointing to create the checkpoint files after a specified amount of CPU or wall-clock time has elapsed for a request, whichever occurs first. The following example shows how you can use the periodic checkpoint commands to set the CPU and wall-clock mode:
set periodic_checkpoint status on
set periodic_checkpoint time_status on
set periodic_checkpoint cpu_status on
set periodic_checkpoint cpu_interval = 20
set periodic_checkpoint time_interval = 45
set periodic_checkpoint max_mem_limit = 64MW
set periodic_checkpoint max_sds_limit = 64MW
set periodic_checkpoint min_cpu_limit = 1200
set periodic_checkpoint scan_interval = 90
set periodic_checkpoint concurrent_checkpoints = 3
In this example, NQS system-wide periodic checkpointing, CPU-time-based checkpointing, and wall-clock-time-based checkpointing are all enabled. NQS checkpoints requests every 45 wall-clock minutes or every 20 CPU minutes, whichever occurs first.
If a request has a limit of more than 64 Mwords of memory or more than 64 Mwords of SDS space, or has a CPU time limit of less than 1200 seconds, it is excluded from periodic checkpointing. NQS scans requests every 90 seconds to determine whether a running request is eligible for periodic checkpointing. Three periodic checkpoint requests can run simultaneously. The jobs are checkpointed at NQS shutdown.
The following sections describe how periodic checkpointing affects NQS events.
Periodic checkpoints are not initiated during a system shutdown. The checkpoints that are in progress continue to run until they are completed.
No checkpointing occurs when a request is terminating. However, a checkpoint event can be initiated and a request can terminate during the grace period. In this case, NQS request completion processing cleans up the checkpoint event. If the request will be requeued, the previous valid checkpoint file is retained.
A qchkpnt(1) command is similar to a periodic checkpoint event. If another checkpoint event is in progress, a qchkpnt command is rejected. A qchkpnt event is counted toward a periodic checkpoint event, and the timers are reset for the next periodic event after a qchkpnt completes.
A periodic checkpoint that is in progress during an abort event is terminated, and the checkpoint file is deleted.
If you modify a request, its CPU or memory limit can change. This change can make the request ineligible for periodic checkpointing if its CPU time is set to a value less than the value of the set periodic_checkpoint min_cpu_limit command, its memory limit is set to a value greater than the value of the set periodic_checkpoint max_mem_limit command, or its SDS request limit is set to a value greater than the value of the set periodic_checkpoint max_sds_limit command.
If a request is being held, it will be checkpointed, terminated, and requeued. If another checkpoint event is in progress, the requests that are being held are rejected. After a request has been held, it is no longer running or eligible for periodic checkpointing.
If a request is being preempted or has completed preemption, no checkpoint events for that request can occur. If a request is currently in a preempted state or in the process of being preempted, no periodic checkpointing is tried.
If a request is released, it requeues a previously held request and makes it eligible for initiation. The request is started from the last checkpoint file that was created. Any changes that are related to periodic checkpointing are retained, and those values are used when the request is spawned.
If a request is rerun, it is terminated and requeued, and any checkpoint files are deleted. The request is rerun as though it were a new request. Periodic checkpointing is not affected. If the qalter(1) command changed any periodic checkpoint options, they revert back to the system-wide default values that qmgr(8) defined.
If a request is restored, it resumes a request that was previously preempted and makes it eligible for execution. If changes related to periodic checkpointing were made while the request was preempted, they are retained, and those values are used when the request begins to execute.
If a request is resumed, it resumes a request that was previously suspended and makes it eligible for execution. If changes related to periodic checkpointing were made while the request was suspended, they are retained, and those values are used when the request begins to execute.
NQS maintains a log file in which time-stamped messages of NQS activity are recorded. These messages are in text format. To define the file specification of the log file, use the following command:
set log_file name |
The name specified may be either an absolute path name or a path name relative to the spool directory. If you specify the same log file name as the existing log file, new information is appended to that file.
To truncate the current log file, use the following qmgr command:
reset logfile |
The log file is maintained throughout every shutdown and recovery sequence; new messages are appended to the end of the file. If the log file name is changed, the old file is preserved and is then available for further processing.
NQS can be configured to copy the existing log file automatically to a new file and then reset the existing log file to its beginning. This process is called segmenting the log file.
To specify the directory where the segmented log file is kept, use the following command:
set segment directory dirname |
The dirname is either the absolute specification of the directory in which segmented files are to be placed, or it is relative to the spool directory (as indicated by the value of NQE_NQS_SPOOL in the nqeinfo file). Segmentation will not take place if directory is not set or is set to none. You should not specify /dev/null as the log file name. The segmented NQS log file names reflect the date and time of the earliest information contained in the segment.
To set the log file segmentation at NQS start-up, use the following qmgr commands:
set segment on_init on
set segment on_init off
To segment the log file when it reaches a certain size or age, use the following qmgr commands:
set segment size = bytes
set segment time_interval = minutes
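For example, the following illustrative settings segment the log file when it grows beyond roughly 1 MB or after 24 hours, whichever occurs first (the values are arbitrary):
set segment size = 1000000
set segment time_interval = 1440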
To control the detail level of the information recorded in the log file, use the following qmgr commands:
set message_types on event_type(s)
set message_types off event_type(s)
Warning, fatal, and unrecoverable messages are always recorded in the log file. You can turn off informational, efficiency, caution, and trace messages for specific types of events. The following is a list of the event types; the brackets indicate the portion of the command that does not need to be typed when it is entered:
al[l] (except flow), ac[counting], ch[eckpoint], com[mand_flow], con[fig], db_m[isc], db_r[eads], db_w[rites], f[low], network_m[isc], network_r[eads], network_w[rites], op[er], ou[tput], packet_c[ontents], packet_f[low], proto_c[ontents], proto_f[low], rec[overy], req[uest], rou[ting], s[cheduling], user1, user2, user3, user4, user5
Note: Use the set message_types on flow command only for a short time interval to help with problem analysis because it can cause severe performance degradation.
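For example, using event types from the list above, the following commands would suppress database and network read tracing while keeping request and scheduling messages enabled:
set message_types off db_reads
set message_types off network_reads
set message_types on request
set message_types on scheduling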
To control the amount of information in the header of each log message, use the following qmgr commands:
set message_header long
set message_header short
Long message headers can help with quicker problem resolution.
To control the level of detail of information in the log file, set the debug level through the qmgr command set debug. For more information, see “Setting the Debug Level”.
To display information about the current log file, use the following qmgr command:
show parameters |
The settings are shown next to the Log_file, MESSAGE_Header, MESSAGE_Types, and various SEgment display entries. See “Displaying System Parameters” in Chapter 6, for an example display.
To set the log file name, the message header information level, the recorded message types, and the segmentation interval for the log file (the file is relative to the NQS spool directory), use the following qmgr commands:
set log_file active.log
set message_header long
set message_types on all
set segment on_init on
set segment directory = nqs.log
set segment time_interval = 1440
Segmented log files are placed in the nqs.log directory under the NQS spool directory, which is set as described in “Setting the Log File”.
To control the level of information recorded in the log file, use the following qmgr command to set the debug level for NQS:
set debug level |
The following debug levels are currently supported:
Level | Meaning |
0 | No debug information is recorded. |
1-3 | Each successively higher level records more debugging information.
To avoid excessive disk usage, set a nonzero debug level only when extra information is required to analyze a problem. The default on an installed NQS system is 0.
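For example, to record more detail while investigating a problem and then return to the default when finished, enter the following qmgr commands:
set debug 2
set debug 0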
To display the current debug level, use the following qmgr command:
show parameters |
The setting is shown next to the words Debug level on the display. See “Displaying System Parameters” in Chapter 6, for an example display.
The accounting information that NQS produces is part of daemon accounting. This information deals with job initiation, termination, and conditions, such as rerun, restart, and preemption. The set accounting off and set accounting on commands in the qmgr(8) subsystem let you specify whether this information will be produced.
The accounting file location is $NQE_SPOOL/spool/private/root/nqsacct or $NQE_NQS_SPOOL/private/root/nqsacct, where $NQE_SPOOL is the spool location defined at installation time. Both variables appear in the nqeinfo file.
The qmgr commands set accounting off and set accounting on are used to turn NQS daemon accounting off or on.
On UNICOS and UNICOS/mk systems, NQS daemon accounting information is set using the standard UNICOS and UNICOS/mk accounting system. For detailed information about UNICOS accounting, see UNICOS System Administration, publication SG-2113 or UNICOS/mk Resource Administration, publication SG-2602.
On UNIX systems, NQS simply writes accounting records to the file private/root/nqsacct (as indicated by the value of NQE_NQS_SPOOL in the nqeinfo file) under the NQS spool directory. The file format is ASCII and has the following fields: User ID, Job ID, Sequence Number, Time, Type, Subtype, Host, Request name, and Queue. For job termination records, the user, system, child user, and child system time for NQS shepherd processes are also recorded. The same information can be logged to the NQS log file by using the qmgr command set message_types on accounting.
The following qmgr command is used to turn NQS daemon accounting off:
set accounting off |
The current state of NQS daemon accounting can be displayed by using the show parameters command.
The following qmgr command is used to turn NQS daemon accounting on:
set accounting on |
Note: This is with respect to NQS. For UNICOS and UNICOS/mk systems, it is up to the UNICOS or UNICOS/mk accounting system to actually enable NQS daemon accounting.
The current state of NQS daemon accounting can be displayed by using the qmgr command show parameters.
Accounting records are written to the NQS log and/or the NQS accounting file. Table 5-5 describes the record types and subtypes. The formats of the Type 1 and Type 2 records that are written follow the table.
Type | Subtype | Format | Description
NQ_INFO | NQ_INFO_ACCTON | Type 1 | Written when accounting is enabled | |
NQ_INFO | NQ_INFO_ACCTOFF | Type 1 | Written when accounting is disabled | |
NQ_RECV | NQ_RECV_NEW | Type 1 | Written when new request is received | |
NQ_RECV | NQ_RECV_LOCAL | Type 1 | Written when new request is received from local pipe queue | |
NQ_RECV | NQ_RECV_REMOTE | Type 1 | Written when new request is received from remote pipe queue | |
NQ_INIT | NQ_INIT_START | Type 1 | Written when request is started on NQS execution server | |
NQ_INIT | NQ_INIT_RESTART | Type 1 | Written when request is restarted on NQS execution server | |
NQ_INIT | NQ_INIT_RERUN | Type 1 | Written when request is rerun on NQS execution server | |
NQ_TERM | NQ_TERM_EXIT | Type 2 | Written when request terminates normally | |
NQ_TERM | NQ_TERM_REQUEUE | Type 2 | Written when request terminates normally and will be requeued | |
NQ_TERM | NQ_TERM_PREEMPT | Type 2 | Written when request terminates normally because it was preempted | |
NQ_TERM | NQ_TERM_HOLD | Type 2 | Written when request terminates normally because it was held | |
NQ_TERM | NQ_TERM_OPRERUN | Type 2 | Written when request terminates normally because the operator reran the request | |
NQ_TERM | NQ_TERM_RERUN | Type 2 | Written when request terminates normally and will be rerun | |
NQ_DISP | NQ_DISP_NORM | Type 1 | Written when request is disposed normally | |
NQ_DISP | NQ_DISP_BOOT | Type 1 | Written when request cannot be requeued at boot time | |
NQ_SENT | NQ_SENT_INIT | Type 2 | Written when request starts being sent to NQS | |
NQ_SENT | NQ_SENT_TERM | Type 2 | Written when request completes being sent to NQS | |
NQ_SPOOL | NQ_SPOOL_INIT | Type 1 | Written when request output starts being transferred | |
NQ_SPOOL | NQ_SPOOL_TERM | Type 1 | Written when request output completes being transferred |
NQE writes two formats of accounting records; the formats are described below. In both formats, all records are ASCII text and fields are separated by spaces.
The Type 1 record format is as follows:
Field | Description
uid | UNIX user ID number for this request | ||
jobid | Process ID of the NQE shepherd for this request | ||
sequence_number | NQE request sequence number | ||
time_of_day | Time of day the record is written
type | Record type | ||
subtype | Record subtype | ||
host_name | Host name on which the record was written | ||
request_name | NQE request name | ||
queue_name | NQE queue name | ||
utime | User time of NQE shepherd for this job | ||
stime | System time of NQE shepherd for this job | ||
cutime | Sum of user CPU time for all processes in this job | ||
cstime | Sum of system CPU time for all processes in job | ||
project_id | Project ID for all processes in this job (IRIX systems only) |
Note: Total job time = utime + stime + cutime + cstime. Total overhead = utime + stime. Total user time = cutime + cstime.
The Type 2 record format is as follows:
Field | Description
uid | UNIX user ID number for this request | ||
jobid | Process ID of the NQE shepherd for this request | ||
sequence_number | NQE request sequence number | ||
time_of_day | Time of day the record is written
type | Record type | ||
subtype | Record subtype | ||
host_name | Host name on which the record was written | ||
request_name | NQE request name | ||
queue_name | NQE queue name | ||
project_id | Project ID for all processes in this job (IRIX systems only) |
The units used for utime, stime, cutime, and cstime vary by machine and operating system level. The times are reported in clock ticks; the number of ticks per second is defined on each system by the value of sysconf(_SC_CLK_TCK), which can vary by machine and may be run-time alterable.
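For example, on most UNIX systems you can display the clock-tick value in effect on a given host with the getconf(1) command:
getconf CLK_TCK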
To provide security against unauthorized remote access of the local NQS system, you can specify that you want to validate incoming client requests. The type of validation you select is performed for all actions: submitting requests, monitoring requests and queues, and deleting or signaling requests.
The administrator of each NQS server in the group of NQE nodes in the NQE cluster can set a validation type. To avoid confusing users, all NQS servers in the group of NQE nodes in the NQE cluster should use the same validation type. The default validation type for an NQS server when first installed is validation file checking.
You can specify the following validation types for the NQS host:
Validation type | Description | |||
No validation | No validation is performed by NQS. If the name of the user under which a command is issued (the target user) is a valid user at the NQS server, the command is successful. | |||
Validation files | A validation file is examined to determine whether there is an entry that allows the user under which the command is issued to access the server. The validation file can be either an .rhosts or .nqshosts file in the user's home directory at the NQS server. (If you do not use .rhosts files at your site, you can use .nqshosts). Individual users create these files.
Password checking | The NQS user is required to supply a password for the user name at the NQS server at which the request is to be run. A password may be no longer than 8 characters. The password is checked only at the NQS server. The password is used for validating the user name under which a user command will be executed (either through the NQE GUI functions or through the following commands: cqsub, cqstatl, cqdel, qsub, qstat, and qdel). The password is encrypted and passed over the network, and NQS checks the password file to ensure that the password is valid. | |||
Password checking and validation files | In this combined method, a user may supply a password when issuing a command as is done in password checking. If a password is supplied, the local NQS server validates it; if the password is incorrect, NQS rejects the action the client command requested. If a password is not supplied, validation file checking is performed. |
Note: For client commands, the user is prompted for a password if the -P option is specified on the command line or if the NQS_PASSWORD_NEEDED environment variable is set.
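Returning to validation files: as an illustration, a user who submits requests from workstation wsl under the login jjones could add an entry such as the following to the .rhosts (or .nqshosts) file in their home directory on the NQS server, assuming .nqshosts follows the standard .rhosts host user entry format (the host and user names are illustrative):
wsl jjones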
To set the validation type, use the following qmgr commands:
set validation [password] [file] |
The default validation is by file. You can specify one or both of the parameters, as follows:
Parameter | Definition | |
password | Password checking only. | |
file | Validation file checking only. | |
password file | Password checking if a password is supplied; otherwise, validation file checking is performed. |
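For example, to select the combined password and validation file method described above, enter the following qmgr command:
set validation password file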
To turn off validation, use the following command:
set no_validation |
The following field in the qmgr command show parameters display indicates the validation policy is by file:
Validation type = validation files |
See “Displaying System Parameters” in Chapter 6, for a full example display.
This section describes miscellaneous NQS configuration settings.
If the NQS system cannot send a request from a local pipe queue to its destinations (for example, because the network connection to the system supporting the destinations has failed), it will try resending the request at regular intervals. To define how often the server will retry sending a request, use two qmgr commands, as follows:
To define the interval (in minutes) between successive attempts to send the request, use the following qmgr command:
set default destination_retry wait interval |
The initial value for a newly installed NQS system is 5 minutes.
To avoid continuous retries, use the following qmgr command to define the maximum elapsed time (in hours) for which a request can be retried. If this time is exceeded, the request submission fails and a mail message is sent to the user who submitted the request.
set default destination_retry time time |
The initial value for a newly installed NQS system is 72 hours.
The following fields in the qmgr command show parameters display indicate the values for retry limits:
Default destination_retry time = 72 hours Default destination_retry wait = 5 minutes |
See “Displaying System Parameters” in Chapter 6, for a full example display.
In this example, you want to retry requests every 10 minutes. However, you want to set a limit of 48 hours on the time during which the retries can occur; after this time, NQS will abandon sending the request and inform the user that the request failed. To do this, you must enter the following qmgr commands:
set default destination_retry wait 10 set default destination_retry time 48 |
NQS can send different types of mail messages to users. For example, a message can be sent when the request completes execution if the qsub -me option was used in the request submission. Messages are also sent when errors occur. The initial setting uses the user name root in the From field of the message.
To change the name in the From field to another name, use the following qmgr command:
set mail name |
The name argument must be the name of a valid user at the system running the server.
The following field in the show parameters display indicates that root is the sender of NQS mail:
Mail account = root |
See “Displaying System Parameters” in Chapter 6, for a full example display.
If you have a special user account as the NQS manager, you may want users to identify mail messages from that user account rather than just indicating that they are from root. For example, if there is a user called nqsman, enter the following qmgr command:
set mail nqsman |
Mail then seems to come from a user called nqsman.
The UNIX system swaps processes in and out of memory as required, so more processes can run simultaneously than could if all processes remained in memory all of the time. Although managing the swapping adds a slight overhead to system performance, it is not usually useful to lock a process into memory.
By default, the NQS daemon nqsdaemon is not locked into memory (it can be swapped in and out as required). If you want to lock it into memory, use the following qmgr command:
lock local_daemon |
Note: The ability to lock nqsdaemon into memory is supported only for systems running a derivative of UNIX System V.
To unlock the process from memory, use the following qmgr command:
unlock local_daemon |
The following field in the show parameters display indicates that the NQS daemon is not currently locked in memory:
NQS daemon is not locked into memory. |
See “Displaying System Parameters” in Chapter 6, for a full example display.
If you are defining many systems or queues, you can create a script file of qmgr commands to use as input to qmgr rather than entering the information manually at the qmgr prompt.
After you complete an initial configuration, use the qmgr snap command to generate a script file automatically. This file contains the qmgr commands that will generate the current NQS configuration; see “Creating a qmgr Script File of the Current Configuration” for more information.
To edit the script file, use any standard text editor. The qmgr commands in the file are the same as those you would enter at the qmgr prompt.
To pipe the file into the qmgr utility, use the following command line:
cat filename | qmgr |
Each line in the file is executed as if it were entered at the qmgr prompt. Messages are displayed by qmgr as usual in response to the commands. If a command produces an error, the error message is displayed, and execution continues to the next line of the file.
To record the output that qmgr produces (the size of the output display usually is larger than the size of your screen display), use the following command to redirect the standard output to a file:
cat filename | qmgr > output-filename |
The qmgr command snap creates a file containing the qmgr commands that are necessary to redefine the current NQS configuration. This file can then be input to qmgr at another NQS system to build a duplicate copy of the configuration there.
The format of the snap command is as follows:
snap [file = pathname] |
The pathname argument is the path name of the file that will receive the qmgr commands. If you omit this argument, the qmgr commands are written to the file specified by the qmgr command set snapfile. If no file was specified, the snap command fails.
The format of the set snapfile command is as follows:
set snapfile pathname |
The pathname is relative to the NQS spool directory, or it can be an absolute path name. The current file (if any) is shown on the display produced by the qmgr command show parameters. The name appears after the characters snap_file. The initial value of the file name is /dev/null. See “Displaying System Parameters” in Chapter 6, for an example display.
To input a snap file to qmgr, use the cat(1) command, as follows:
cat pathname | qmgr |
The pathname is an absolute path name or relative to the current directory.
In the following example, the default file that snap uses is first set to nqssnap in the NQS spool directory, and then snap is used to write the file. The more(1) command displays the first few lines of the snap file to show what is written to the file.
snow% qmgr
Qmgr: set snapfile nqssnap
NQS manager[TCML_COMPLETE ]: Transaction complete at local host.
Qmgr: snap
Qmgr: quit
snow% more nqssnap
#
#Create the nqs network information concerning Mid, Service,
#and Aliases.
#
add mid rain rain-hy rain-ec
add output_agent fta rain
add mid squall squall-hy squall-ec
add output_agent fta squall
--More--
The script file produced by snap contains comments that divide the commands into functional groups.
The next screen shows how this script file can be used as input to qmgr at another NQS system:
$ cat /nqebase/database/spool/snapfile | qmgr > outfile |
The NQS server returns output from requests to the client host or to another host (depending on the location that the user specified in the request's output options) either by writing output to a common file space or by using fta or rcp. The default is fta using the ftp protocol. Returning output to a DFS file system is restricted to writing output to a common file space or using fta; rcp does not support DFS.
Output option | Output location | |
No location specified or filename specified | The user's working directory at submission on the client workstation. | |
host and filename specified | Specified server and file name. The directory is the user's home directory on host, unless otherwise specified. | |
user, host and/or domain, and pathname specified | Specified user, host, and path name. The pathname can be a simple file name or an absolute path. If the pathname is a simple file name, the file is placed in the user's home directory on the specified host. The domain specifies an FTA domain name that uses network peer-to-peer authorization (NPPA) rather than a password. If users do not want to use NPPA, they can provide a password. |
NQS tries to return output to the correct user and directory according to what the user specified. The NQS server uses the following methods to determine how output will be returned to the client (listed in order of precedence):
Note: When using the qsub(1) command, methods 2 and 3 do not apply, and the order of precedence depends upon the output agents defined.
1. If the output destination host is the local NQS server (NQS_SERVER), output is placed locally in the directory from which the request was submitted or in a specified local location.
2. If the output destination host is not the local NQS server, NQS uses FTA. NQS always tries to locate a password for the FTA transfer, which may be embedded in the cqsub -o command line or in the user's .netrc file, or it may be specified when using the NQE GUI to submit a request. FTA transfers the file according to the rules configured for the domain specified. If the user omits domain, FTA uses its default domain, which is inet unless you have reconfigured FTA. The inet domain uses ftp protocol and requires a password unless you have configured NPPA. If you have configured NPPA, no password is required.
3. If the FTA transfer fails, NQS tries to send the output using rcp. rcp uses the .rhosts file on the destination host.
4. If all of these methods fail, NQS places the output in the home directory of the user on the execution host.
This section explains the differences for NQS when used on UNICOS or UNICOS/mk systems that run the multilevel security feature (MLS). It is assumed that you are familiar with the UNICOS or UNICOS/mk documentation about general system administration and the multilevel security feature on the UNICOS or UNICOS/mk system.
Note: If you are running NQE 3.3 on a UNICOS 10.0 or UNICOS/mk 2.0.2 system, this section applies to your system because the MLS features are always available at these release levels. If you are running UNICOS 9.0, this section applies to your system only if you are running the MLS feature.
When NQS runs on a UNICOS or UNICOS/mk system that runs the multilevel security feature, the following differences exist:
Note: If the MLS feature on UNICOS or security enhancements on UNICOS/mk are enabled on your system, the job output files are labeled with the job execution label. For jobs that are submitted locally, the return of the job output files may fail if the job submission directory label does not match the job execution label. For example, if a job is submitted from a level 0 directory, and the job is executed at a requested level 2, the job output files cannot be written to the level 0 directory. If the home directory of the UNICOS user under whom the job ran is not a level 2 directory, does not have a wildcard label, or is not a multilevel directory, the job output files cannot be returned to that directory either. The job output files will be stored in the NQS failed directory. If the MLS feature on UNICOS or security enhancements on UNICOS/mk are enabled on your system and you submitted a job remotely, the Internet Protocol Security Options (IPSO) type and label range that are defined in the network access list (NAL) entry for the remote host affect the job output file return.
When job output is returned, the output files are labeled at the job execution label for both local and remote destinations.
When a job is restarted, the session security attributes are reverified; if these attributes are not in a valid range, based on the user's user database (UDB) entry and the socket (or system) security attributes when the job was queued, the job is deleted. If the job is deleted, a message is written to the syslog file.
A label requested by using the -L and -C options of the qsub or cqsub command or the NQE GUI must dominate the job submission label.
Mail is sent at the appropriate job label. Mail is not sent at the nqsdaemon label, but at either the job submission or job execution label, as applicable. If the job has not yet been initiated, mail sent to the job owner is sent at the job submission label. If the job has been initiated or has completed execution, mail is sent at the job execution label. To read this mail, users must have their current active label set appropriately.
Session security attributes are constrained by the socket security attributes when the job was queued and the user's UDB entry. For local submissions, the system minimum and maximum label range is used. For more information on the socket security attributes, see your UNICOS or UNICOS/mk system administration documentation.
The workstation access list (WAL) is checked for remote job activity, such as job acceptance, job status, and deleting jobs. If the execution host has a WAL entry for the origin host that restricts access to NQS services for the caller, remote job activity to a remote host can fail. For more information on the WAL, see your UNICOS or UNICOS/mk system administration documentation.
For sites running the MLS SECURE_MAC feature, all NQS user commands are labeled as trusted processes so that they can write to a syshigh labeled log file. The NQS user commands write messages to the NQS log file.
On an upgrade installation from a system not running the multilevel security feature to a system that is running the multilevel security feature or an upgrade installation from UNICOS 9.0 to UNICOS 10.0, the jobs are held and are not initiated at NQS startup. The NQS administrator can then either release the job (which will then run at the job owner's UDB default label) or delete the job.
For information on implementing the multilevel security feature, see your UNICOS or UNICOS/mk system administration documentation.
If you are upgrading your NQS configuration to use multilevel directories, you must convert your wildcard directories to multilevel directories (MLDs). To convert between existing wildcard-labeled directories and MLDs, use the cvtmldir command. (For more information, see the cvtmldir(8) man page.) This conversion must be performed in single-user mode after installing NQE.
The cvtmldir command requires absolute path names; you must rename the affected spool directories before calling cvtmldir to convert from wildcard directories to multilevel directories. In the following examples, the NQS_SPOOL/private/root/control directory is being converted, but you also must convert the following NQS directories, the locations of which are defined in your nqeinfo file:
NQS_SPOOL/private/requests
NQS_SPOOL/private/root/chkpnt
NQS_SPOOL/private/root/control
NQS_SPOOL/private/root/data
NQS_SPOOL/private/root/failed
NQS_SPOOL/private/root/interproc
NQS_SPOOL/private/root/output
NQS_SPOOL/private/root/reconnect
To convert from wildcard directories to MLDs, use the following procedure (you must be root):
Ensure that the nqsdaemon is not running while the conversion is occurring. For more information on how to shut down NQS, see the qstop(8) man page. For more information on nqsdaemon, see the nqsdaemon(8) man page.
Activate the secadm category by entering the setucat secadm command if needed.
Change to the parent directory; in this example, to convert the NQS_SPOOL/private/root/control directory, change to the NQS_SPOOL/private/root directory.
Rename the control directory to control.temp by entering the following command:
/bin/mv ./control ./control.temp |
Enter the following command to create a multilevel symbolic link called NQS_SPOOL/private/root/control, which points to a directory named NQS_SPOOL/private/root/control.mld. The files in the control.temp directory are linked or copied, when necessary, into the new control.mld directory. The files are not deleted from the control.temp directory.
/etc/cvtmldir -m $NQS_SPOOL/private/root/control.temp $NQS_SPOOL/private/root/control
Enter the following command to create a relative path name for the multilevel symbolic link output, allowing access to the files and directories in NQS_SPOOL/private/root while running with the caller's privileges:
/etc/unlink $NQS_SPOOL/private/root/control
/bin/ln -m ./control.mld ./control
You should ensure that the directories were converted successfully. The control.temp directory and its files should not be deleted until NQS has been restarted and the jobs are successfully requeued or restarted. Repeat the preceding steps for each NQS directory that is listed earlier in this section.
Note: PALs are required for UNICOS 10.0 and UNICOS/mk systems but not for UNICOS 9.0 systems.
You must assign the privilege assignment lists (PALs) by running the privcmd(8) command. For more information on PALs and the multilevel security feature, see your UNICOS or UNICOS/mk system administration documentation. Multilevel security on a UNICOS or UNICOS/mk system provides a default set of PALs.
To install PALs on each UNICOS or UNICOS/mk multilevel security server, execute the following command:
/etc/privcmd -b /nqebase/etc/nqs.db |
This section describes the following multilevel security policies, which are available on UNICOS and UNICOS/mk systems and are supported by NQS.
Note: To determine if these policies are supported on your system, see your UNICOS or UNICOS/mk system administration documentation.
Mandatory access controls (MACs)
Discretionary access controls (DACs)
Identification and authentication (I&A)
System management
Auditing
Mandatory access controls (MACs) are rules that control access based directly on a comparison of the subject's clearance and the object's classification.
The security policy controls read and write operations to prohibit unauthorized disclosure of any system or user information. The security policy is defined as the set of rules and practices by which a system regulates the disclosure of information.
NQS provides enforcement of the MAC rules for status, signal, and delete requests (local and remote). To audit the delete requests, use the SLG_NQS audit record for both successful and failed requests. To audit the MAC override for NQS managers and operators, use the SLG_TRUST (trusted process) audit record. For more information on auditing, see the description of the multilevel security feature in your UNICOS or UNICOS/mk system administration documentation.
The MAC rules for status requests require that the caller's active security label dominate the submission label of a job before information about that job can be displayed (that is, the caller's label must be greater than or equal to the job label).
The MAC rules for signal and delete requests require that the caller's active security label equal the label of the job. If the job is queued, the caller's active label must equal that of the job submission label. If the job is executing, the caller's active label must equal that of the job execution label.
If you want to enforce the MAC rules, set the NQE_NQS_MAC_COMMAND configuration parameter to 1 in the nqeinfo file before NQS is started; otherwise, MAC checking is not performed. The setting of this parameter does not affect the NQS managers and operators, who can automatically bypass the MAC rules and administer NQS without additional restrictions.
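For example, assuming the nqeinfo file uses the same variable=value form as its other entries, enabling MAC rule enforcement before NQS startup would require an entry such as the following:
NQE_NQS_MAC_COMMAND=1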
If MAC rules enforcement is enabled, users from a system that is not running the multilevel security feature can check the status of and delete their jobs only if the jobs have a level 0, no compartments label (UNICOS 9.0 systems only). Depending on the network access list (NAL) configuration on the execution host, these users may not be allowed to submit jobs that request a higher label. For more information on the NAL configuration, see your UNICOS or UNICOS/mk system administration documentation and the spnet(8) man page.
NQS lets you specify a job execution label by using the NQE GUI or the -C and -L options of the qsub and cqsub commands. Descriptions of the execution labels follow, assuming that the user's UDB security attributes allow the specified execution label:
If no -C or -L command-line options are specified, or if the job execution label is not specified using the NQE GUI:
For a local job, the execution label equals the job submission label.
For a remote job, the execution label equals the user's default label if it is within the session bounds (based on the socket minimum and maximum values and the user's UDB minimum and maximum values); if it is not within the session bounds, the execution label equals the session minimum label.
If only the -C command-line option is specified, or if only the job execution compartment set is specified using the NQE GUI:
For a local job, the execution label equals the job submission level and the requested compartment set.
For a remote job, the execution label equals the socket active level and the requested compartment set.
If only the -L command-line option is specified, or if only the job execution level is specified using the NQE GUI:
For a local job, the execution label equals the requested level and the submission compartment set.
For a remote job, the execution label equals the requested level and the socket active compartment set.
If both the -C and -L command-line options are specified, or if the job execution label is specified using the NQE GUI:
For a local job, the execution label equals the requested level and the requested compartment set.
For a remote job, the execution label equals the requested level and the requested compartment set.
A requested label must dominate the job submission label; that is, the requested level must be greater than or equal to the submission level, and the requested compartment set must be a superset of the submission compartment set. For example, if a user has an active label of level 2 and compartments sysadm and secadm, and submits a job by using the qsub -L 4 -C sysadm command, the requested label does not dominate the submission label because the requested compartment set omits secadm. If the user's UDB entry allows authorized compartments of secadm, sysadm, and sysops, both the qsub -L 4 -C sysadm,secadm command and the qsub -L 4 -C sysadm,secadm,sysops command request labels that dominate the submission label.
For a remote job, the socket security information is stored during job acceptance in the tail of the control file, as the new record type s, and also within the rawreq extension structure. If the receiving host is configured for strict B1 compliance, the stored socket minimum level and maximum level are set to the socket active level, and the stored socket valid compartment set is set to the socket active compartment set. This constrains the session security minimum and maximum to the socket active label and, therefore, also constrains the job execution label to the socket active label.
If a remote job is queued when the system is not enabled for multilevel security and the system is then configured to be enabled for multilevel security, the s record values are set based on the system security values, and the submission label is set to the user's default label.
If you have installed PALs on your system, any user with an active secadm, sysadm, sysops, or system category can start nqsdaemon; this procedure is audited. The qst PAL privilege text identifies which categories of users can start nqsdaemon. The shutdown procedure is not changed; the caller must be an NQS administrator, who does not need an active category. For more information on nqsdaemon, see the nqsdaemon(8) man page.
The following changes occur in NQS with the MAC security policy:
The following directories are either labeled as multilevel directories (MLDs) or are wildcard directories as defined in your nqeinfo file:
NQS_SPOOL/private/root/control
NQS_SPOOL/private/root/data
NQS_SPOOL/private/root/failed
NQS_SPOOL/private/root/interproc
NQS_SPOOL/private/root/output
NQS_SPOOL/private/root/chkpnt
NQS_SPOOL/private/requests
NQS_SPOOL/private/reconnect
If UNICOS or UNICOS/mk is configured to support the syslow and syshigh MAC labels, the log file, console file, and daemons file are labeled system high (syshigh); all other NQS database files are labeled system low (syslow).
The client processes access the NQS protocol named pipe through the privilege mechanism (or the pipe is labeled as wildcarded).
Enforcement of the MAC rules for status and delete operations is configurable; NQS administrators bypass this enforcement.
The MAC rules for writing to job output files are enforced; NQS administrators bypass this enforcement. For more information on writing messages to job output files, see the qmsg(1) man page.
Output files are labeled at the job execution label.
Mail is sent at the job label, according to the following conditions:
If the job is queued, mail is sent at the job submission label.
If the job has been initiated or has completed, the mail is sent at the job execution label.
The socket security attributes at job acceptance are used to determine the session security attributes for jobs that are submitted remotely; this constrains the user's UDB security attributes.
Session security attributes are reverified at job restart; if constraints have changed, the job may be deleted instead of restarted.
Discretionary access controls (DACs) are rules that control and limit access to an object, based on an identified individual's need to know and without intervention by a security officer for each individual assignment.
This is accomplished by using standard mode permission bits and an access control list (ACL); the ACL and mode bits allow the owner of a file to control r/w/x access to that file. The owner of the file can create or modify an ACL that contains the identifiers and the r/w/x permissions for those individuals and groups that will be allowed to access the file.
The mandatory policy restrictions established by the security administrator always govern object access.
For more information on how ACLs are used and for examples that show how to create and maintain ACLs, see the spacl(1) man page and your UNICOS or UNICOS/mk system administration documentation.
When DAC is used on a secure system, the following changes occur in NQS:
You must explicitly add root to the queue access lists (this is also true on non-multilevel security systems).
Any current NQS manager or user who is authorized through the PAL (see “System Management”) can add NQS administrators.
The identification and authentication (I&A) feature describes the login and password features used on a UNICOS or UNICOS/mk system that is running multilevel security. When I&A is used on a secure system, the following changes occur in NQS:
Centralized authentication is used to authenticate remote and alternative users (this is also true on UNICOS 9.0 non-multilevel security systems).
When the NQS validation type is file, and the system is configured for strict B1 compliance, the effect on the alternative user is as follows:
The remote user must be the same as the local user.
The remote host must be in the /etc/hosts.equiv file.
The remote host must be in the local user's .rhosts or .nqshosts file.
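For example, with a hypothetical remote host named hostalpha and a local user named jdoe, the corresponding entries would look like the following (a sketch only; see your system documentation for the exact file formats):

/etc/hosts.equiv on the execution host:
hostalpha

.rhosts (or .nqshosts) in jdoe's home directory on the execution host:
hostalpha jdoe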
To provide alternative user capability for NQS when it is configured for file validation on a system that has multilevel security enabled, you must add the NETW_RCMD_COMPAT configuration parameter to the SECURE_NET_OPTIONS macro in the system config.h file before building and installing the kernel.
The workstation access list (WAL) is checked for remote job queuing, remote status, and remote signal/deletion events to see whether the user is allowed access to NQS.
Note: For UNICOS 10.0 and UNICOS/mk systems, sites can run with PRIV_SU and PALs, with PAL-only Trusted UNICOS, or with PAL-only Cray ML-Safe.
When SECURE_MAC is turned on, all NQS user commands are labeled as trusted processes because they may write to the syshigh-labeled log file. The privcmd(8) command is used to assign PALs and MAC and DAC labels.
When using the super-user mechanism (PRIV_SU), only root can change the user validation type.
When using the PAL privilege mechanism, the chgm privilege text (for an active secadm or sysadm category) is used to determine who can add, delete, and set managers, and to change the user validation type. To change the user validation type, the caller must be an NQS manager.
The site can define the security auditing policy for a UNICOS or UNICOS/mk system on which the multilevel security feature is enabled. The site policy should be determined before the system is up and running, and it should be applied consistently at all times. Consistent and proper use of the multilevel security auditing features will help ensure site security.
For information on producing the NQS security log records in the multilevel security feature, see your UNICOS or UNICOS/mk system administration documentation.
The audit records specific to NQS are SLG_NQS and SLG_NQSCF. The SLG_NQS record is produced whenever an NQS delete request is made; the SLG_NQSCF record is produced whenever a security-related change is made to the configuration of NQS. Both record types are logged through the slgentry(2) system call.
To enable auditing, you can use either the /etc/spaudit -e record_type (record_type is either nqs or nqscf) command or the UNICOS Installation/Configuration Menu System (on UNICOS systems) or the system configuration files (on UNICOS/mk systems). For more information on auditing the multilevel security feature on a UNICOS or UNICOS/mk system, see your UNICOS or UNICOS/mk system administration documentation and the spaudit(8) man page. The following security-related events are audited by the NQS audit records, or the SLG_LOGN or SLG_TRUST audit records:
User authentication success/failure (SLG_LOGN)
Failure to set job process label (SLG_LOGN)
Job deletion (SLG_NQS)
Queue access list insertions and deletions (SLG_NQSCF)
User validation type changes (SLG_NQSCF)
Add, delete, and set managers and operators (SLG_NQSCF)
Attempt to execute qmgr subcommands by a nonprivileged user (SLG_NQSCF)
Bypass of the MAC rules enforcement by NQS administrators (SLG_TRUST)
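For example, to enable both NQS-specific record types from the command line, you could enter the following (using the spaudit(8) form shown above):

/etc/spaudit -e nqs
/etc/spaudit -e nqscf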
The configurable multilevel security features for NQS on UNICOS or UNICOS/mk systems, which are described in “Security Policies”, are disabled by default. To enable these features, you must make the following configuration changes.
Note: To determine if these policies are supported on your system, see your UNICOS or UNICOS/mk system administration documentation.
Enforce the MAC rules for status and delete requests by setting the NQE_NQS_MAC_COMMAND variable to 1 in the nqeinfo file.
Use MLDs instead of wildcard-labeled spool directories by setting the NQE_NQS_MAC_DIRECTORY variable to 1 in the nqeinfo file.
If the NQS spool directories already exist, see “NQS Multilevel Directories (MLDs)”, for conversion instructions.
Note: To execute the mlmkdir(8) command, which is used to create MLDs, special authorization is required. To create or convert to MLDs in the NQS spool area, you must be root and must have the following additional authorizations, depending on your system configuration: for a PAL-based configuration, an active secadm category; for PRIV_SU, no additional privilege is needed.
Set the NQE_NQS_PRIV_FIFO variable to 1 in the nqeinfo file. This action enforces the use of privilege through PALs for client processes, such as qsub, when they write over the NQS protocol named pipe and the NQS log pipe.
Before starting NQS, see NQE Installation, publication SG-5236, appendix A, Preparing a Node to Run NQE. For more information, see the privcmd(8) man page.
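Taken together, the nqeinfo settings described in this section are simple assignments; the following sketch assumes that your nqeinfo file uses the name=value form and that all three features are being enabled:

NQE_NQS_MAC_COMMAND=1
NQE_NQS_MAC_DIRECTORY=1
NQE_NQS_PRIV_FIFO=1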
Warning: If your system is configured to enforce the syshigh and syslow security labels, the NQS spool directories and the NQS log file and console file must be on a file system that has syshigh and syslow in its label range. On UNICOS systems, ensure that the corresponding label-range settings are made through the UNICOS installation/configuration menu system.
Note: To use the mlmkdir(8) command, which is used to create MLDs, special authorization is required. To create or convert to MLDs in the NQS spool area, you must be root and must have the following additional authorizations, depending on your system configuration: for a PAL-based configuration, an active secadm category; for PRIV_SU, no additional privilege is needed.
To convert from MLDs to wildcard-labeled directories, use the following procedure:
Ensure that the nqsdaemon is not running while the conversion is occurring. For more information on how to shut down NQS, see the qstop(8) man page. For more information on nqsdaemon, see the nqsdaemon(8) man page.
As root, activate the secadm category, if needed, by entering the setucat secadm command.
Change to the directory that you want to convert; in this example, change to the NQS_SPOOL/private/root directory.
Remove the multilevel symbolic link control so that its name can be used with the cvtmldir(8) command to create the directory that will be labeled as a wildcard directory. Enter the following commands:
/bin/ln -m ./control.mld ./control.temp
/etc/unlink ./control
The control.temp name is now a multilevel symbolic link that points to the control.mld directory.
Convert the control.temp MLD into a wildcard-labeled directory named NQS_SPOOL/private/root/control by entering the following command:
/etc/cvtmldir -w $NQS_SPOOL/private/root/control.temp $NQS_SPOOL/private/root/control
The files in the control.mld directory are linked or copied, when necessary, into the new control directory. The files are not deleted from the control.mld directory.
Set the NQE_NQS_MAC_DIRECTORY parameter to 0 in the nqeinfo file. You must restart NQS for the new value to take effect.
You should ensure that the directories were converted successfully. Do not delete the control.temp directory and its files until NQS has been restarted and the jobs are requeued or restarted successfully. Repeat the preceding steps for each NQS directory that is listed at the beginning of “NQS Multilevel Directories (MLDs)”.
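For example, repeating the conversion for the spool directories under NQS_SPOOL/private/root could be scripted as shown in the following sketch. This is illustrative only; it assumes that $NQS_SPOOL is set, that the secadm category has been activated if required, and that the requests and reconnect directories under NQS_SPOOL/private are converted in the same way using their own paths:

# Sketch: convert each spool directory under private/root back to a wildcard-labeled directory.
cd $NQS_SPOOL/private/root
for dir in control data failed interproc output chkpnt
do
    /bin/ln -m ./$dir.mld ./$dir.temp
    /etc/unlink ./$dir
    /etc/cvtmldir -w $NQS_SPOOL/private/root/$dir.temp $NQS_SPOOL/private/root/$dir
done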
On UNICOS and UNICOS/mk systems, you may make local changes to NQS. For Cray PVP systems, you may make changes to NQS by using user exits and by making source-code modifications. For UNICOS/mk systems, you may make changes to NQS only by using user exits.
For UNICOS and UNICOS/mk systems that support the user exit feature, the NQS user exits (which are located in the /usr/src/lib/libuex directory) let you introduce local code at defined points (compatible from release to release) to customize NQS without having access to the source.
The user exits let sites tailor various functions of NQS, including queue destination selection, qsub(1) option preprocessing, job startup and termination processing, and NQS startup and shutdown processing.
NQS user exits are used for the following functions:
NQS daemon packet. Lets sites add functionality when a packet arrives.
NQS destination ordering. Lets sites control the order of pipe queue destinations within NQS.
NQS fair-share priority calculation for the share_pri value.
NQS job selection. Lets sites determine whether the request chosen by NQS should be started, and lets them customize the order in which requests are started by NQS.
NQS job initiation. Allows a user exit before job initiation.
NQS job termination. Allows a user exit after job termination.
NQS qmgr(8) command. Allows additional functionality when a qmgr command is about to be processed.
qsub directives. Allows user exits before the first #QSUB directive, on each #QSUB directive, and after the last #QSUB directive.
NQS startup. Lets sites perform processing during the NQS startup.
NQS shutdown. Lets sites perform processing during the NQS shutdown.
Job submission. NQS uses the centralized user password identification and authentication (I&A) routines. The user exits that are a part of the new validation routines allow sites to implement their own NQS user identification and validation algorithms. For more information on the I&A user exits in the multilevel security feature, see your UNICOS or UNICOS/mk system administration documentation.
To use the user exits, follow these steps:
Warning: The NQE /opt header files are not available in a chroot environment. If you create an NQS user exit which references header files in /nqebase/nqe/.../ and then build UNICOS, the libuex.a library build described in the steps below will fail. You must keep your UNICOS build and your build for NQS user exits separate.
Copy the user exit template file (or the site's customized file) to /usr/src/lib/libuex/local (for example, copy nqs_uex_jobselect.template to local/nqs_uex_jobselect.c).
Ensure that the path names to the NQS header files (.h) are correct.
Execute the nmakefile file by using the nmake install command in the /usr/src/lib/libuex directory to build the libuex.a library. This library must be rebuilt whenever a user exit is modified, added to, or deleted from libuex/local.
Execute the NQS nmakefile file by using the /etc/build_nqs script in the /nqebase/$NQE_VERSION directory to rebuild NQS.
To disable a user exit, you must remove (or rename) your local user exit file (for example, local/nqs_uex_jobselect.c) and repeat steps 3 and 4 to rebuild libuex.a and NQS with the default user exit stub. For examples on how to code the NQS user exits, see the /usr/src/lib/libuex directory.
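For example, enabling a local copy of the job-selection user exit and rebuilding could look like the following sketch (paths are those given in the steps above; the exact build invocations may vary by release):

cd /usr/src/lib/libuex
cp nqs_uex_jobselect.template local/nqs_uex_jobselect.c
# Edit local/nqs_uex_jobselect.c and verify the NQS header (.h) paths.
nmake install
cd /nqebase/$NQE_VERSION
/etc/build_nqs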
The NQS user exits are described as follows:
User exit | Description | |
nqs_uex_dorder | Lets sites change the destination list of a pipe client just before it tries to send a request to that list of destinations. To configure the pipe queue destinations, use the qmgr create pipe_queue and qmgr add destinations commands. The order of the pipe queue destinations determines the order in which they are contacted. If the configured list of destinations is empty, this user exit is not called. A return code of 0 is the only valid return code and indicates successful completion of this user exit; any other return code is treated as if it were a 0. | |
nqs_uex_jobinit | Lets sites provide additional functionality when a job is initiated. A site can use this exit to set additional environment variables. A return code of 0 is the only valid return code and indicates successful completion of this user exit; any other return code is treated as if it were a 0. | |
nqs_uex_jobselect | Determines whether the request chosen by NQS should be started. You can customize the order in which NQS starts requests. A return code of 0 indicates that normal NQS scheduling should occur. A return code of 1 indicates that the job should not be started. A return code of 2 indicates that the job should be started. | |
nqs_uex_jobterm | Lets sites provide additional functionality when a job is terminated. A return code of 0 is the only valid return code; any other return code is treated as if it were a 0. | |
nqs_uex_packet | Lets sites provide additional functionality when specific NQS packets arrive in the daemon. A user exit is required in the NQS daemon each time a packet arrives. A return code of 0 indicates that NQS should process this packet. A return code of 1 indicates that NQS should ignore this packet. | |
nqs_uex_qmgr | Lets sites provide additional functionality when a qmgr subcommand is entered, providing control of the operator functions and the customization of requests. A return code of 0 indicates that NQS should process this command. A return code of 1 indicates that NQS should ignore this command. | |
nqs_uex_qsub_after | Lets sites provide additional functionality after all qsub options have been processed. This user exit executes under the user's process. A return code of 0 indicates that NQS should continue processing this request. A return code of 1 indicates that NQS should discard this request. | |
nqs_uex_qsub_before | Lets sites provide additional functionality before qsub handles the first QSUB directive. This user exit executes under the user's process. A return code of 0 indicates that NQS should continue processing this request. A return code of 1 indicates that NQS should discard this request. | |
nqs_uex_qsub_each | Lets sites interrogate or change each QSUB directive that is processed by QSUB. This user exit executes under the user's process. A return code of 0 indicates that NQS should continue processing this request. A return code of 1 indicates that NQS should discard this request. | |
nqs_uex_shrpri | Lets sites provide their own calculation for the share_pri value when the fair-share scheduler is active. The share_pri value is used in the priority calculation for job scheduling. A return code of 0 indicates that the share_pri value generated by NQS should be used. A nonzero return code indicates that the share_pri value from this user exit should be used. | |
nqs_uex_shutdown | Lets sites do additional processing when the NQS daemon is terminating. A return code of 0 is the only valid return code; any other return code is treated as if it were a 0. | |
nqs_uex_start | Lets sites do additional processing when the NQS daemon is initializing. A return code of 0 indicates that NQS should continue the startup; a return code of 1 indicates that NQS should abort the startup. |
For Cray PVP systems, you may make changes to NQS by making source-code modifications. The source code is provided when you receive NQS; it is located in /nqebase/src.
If you wish to modify NQS by changing the source code, make a backup copy of the contents of the /nqebase directory, and then modify the source as desired.
Rebuild NQS by using the /etc/build_nqs script in the /nqebase/$NQE_VERSION directory.