This chapter describes how to submit batch requests to NQS. It discusses the following topics:
Submitting batch requests:
Using the NQE GUI to submit requests (“Using the NQE GUI to Submit Requests”)
Using the command line interface to submit requests (“Using the Command Line Interface to Submit Requests”)
Using the NLB default queue for submitting requests (“Using the NLB Default Queue for Submitting Requests”)
Submitting a request to the NQE database:
Specifying a database user name for your request (“Specifying a Database User Name for Your Request”)
Directing your request to the NQE database (“Directing Your Request to the NQE Database”)
Using DCE/DFS when submitting requests to NQE (“Using DCE/DFS When Submitting Requests to NQE”)
Using security labels when submitting requests (“Using Security Labels When Submitting Requests”)
Security label for NQS requests submitted locally (“Security Label for NQS Requests Submitted Locally”)
Security label for NQS requests submitted remotely (“Security Label for NQS Requests Submitted Remotely”)
Using request attributes:
Setting request attributes (“Setting Request Attributes”)
Using request attributes with NQS (“Using Request Attributes with NQS”)
Using request attributes with the NLB (“Using Request Attributes with the NLB”)
Using request attributes with the NQE scheduler (“Using Request Attributes with the NQE Scheduler”)
Successful submissions:
Successful submissions to NQS (“Successful Submissions to NQS”)
Successful submissions to the NQE database (“Successful Submissions to the NQE Database”)
Suppressing informational messages (“Suppressing Informational Messages”)
Unsuccessful submissions:
Unsuccessful submissions to NQS (“Unsuccessful Submissions to NQS”)
Unsuccessful submissions to the NQE database (“Unsuccessful Submissions to the NQE Database”)
NQS system limits (“NQS System Limits”)
Using NQS mail (“Using NQS Mail”)
Accessing data files (“Accessing Data Files”)
Obtaining job accounting (“Obtaining Job Accounting”)
NQS error messages (“Error Messages”)
Recovery and restart (“Recovery and Restart”)
Using the request /tmp directory (“Using the Request /tmp Directory”)
This chapter describes simple request submission; however, over 40 options are available. Generally, you will use some of these options to customize your request so that it executes most effectively.
See Chapter 5, “Customizing Requests”, for a description of commonly used options; “Using Alternative User Names” in Chapter 5, describes how to specify that a request be run under another user's name.
See Chapter 7, “Communicating with Requests”, for a description of how you can communicate with your request while it is executing. See Chapter 6, “Working with Output Files”, for a description of how output files are returned to you and the options you have to control their location and content.
You also can submit DCE work when using NQE; for specific information, see “Using DCE/DFS When Submitting Requests to NQE”.
You can submit a request by using either the NQE graphical user interface (GUI) or the command line interface commands cqsub or qsub. For a summary of the NQE GUI options, see the nqe(1) man page. For a full description of the cqsub and qsub command options, see the cqsub(1) and qsub(1) man pages.
You can submit a request to either NQS or to the NQE database. By default, your request is submitted to NQS. To submit your request to the NQE database, see “Submitting a Request to the NQE Database”.
Note: To use NQE, you must ensure that environment variables are set and that validation and NQE database authorization requirements are in place, as described in Chapter 2, “Preparing to Use NQE”.
Your system administrator defines the default NQS and NQE database servers and whether your requests will be directed, by default, to NQS or to the NQE database. You can use the default servers and destination, or you can override them.
You can submit requests from your client in the following ways:
Use the NQE GUI Submit window (to invoke the NQE GUI, execute the nqe command).
Use the command line interface cqsub options command.
Use the command line interface qsub options command.
How you override a default node or destination depends on how you submit your request. The following sections describe how to submit requests, including how to override these defaults.
Note: If you submit a request file as a batch request, after you successfully submit it, you can modify the original file without affecting the request that was submitted.
When you submit a request to NQS, and the UNICOS multilevel security (MLS) feature or UNICOS/mk security enhancements are enabled on a remote system, you cannot submit the request to a remote host if that remote host has a workstation access list (WAL) entry for the host of origin that restricts your access to NQS services.
If your NQS requests are on a system that has the UNICOS MLS feature or UNICOS/mk security enhancements enabled, you should execute within your security environment, as defined in the user database (UDB) and the network access list (NAL). See “Using Security Labels When Submitting Requests”, for more information.
To submit a request to NQE, access the NQE GUI by entering the nqe command. The initial NQE GUI button bar window will appear (as shown in Figure 1-5). Using the left mouse button, click once on the Submit button.
Note: The mouse button settings described in this guide are the default settings.
The following Submit window will appear:
The Submit window is composed of four sections: the menu bar, the Job to submit: line (the job script file), the job edit area, and the actions button bar.
To submit an existing job script file, do the following:
Either enter the path name of the request file on the Job to submit: line and press the RETURN key, or select Open from the File menu (the Open option uses a standard Motif file selection interface) and select the request file you want to submit. The request file text appears in the job edit area below the Job to submit line.
You can modify the content of your request file in the job edit area of the Submit window. To save changes for future use, select Save or Save As on the File menu.
You can submit a request directly to NQS or to the NQE database. To set the destination for your request, select either Submit to NQE or Submit to NQS on the General Options menu, and apply the change. If you do not select either option, the value of the NQE_DEST_TYPE environment variable is used, which you can set to nqs or nqedb; if the variable is not set, the value of NQE_DEST_TYPE in the /etc/nqeinfo file on your NQS_SERVER is used.
For additional information about customizing your environment by setting specific environment variables, see Chapter 5, “Customizing Requests”.
For information about using the NLB and default queue when submitting a request to NQS, see “Using the NLB Default Queue for Submitting Requests”.
For information about submitting a request to the NQE database, see “Submitting a Request to the NQE Database”.
For information about using request attributes when submitting a request, see “Using Request Attributes”.
For information about output options you may want to set before you submit a request, see Chapter 6, “Working with Output Files”.
If your request will run on a host that uses password validation, use the Actions menu to enter your password.
To submit the request file, click on the Submit button that is located at the bottom of the Submit window.
If your request is submitted successfully, you will receive a message similar to one of the following:
For requests submitted to NQS, you will receive the following message:
Request number.host submitted to queue: queue.
For requests submitted to the NQE database, you will receive the following message:
Task id tnumber inserted into database nqedb.
For more information about messages received after submitting a request, see “Successful Submissions” and “Unsuccessful Submissions”. For information about suppressing this message, see “Suppressing Informational Messages”.
To cancel the Submit window, click on the Cancel button that is located at the bottom of the Submit window.
To use the command line interface to submit a batch request to NQE, use the cqsub or qsub command. For a complete list of the cqsub and qsub command options, see the cqsub(1) and qsub(1) man pages.
Simple forms of the cqsub and qsub commands are as follows:
cqsub [file]
qsub [file]
The file argument is the name of the job script file to be submitted to NQE for execution.
To submit a request directly to NQS, use the cqsub or qsub command. To submit a request to the NQE database and scheduler, you can use only the cqsub command. To set the destination for your request, use the -d dest_type option; dest_type can be nqs or nqedb. If you do not use the -d dest_type option, the value of the NQE_DEST_TYPE environment variable is used, which you can set to nqs or nqedb; if the variable is not set, the value of NQE_DEST_TYPE in the /etc/nqeinfo file on your NQS_SERVER is used.
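The order in which the destination type is chosen can be sketched as a small sh function. This is only an illustration of the precedence described above: the function name resolve_dest_type and its two arguments are hypothetical, and the second argument merely stands in for the NQE_DEST_TYPE value in /etc/nqeinfo (the file itself is not read).

```shell
# Sketch: work out which destination type a submission would use.
# Both arguments are hypothetical helper parameters, not NQE options:
#   $1  value given with -d on the command line, or "" if -d was not used
#   $2  site default, standing in for NQE_DEST_TYPE in /etc/nqeinfo
resolve_dest_type() {
    cli_value=$1
    site_default=$2
    if [ -n "$cli_value" ]; then
        # The -d option overrides everything else.
        printf '%s\n' "$cli_value"
    elif [ -n "${NQE_DEST_TYPE:-}" ]; then
        # Otherwise the environment variable is used.
        printf '%s\n' "$NQE_DEST_TYPE"
    else
        # Finally, fall back to the site-wide setting.
        printf '%s\n' "$site_default"
    fi
}
```

For example, with NQE_DEST_TYPE unset, resolve_dest_type "" nqs prints nqs, which corresponds to the default behavior of submitting to NQS.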
For more information about customizing your environment by setting specific environment variables, see Chapter 5, “Customizing Requests”.
For information about using the NLB and default queue when submitting a request to NQS, see “Using the NLB Default Queue for Submitting Requests”.
For information about submitting a request to the NQE database, see “Submitting a Request to the NQE Database”.
For information about using request attributes when submitting a request, see “Using Request Attributes”.
For information about output options you may want to set before you submit a request, see Chapter 6, “Working with Output Files”.
If your request is submitted successfully, you will receive a message similar to one of the following:
For requests submitted to NQS, you will receive the following message:
Request number.host submitted to queue: queue.
For requests submitted to the NQE database, you will receive the following message:
Task id tnumber inserted into database nqedb.
For more information about messages received after submitting a request, see “Successful Submissions” and “Unsuccessful Submissions”. For information about suppressing this message, see “Suppressing Informational Messages”.
After you issue the cqsub or qsub command, the normal shell prompt appears. You can continue using this session for other purposes or, if you want, you can log off. Your request will execute even if you are not logged on.
The NLB lets NQE balance the workload of requests across multiple NQS servers. This process is called load balancing. To use load balancing, you must submit requests to the destination-selection queue nqenlb, which is the system default destination-selection queue (but may be configured differently for your site). Usually, an NLB pipe queue named nqenlb exists on each NQS server node.
You can let NQS select a queue for you, or you can specify a queue in which you know you want your request to execute. If you do not specify a queue, NQS initially sends your request to a default queue. The default queue is usually a destination-selection queue. The queue nqenlb is the system default destination-selection queue (but may be configured differently for your site).
When you submit a request to a destination-selection queue, NQS queries the NLB to find the most appropriate batch queue to receive the request, based on site-defined policies. The NLB returns a list of the most appropriate queues. The list is ordered beginning with the most appropriate queue choice.
NQS tries to forward the request to the first queue on the list and continues until the request is accepted or until the list is exhausted.
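The forwarding behavior described above can be pictured with the following sh sketch. It is a simulation only: queue_accepts is a hypothetical stand-in for the decision a real queue makes, and the queue names used with it are invented for illustration.

```shell
# Hypothetical stand-in for a queue accepting or rejecting a request.
# Here only the queue named in ACCEPTING accepts; in NQE the real
# decision is made by NQS on the target server.
queue_accepts() {
    [ "$1" = "$ACCEPTING" ]
}

# Try each queue in the order given (most appropriate first) and print
# the first one that accepts. Returns 1 if the list is exhausted.
forward_request() {
    for queue in "$@"; do
        if queue_accepts "$queue"; then
            printf '%s\n' "$queue"
            return 0
        fi
    done
    return 1
}
```

With ACCEPTING=batch_b, the call forward_request batch_a batch_b batch_c prints batch_b, because batch_a is tried first and does not accept the request.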
If your site uses file validation, you must have an .rhosts or .nqshosts file on each system to which a request may be load-balanced. If you receive the No account authorization at transaction peer message, you probably must edit your validation files. For more information on validation, see Chapter 2, “Preparing to Use NQE”.
The following example submits a request to be load-balanced when the default queue is a destination-selection queue:
cqsub jobfile1
The jobfile1 file is submitted to NQS, which queries the NLB and then forwards it to a batch queue.
To determine the name of a destination-selection queue, use the cqstatl -p or qstatl -p command. If nothing is listed in the DESTINATIONS column, the queue is a destination-selection queue.
You might not have a default queue; to specify one, see “Unsuccessful Submissions”.
A request submitted to the NQE database is called a task, and it is assigned a unique task identifier (tid). When you submit a request to the NQE database, the NQE database works with an administrator-defined NQE scheduler to analyze all aspects of your request and determine which NQS server node will receive and process the request.
When the NQE scheduler has chosen a node for your request, the NQE database sends a copy of the request to the batch queue (by default) on the selected NQS server node. The default batch queue is usually named nqebatch. The request is assigned an NQS request identifier (requestid) and is executed.
Because the original request remains in the NQE database, using the NQE database provides a cluster-wide job rerun capability. If the cluster rerun feature is enabled on your NQE cluster, and if a problem occurs during execution and the copy of the request is lost, a new copy can be submitted.
Your system administrator can modify the NQE scheduler to better meet user requirements and system use needs. For example, the scheduler could be modified so that you could request a certain number of CPU units for your request, and the scheduler could interpret those units differently for each machine type among the NQS server nodes in the NQE cluster.
To submit a request to the NQE database, to control a request that was sent to the NQE database, or to get the status of a request that is in the NQE database, you must have a database user name (dbuser) with the proper authorization (user privileges). This database user name may be the same as or different from your UNIX or UNICOS login name on the client host. Your NQE administrator controls who has access to the database and from which client host.
If the database user name is different from your local client's login name, you must supply the database user name on each of your client requests.
Note: To delete, to send a signal to, or to get the status of a request, you must have the same database user name that was used to submit the request to the NQE database.
Specify the database user name in one of the following ways:
If you use the NQE GUI, select the General Options menu in the Submit window and enter the name in the User Name option field.
If you use the command line interface, use the -u dbuser= command option of the cqsub command. An example follows:
cqsub -u dbuser=henry job
Set the NQEDB_USER environment variable. An example follows:
setenv NQEDB_USER henry
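As a rough sketch of the rule that the database user name must be supplied only when it differs from your login name, the following sh function prints the cqsub option to add, or nothing when the names match. The function name dbuser_option is hypothetical, and the sketch assumes that no option is needed when the two names are identical.

```shell
# Print the cqsub option needed for a database user name, if any.
#   $1  login name on the client host
#   $2  NQE database user name
dbuser_option() {
    if [ "$1" != "$2" ]; then
        # The names differ, so the database user name must be supplied.
        printf '%s\n' "-u dbuser=$2"
    fi
}
```

For example, dbuser_option hjones henry prints -u dbuser=henry, which matches the form of the cqsub example above; hjones is an invented login name.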
To specify that a request be run under another user's name, see “Using Alternative User Names” in Chapter 5.
To direct your request to the NQE database, use one of the following three methods:
Set the NQE_DEST_TYPE environment variable. An example follows:
setenv NQE_DEST_TYPE nqedb
If you use the NQE GUI, select the General Options menu in the Submit window. Select Submit to NQE and apply your change.
Note: If you have set the NQE_DEST_TYPE environment variable to nqedb, you do not need to select the Submit to NQE option. The selected NQE GUI option overrides the NQE_DEST_TYPE environment variable and the /etc/nqeinfo file variable setting.
If you use the command line interface, use the -d dest_type command option of the cqsub command, and specify nqedb as the dest_type (destination type). An example follows:
cqsub -d nqedb job
Note: If you have set the NQE_DEST_TYPE environment variable to nqedb, you do not need to use the -d dest_type option on the command line. The command line option overrides the NQE_DEST_TYPE environment variable and the /etc/nqeinfo file variable setting.
When using the Distributed Computing Environment/Distributed File Service (DCE/DFS) to submit requests to NQE, you should note the following:
The NQE DCE/DFS feature is restricted to operating within a single DCE cell.
Ticket forwarding is dependent upon the use of the NQE database when submitting tasks. If ticket forwarding is not supported, you must provide a password with the job request if DCE credentials are desired.
Support for tasks that use forwarded tickets for DCE authentication is provided only on UNICOS, UNICOS/mk, IRIX, and Solaris platforms. Support for tasks that use a password for DCE authentication is available on all NQE 3.3 (or later) platforms. Tickets may be forwarded from any NQE 3.3 (or later) client to any NQE 3.3 (or later) server that supports forwarded tickets as a means of DCE authentication. (Ticket forwarding for either clients or servers is not supported for the Digital UNIX platform in the NQE 3.3 release.)
NQE supports only the Open Software Foundation (OSF) DCE version 1.1.
Kerberos is the authentication component of DCE.
The password you supply must match both your UNIX or UNICOS password and your DCE registry password. You may provide a different DCE registry user name when submitting a request by using the Submit -> General Options -> User Name field or by using the cqsub -u or qsub -u command. The password cannot contain more than 8 characters.
Your home directory can be within DFS space on both UNIX and UNICOS platforms. The request script file can be within DFS space.
NQS supports DFS path name formats so request output files can be returned to DFS. For information about DFS support for output files, see Chapter 6, “Working with Output Files”.
After a request completes, NQS uses kdestroy to destroy any credentials obtained by NQS on behalf of the request owner.
On UNIX platforms, no integrated login feature is available. NQS on UNIX platforms obtains separate DCE credentials for request output return. Therefore, including a kdestroy command within a request script file running on an NQE UNIX server will not affect the return of request output files into DFS space.
Caution: Including the kdestroy command within a request script file on UNICOS systems will destroy the credentials obtained by NQS and prevent NQS from returning request output files into DFS space.
Failure to obtain DCE credentials results in a nonfatal error. The request will be initiated even if the attempt to obtain DCE credentials for the request owner fails. If DCE credentials are successfully obtained, the KRB5CCNAME environment variable is set within the request process that is initiated.
You can use the klist command within a request script file to verify that DCE credentials were obtained.
Note: A restarted job correctly gets the new credentials obtained from NQS, but the KRB5CCNAME environment variable within the restart file is not reset to the new cache file name. After the job is restarted, a klist within the job script will incorrectly state that there are no credentials. As a result, DCE services are affected but not DFS, which continues to work with the new credentials.
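Because KRB5CCNAME is set only when DCE credentials were obtained, a request script can test the variable before relying on DCE services. The following sh fragment is a minimal sketch; the function name dce_cache_advertised is hypothetical, and the check looks only at the variable, not at the credentials themselves, so it does not replace klist and is subject to the restart limitation described in the note above.

```shell
# Report whether a credential cache name was passed to this request.
# This checks only the KRB5CCNAME variable, not the credentials.
dce_cache_advertised() {
    [ -n "${KRB5CCNAME:-}" ]
}

if dce_cache_advertised; then
    echo "DCE credential cache: $KRB5CCNAME"
else
    # Failure to obtain credentials is nonfatal, so the request still runs.
    echo "No DCE credential cache set; continuing without DCE credentials"
fi
```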
If you have NQS requests that are on a system that has the UNICOS multilevel security (MLS) feature or UNICOS/mk security enhancements enabled, you must execute requests within your security environment as defined in the user database (UDB) and the network access list (NAL). This security environment remains the same for the duration of the request; that is, you cannot place the setucmp(1) or setulvl(1) command in the NQS request to change the environment. The request's security label is determined by your active security label or is set as specified by using either the cqsub -L or qsub -L command or the cqsub -C or qsub -C command, or by selecting the NQE GUI General Options option on the Configure menu of the Submit window. To specify the active security level of the request, use the -L option; to specify compartments that must be a superset of your active compartments, use the -C option.
This section describes how security labels are determined when you submit NQS requests locally and remotely.
The security label of a locally submitted NQS request is determined as follows:
If you submit the request by using the cqsub or qsub command or the NQE GUI Submit window (without specifying the -L or -C option), the request is assigned your active security label when cqsub or qsub executes.
When using the cqsub -L or qsub -L command or specifying an active security level in the General Options window, if you specify a security level lower than your active security level, the request is not queued.
When using the cqsub -C or qsub -C command or specifying an active security compartment in the General Options window, the request is assigned any compartments specified by the command. The specified compartments must be part of your authorized set and must dominate your active compartment set.
In the following example, the user tries to submit a request that has a security level of 0; the active security level is 1. The job is not queued.
$ setulvl 1
setulvl: New security label is Level[1:level1] Compartments[none]
$ qsub -L 0 -eo -o nqs.out request.file
Unanticipated transaction failure at local host.
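The level rule for locally submitted requests can be expressed as a simple comparison. The sketch below assumes that security levels are plain integers, as in the example above, and models only the lower bound: it does not check the authorized range defined in the UDB. The function name level_is_acceptable is hypothetical.

```shell
# Succeed if a requested security level would pass the lower-bound rule,
# that is, if it is not lower than the active security level.
#   $1  requested level (the value given with -L)
#   $2  active security level
level_is_acceptable() {
    [ "$1" -ge "$2" ]
}
```

With an active level of 1, level_is_acceptable 0 1 fails, which corresponds to the rejected qsub -L 0 submission shown in the example.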
The security label of a remotely submitted NQS request is determined as follows:
If you submit the request by using the cqsub or qsub command or the NQE GUI Submit window (without specifying the -L or -C option), the request is assigned your default security label as defined in the UDB.
When using the cqsub -L or qsub -L command or specifying an active security level in the General Options window, if you specify a security level lower than your submission security level, the request is not executed.
When using the cqsub -C or qsub -C command or specifying an active security compartment in the General Options window, the request is assigned any compartments specified by the command. The specified compartments must be part of your authorized set and must be a superset of your submission compartment set.
Request attributes are ASCII strings that are to be associated with your request and may be interpreted by the NQE scheduler, the NLB, or NQS.
You may specify zero or more request attributes when submitting a request.
To provide more efficient or site-specific scheduling, your NQE administrator can configure the NQE database, the NLB, and NQS to interpret the attribute names (and values). This section describes using request attributes.
Attributes have the following format:
attribute_name [=value] [,attribute_name [=value]]
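To make the format concrete, the following sh sketch splits an attribute list into its names and optional values. It is an illustration only, not part of NQE: the function name list_attributes is hypothetical, and the sketch assumes that the list contains no spaces around the commas.

```shell
# Print one line per attribute in a list such as "nastran,license=2".
# The function body runs in a subshell so that the IFS and globbing
# changes do not leak into the calling shell.
list_attributes() (
    IFS=,
    set -f
    for attr in $1; do
        case $attr in
            # Attribute with a value: split at the first "=".
            *=*) printf 'name=%s value=%s\n' "${attr%%=*}" "${attr#*=}" ;;
            # Attribute without a value: only the name is present.
            *)   printf 'name=%s\n' "$attr" ;;
        esac
    done
)
```

For example, list_attributes "nastran,license=2" prints name=nastran on the first line and name=license value=2 on the second.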
You can set request attributes in the following ways:
In the NQE GUI by selecting General Options on the Configure menu of the Submit window. Enter the desired attributes (for example, nastran), apply your changes, and cancel the window. To save the changes, select Save Current Job Profile on the Configure menu of the Submit window.
On a cqsub command line by using the -la option. An example follows:
cqsub -la "nastran" jobfile
For a request being submitted to the NQE database, an example follows:
cqsub -d nqedb -la "nastran=2" jobfile
In this example, the nastran attribute has the value 2. The attributes are stored in the NQE database with other request information.
The NQE scheduler can use (or ignore) the attributes. If an attribute does not have a value, only the attribute name is stored in the NQE database.
In a request file by using the # QSUB -la option, as follows:
# QSUB -la gaussian
In an NQSATTR environment variable, as follows:
export NQSATTR="any_scalar_system"
cqsub jobfile
You may specify a list of request attributes when you submit a request to NQS. For example, the following submits the request with script jobfile to NQS with the request attributes nastran and big:
cqsub -la "nastran,big" jobfile
If you use the NQE GUI to submit a request, select General Options on the Configure menu in the Submit window. Enter the attributes in the Attribute(s) field and apply your change.
Note: Your administrator can configure NQS pipe queues and batch queues to reject or accept requests with certain attributes. This gives greater control over the types of requests run at a given time.
You may specify a list of request attributes when you submit a request to an NLB queue in NQS. For example, the following submits the request with script jobfile to the NLB queue, nqenlb, with the request attributes nastran and big:
cqsub -q nqenlb -la "nastran,big" jobfile
If you use the NQE GUI to submit a request, select General Options on the Configure menu in the Submit window. Enter the attributes in the Attribute(s) field and apply your change.
NQS passes the request attributes to the NLB, which can return a list of destinations based on the attributes as well as system load.
Note: If an attribute is not defined in NLB, NQS ignores it. No error message is generated. If the attribute is defined but is not an integer type, NQS generates a log message and the attribute is ignored.
You may specify not only a list of request attributes but also values associated with the attributes when submitting a request to the NQE database. For example, the following command specifies an attribute called nastran and an attribute called license, which has a value of 2:
cqsub -d nqedb -la "nastran,license=2" jobfile
If you use the NQE GUI to submit a request, select General Options on the Configure menu in the Submit window. Enter the attributes in the Attribute(s) field and apply your change.
The license attribute has the value 2. The attributes are stored in the NQE database with other request information.
The NQE scheduler can use (or ignore) the attributes. If an attribute does not have a value, only the attribute name is stored in the NQE database.
Note: Request attributes with explicit values (such as license=2) are completely ignored by both NQS and NLB.
After your request has been submitted successfully, you will receive a message; the message you receive will depend on which destination you chose for your request (NQS or NQE database). The following sections describe the message you receive.
When your batch request has been submitted successfully to NQS, you will receive one of the following messages:
Request number.host submitted to queue: queue.
nqs-181 qsub: INFO Request number.host: Submitted to queue queue by username(userid).
This message tells you the following:
Your request was submitted successfully. If your request cannot enter a queue, you receive an error message.
Your request received request identifier number.host. NQS assigns the unique number in sequential order. The host is the name of the NQS server that initially processes your request. You can use the request ID to locate or even delete your request.
Your request entered the specified queue. If you do not specify a queue name (see “Specifying an NQS Queue” in Chapter 5), your request is submitted to a default queue. From the default queue, your requests usually go to another queue.
If you used the qsub command to submit your request, this message also displays the username, which identifies the name of the NQS user who initially submitted the request, and the userid, which identifies the user ID of the NQS user who initially submitted the request.
In the following example, a submitted batch request is assigned a request ID of 255.coal, and it is sent to an NQS queue called nqenlb:
Request 255.coal submitted to queue: nqenlb.
In this example, the NQS server initially processing the request is called coal.
When your batch request has been submitted successfully to the NQE database, you will receive the following message:
Task id tnumber inserted into database nqedb.
This message tells you the following:
Your request was submitted successfully into the database. If your request cannot be inserted into the database, you receive an error message.
Your request received task identifier tnumber. The NQE database assigns the unique number in sequential order. You can use the task ID to locate or even delete your request.
In the following example, a submitted batch request is assigned a task ID of t4, and it is sent to the default NQE database, which is called nqedb:
Task id t4 inserted into database nqedb.
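If you want to reuse the identifier in later commands, it can be pulled out of the message text. The following sh function is a sketch under stated assumptions: the name submission_id is hypothetical, it handles only the "Request ... submitted to queue:" and "Task id ... inserted into database" forms shown in these sections (not the nqs-181 form printed by qsub), and it takes the message text as its argument rather than running any NQE command.

```shell
# Extract the identifier from a submission message.
# Prints the request ID (number.host) or the task ID (tnumber);
# returns 1 if the text matches neither message form.
submission_id() {
    case $1 in
        "Request "*" submitted to queue:"*)
            id=${1#"Request "}
            # Keep everything up to the first space: number.host
            printf '%s\n' "${id%% *}"
            ;;
        "Task id "*" inserted into database "*)
            id=${1#"Task id "}
            # Keep everything up to the first space: tnumber
            printf '%s\n' "${id%% *}"
            ;;
        *)
            return 1
            ;;
    esac
}
```

For the two example messages in these sections, the function prints 255.coal and t4 respectively.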
To suppress the informational message when you submit a request, do one of the following:
Use the NQE GUI Submit window and select General Options on the Configure menu. Ensure the Submit Silently option is selected, and apply your change (click on the Apply button) if you need to select it. To save your changes, select Save Current Job Profile on the Configure menu.
Use the cqsub -z or the qsub -z command.
After you issue the cqsub or qsub command, the usual shell prompt for your session appears. You can continue using this session for other purposes. Or, if you want, you can log off (the request does not require you to be logged on to execute).
Note: NQS copies the file when you submit it and uses this copy as the batch request. After you have successfully submitted a file as a batch request, you can modify the original file without affecting the submitted request.
If your request is not submitted successfully, you receive an error message that describes the problem. The message you receive will depend on which destination you chose for your request (NQS or NQE database). The following sections describe the message you receive.
For more information about solving problems with request submissions, see Chapter 16, “Solving Problems”.
Common reasons for unsuccessful submission to NQS are as follows:
NQS is not executing. You may receive a message such as the following:
Retrying connection to NQS_SERVER host ice (111.111.11.11) ...
Retrying connection to NQS_SERVER on host ice (111.111.11.11) ...
QUESRV:ERROR: Failed to connnect to NQS_SERVER at ice
NETWORK:ERROR: NQS network daemon not responding.
For further advice, contact your local system support staff.
No default NQS queue is defined. You may receive one of the following messages:
CQSUB:ERROR: Failed to submit request to default queue at server ice.
QUESRV: ERROR: No default queue is defined at transaction peer.
nqs-1019 qsub: CAUTION No request queue specified, and no local default has been defined.
nqs-1045 qsub: WARNING Request not queued.
To define a default queue, do one of the following actions:
Ask your NQE administrator to define a default queue.
Define your own default queue by setting the QSUB_QUEUE environment variable, depending on the shell you are using, as follows:
Users of csh or tcsh can use the following command:
setenv QSUB_QUEUE queue-name
To obtain a list of queue names, use the cqstatl command.
To remove an environment variable setting, use the unset command if you are using the sh or ksh shell, or the unsetenv command if you are using the csh or tcsh shell.
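For users of sh or ksh, the equivalent of the setenv command shown above would look roughly like the following. The queue name nqenlb is used only as an example value; substitute a queue name that exists at your site.

```shell
# sh or ksh: set a personal default queue for later submissions.
# "nqenlb" is only an example value, not a required name.
QSUB_QUEUE=nqenlb
export QSUB_QUEUE
echo "default queue: $QSUB_QUEUE"

# Remove the setting again.
unset QSUB_QUEUE
echo "default queue: ${QSUB_QUEUE:-none}"
```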
Alternatively, you can explicitly specify a queue by using the NQE GUI or the command line interface (cqsub -q or qsub -q command). You can use the -q option only to specify a queue that your NQE administrator has defined as being able to accept requests directly.
For more information, see “Specifying an NQS Queue” in Chapter 5.
Access denied at local host. Your NQE administrator can define queues so that they cannot accept requests directly. If this has been done for the queue you specify, your request will be rejected.
If you are using a client command to talk to the NQE database, and the database server is not up or does not exist, you will see a message such as the following:
Connect: Connection refused NETWORK: ERROR: NQE Database connection failure: Can't connect to MSQL server on Latte |
A client trying to connect to the database without the proper validation will result in an error such as the following:
    latte$ cqstatl
    NETWORK: ERROR: NQE Database connection failure: Connection disallowed.
    latte$
To determine why you cannot connect, check with your database administrator.
Your NQE system administrator can set limits on the NQS system that constrain the maximum number of requests that can be processed concurrently. NQS system limits are important to you because they may affect when (or if) your request will run. For example, if you submit five requests in succession, the later requests may not execute immediately if your submissions exceed a user run limit.
NQS limits are set on individual queues, queue complexes, and the entire NQS system on a server. A queue complex is a set of local batch queues grouped by the NQE system administrator to simplify NQE administration. Each queue complex can have the same set of limits that a single batch queue can have. Global limits restrict the whole NQS system on a host.
Your NQE administrator can set the limits shown in Table 4-1 on individual queues, queue complexes, and on the NQS system as a whole.
Limit | Status subcodes | Description |
---|---|---|
Run | qr cr gr | The number of requests that can execute concurrently (regardless of their owner or group) in a queue (qr), in a complex (cr), or on this NQS server (gr). When a queue reaches its run limit (requests will have the substatus qr), requests are still routed to the queue, but they remain queued until the number of executing requests falls below the limit. |
User | qu cu gu | The maximum number of requests a particular user can execute at the same time in a queue (qu), in a complex (cu), or on this NQS server (gu). |
Group | qg cg gg | The maximum number of requests that users of a particular group can execute at the same time in a queue (qg), in a complex (cg), or on this NQS server (gg). |
Memory | qm cm gm | The maximum total amount of memory available to all requests executing concurrently in a queue (qm), in a complex (cm), or on this NQS server (gm). |
MPP processing elements | qe ce ge | The maximum number of MPP processing elements (PEs) that can be requested by all requests executing concurrently in a queue (qe), in a complex (ce), or on this NQS server (ge). |
Quickfile limit | qq cq gq | The maximum number of secondary data segments that can be requested by all requests executing concurrently in a queue (qq), in a complex (cq), or on this NQS server (gq). |
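The status subcodes in the table follow a regular pattern: the first letter gives the scope (q, c, or g) and the second gives the limit type. The following sketch decodes a subcode on that basis; decode_substatus is a hypothetical helper written for illustration and is not part of NQS.

```shell
# Decode a two-letter NQS limit subcode using the pattern in the table above.
# decode_substatus is a hypothetical helper, not an NQS command.
decode_substatus() {
    code=$1
    # First letter: scope of the limit.
    case $code in
        q?) scope="queue" ;;
        c?) scope="complex" ;;
        g?) scope="global (NQS server)" ;;
        *)  echo "unknown subcode: $code"; return 1 ;;
    esac
    # Second letter: type of limit.
    case $code in
        ?r) limit="run" ;;
        ?u) limit="user" ;;
        ?g) limit="group" ;;
        ?m) limit="memory" ;;
        ?e) limit="MPP processing element" ;;
        ?q) limit="quickfile" ;;
        *)  echo "unknown subcode: $code"; return 1 ;;
    esac
    echo "$scope $limit limit"
}

decode_substatus qr
decode_substatus gg
```

A request held with substatus cu, for example, would decode as a complex user limit.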
To display the current limits, use the commands in Table 4-2:
Table 4-2. Commands to Display Limits
Type | Command | Description of display |
---|---|---|
Queue | cqstatl -l or qstat -l | Summary of batch queue limits; information can be obtained only if the command is issued to NQS. For an example of this display, see “Displaying Batch Queue Limits” in Chapter 11. |
Queue | cqstatl -f or qstat -f | Details of all queues; information can be obtained only if the command is issued to NQS. Look at the areas RUN LIMITS and RESOURCE USAGE. For an example of this display, see “Displaying Queue Details” in Chapter 11. |
Complex | qstat -L | Summary of queue complex limits. To see a list of queue complexes on the server, use qstat -c. Both commands work only when you are logged on interactively to your NQS server. |
System | qmgr show global_parameters | Global limits for your NQS server. This command works only when you are logged on interactively to your NQS server. |
If you have long jobs that take several hours or more to execute, you can have the system notify you when the request starts and/or completes execution. By default, the mail messages are sent to the user name under which you were logged in when you submitted the request.
Note: Mail is sent only by NQS. For requests submitted to the NQE database, any mail requested is sent by the NQS running the copy of the request. If the cluster rerun feature is enabled on your NQE cluster, it is possible that you will receive multiple mail messages if the request is scheduled more than once because the NQE scheduler performs network cluster rerun.
To specify that you want to receive mail when your request is routed from a pipe queue, when it begins execution, and when it finishes execution, do one of the following:
Use the NQE GUI Submit window and select Mail Options on the Configure menu. Ensure the Mail when Job is Routed, Mail at Start of Job, and Mail at End of Job options are selected, and apply your changes (click on the Apply button) if you need to select any of the options. To save your changes, use the Save Current Job Profile option.
Use the cqsub -mt -mb -me command.
Use the qsub -mt -mb -me command.
If you use these options, mail from NQS would appear as follows in your mail queue (this example is from elm(1)):

    N 46 Oct 12 ice_root (27) NQS request: 59.ice delivered.
    N 47 Oct 12 ice_root (27) NQS request: 59.ice beginning.
    N 48 Oct 12 ice_root (36) NQS request: 59.ice ended.
The mail that informs you that your request has been routed from a pipe queue would be similar to the following:
    Subject: NQS request: 59.ice delivered.
    Message concerning NQS request: 59.ice delivered.
    Request name: job2
    Request owner: jane
    Mail sent at: 09:01:55 CDT
    Request sent to local queue destination.
    Job log follows:
    10/12 09:01:53 Arrived in <[email protected]> from <ice>.
    10/12 09:01:54 Arrived in <[email protected]> from <[email protected]>.
    10/12 09:01:55 Sending <delivered> mail to <jane>.
The mail that informs you that your request has begun execution in a batch queue would be similar to the following:
    Subject: NQS request: 59.ice beginning.
    Message concerning NQS request: 59.ice beginning.
    Request name: job2
    Request owner: jane
    Mail sent at: 09:01:55 CDT
    Job log follows:
    10/12 09:01:53 Arrived in <[email protected]> from <ice>.
    10/12 09:01:54 Arrived in <[email protected]> from <[email protected]>.
    10/12 09:01:55 Sending <delivered> mail to <jane>.
    10/12 09:01:55 Sending <beginning> mail to <jane>.
The job log at the end of the message records that both mail messages were sent.
The mail that informs you that your request has completed execution in a batch queue would be similar to the following:
    Subject: NQS request: 59.ice ended.
    Message concerning NQS request: 59.ice ended.
    Request name: job2
    Request owner: jane
    Mail sent at: 09:02:01 CDT
    Request exited normally.
    _Exit() value was: 0.
    Job log follows:
    10/12 09:01:53 Arrived in <[email protected]> from <ice>.
    10/12 09:01:54 Arrived in <[email protected]> from <[email protected]>.
    10/12 09:01:55 Sending <delivered> mail to <jane>.
    10/12 09:01:55 Sending <beginning> mail to <jane>.
    10/12 09:01:56 Started, pid=<1688>, jid=<1688>, shell=<>, umask=<18>.
    10/12 09:01:56 Running in queue <nqebatch>.
    10/12 09:01:59 Finished.
    10/12 09:01:59 Returning stderr output file.
    10/12 09:02:00 Returning stdout output file.
    10/12 09:02:01 Sending <ended> mail to <jane>.
If the UNICOS MLS feature or UNICOS/mk security enhancements are enabled on your system, you must have your current active label set appropriately to read mail from NQS. If your request has not yet been initiated, mail is sent to you at the job submission label. If your request has been initiated or has completed execution, mail is sent to you at the job execution label.
To request mail when a request is rerun, do one of the following:
Use the NQE GUI Submit window and select Mail Options on the Configure menu. Ensure the Mail when Job is Rerun option is selected, and apply your change (click on the Apply button) if you need to select it. To save your changes, use the Save Current Job Profile option.
Use the cqsub -mr or qsub -mr command.
To request that mail be sent to an alternative user name, do one of the following:
Use the NQE GUI Submit window and select Mail Options on the Configure menu. Ensure the Mail User Name option is selected, and apply your change (click on the Apply button) if you need to select it. To save your changes, use the Save Current Job Profile option.
Use the cqsub -mu username or qsub -mu username command. By default, the mail messages are sent to the user name under which you were logged in when you submitted the request.
For a full list of mail message options available through the command line interface, see the cqsub(1) or qsub(1) man page.
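The mail options described in this section can be combined on one command line. The following sketch assembles such a command and prints it for review instead of submitting it; the user name jane and the script name job2.sh are placeholders.

```shell
# Assemble the mail options described above into one qsub command line.
# "jane" and "job2.sh" are placeholders, not values from a real site.
mail_user=jane
script=job2.sh

# -mt: mail when routed, -mb: at start, -me: at end, -mr: on rerun,
# -mu: send the mail to an alternative user name.
cmd="qsub -mt -mb -me -mr -mu $mail_user $script"

# Print the command for review; run it yourself on a host with NQS installed.
echo "$cmd"
```

The same options apply to cqsub, so substituting cqsub for qsub in the sketch should give the client form of the command.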
If your request does not complete successfully, you receive either an error message in the standard error file or a mail message that describes the problem.
The following example is the mail you will receive from NQS if your validation is not set up correctly:
    Message concerning NQS request: 32.latte deleted.
    Request name: STDIN
    Request owner: jane
    Mail sent at: 17:31:45 CDT
    Request could not be routed.
    The job you submitted could not be routed to any destination.
    Destinations may have been explicitly requested by you, chosen
    for you by static pipe queue destinations, or chosen dynamically
    by the NQE network load balancer.
    The most typical cause for this kind of problem is account
    validation. Check that you have a valid account on the target
    NQS system(s), and that you have an entry there in your .rhosts
    and/or .nqshosts files. See the nqe man page and User Guide for
    additional information.
    System administrators - Check that NQS MID's (mainframe
    identifiers) are correctly set up, so that client and server NQS
    systems know about each other. Also, try the nlbpolicy command
    to look at all of the possible target hosts for an nqs job, given
    current system load and NQE policies. And in the nqs logfile,
    message nqs-527 also shows what specific destinations were tried.
    Request deleted.
    Last destination queue attempted: <[email protected]>
    Job log follows:
    .
    .
    .
This message indicates that the problem is probably a validation file error. The job log at the end of the message, however, may indicate a different error. You should try to interpret it before calling your local support personnel.
Your batch request may have to access a data file on the system. For example, you may want to compile a source program file or process a file of data. When a request begins executing, the current directory is the home directory of the user under whom the request is being executed (just as if that user had logged in interactively).
For example, consider the following request, which is a standard shell script that compiles and then executes a Fortran program called loop.f:
    set -x              # Echo commands
    ja                  # Enable job accounting
    date
    cd loopdir          # Move to directory where
                        # loop.f is stored
    f90 -Zu loop.f      # Compile program loop.f
    date
    ./a.out             # Execute program a.out
    date
    rm loop.o a.out     # Remove files
    echo job complete
    ja -csft            # Report job accounting
                        # information and disable job
                        # accounting
The following two lines are taken from the preceding example. The first line changes the request's current directory to the directory in which loop.f is stored; the second line compiles loop.f.
    cd loopdir
    f90 -Zu loop.f
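Because a request starts in the home directory, a cd that fails would leave the compile step running in the wrong place. One way to guard against this is sketched below. The helper enter_dir is hypothetical and written for illustration, and loopdir is the directory name from the preceding example.

```shell
# enter_dir is a hypothetical helper: change directory or report failure.
enter_dir() {
    if cd "$1" 2>/dev/null; then
        echo "now in $(pwd)"
    else
        echo "cannot cd to $1" >&2
        return 1
    fi
}

# A request begins in the home directory, so a relative name such as
# loopdir resolves to $HOME/loopdir. Skip the compile if it is missing.
enter_dir "$HOME/loopdir" || echo "skipping compile step"
```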
Another method of accessing a data file is to embed it in the shell script as a here document. A here document has the following form:
    command << 'string'
    lines of input
    string
The lines of input could be commands or data supplied as standard input to command. The string line defines the end of the lines of input.
The following example shows the compilation of a Fortran program that is created as a here document, and then the subsequent execution of the program using data that also comes from a here document:
    cat > test.f << 'DELIMIT'
          PROGRAM ADD
    C     Read data file to get number of data points to read
    C     into an array; add them up and write out the sum
          DIMENSION A(100)
          REAL A, SUM
          INTEGER N
          OPEN (5, FILE= 'DATA')
          READ (5,1) N
     1    FORMAT(I3)
          DO 10, I=1,N
             READ (5,2) A(I)
     2       FORMAT (F 10.5)
     10   CONTINUE
          SUM=0.0
          DO 20, I=1,N
             SUM=SUM + A(I)
     20   CONTINUE
          WRITE(6,3) N, SUM
     3    FORMAT (' sum of the ',I3,' numbers is ', G15.5)
          END
    DELIMIT
    cat > DATA << '*EOF*'
    5
    10.0
    20.0
    30.0
    40.0
    50.0
    *EOF*
    f90 test.f
    ./a.out
In the next example, the prog program is executed twice, using two different sets of data:
    prog << 'EOT'
    2.2 8.9
    EOT
    prog << 'EOT'
    8.6 123.4
    EOT
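The here documents in these examples quote the delimiter string. In standard shells, quoting the delimiter passes the input lines through literally, whereas an unquoted delimiter lets the shell expand variables in the input first. The following sketch shows the difference; the variable value and its contents are placeholders.

```shell
value=8.9

# Quoted delimiter: the input is passed through exactly as written.
quoted=$(cat << 'EOT'
2.2 $value
EOT
)

# Unquoted delimiter: the shell expands $value before cat reads the input.
unquoted=$(cat << EOT
2.2 $value
EOT
)

echo "quoted:   $quoted"
echo "unquoted: $unquoted"
```

Quoting the delimiter is the safer default for embedded source code or data, because characters such as $ and backquotes in the input are left untouched.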
If you are running UNICOS or UNICOS/mk, and your system or user profile does not automatically enable job accounting, you can include ja(1) commands within your batch request. The ja command provides information about the resources used when your request executes. The ja command can take several options and arguments, such as the name of a file to contain the report (the default is $TMPDIR/.jacctxxxx; xxxx is the UNICOS or UNICOS/mk job identifier). For a full description of the ja command, see the ja(1) man page.
In the following example script, the first ja command simply turns on job accounting, writing the information to the default file. The second ja command (with the -s option) writes a report to the standard output file when the script file is executed. When the request completes execution, the contents of the temporary directory $TMPDIR (including the .jacctxxxx file) are deleted automatically.
    ja
    sleep 45
    date
    pwd
    ls
    ja -s
To view the job accounting report, you can examine the standard output file produced by the execution of the request. For more information about a request's output files, see Chapter 6, “Working with Output Files”. The following output file shows the report produced by running the preceding script. (If the request ran under the C shell, a message written to the start of the output file indicates that the normal C shell job control facilities were not available for this job because it was run in batch mode, rather than from a terminal (tty).)
    Thu Feb 19 16:09:50 CST 1998
    /u/snow
    testjob testjob.e11562 testjob.o11562 tutorial

    Job Accounting - Summary Report
    ===============================

    Job Accounting File Name      : /tmp/jtmp.004076a/.jacct1093
    Operating System              : sn1234 xyz 9.2.1bm abc.1 CRAY C90
    User Name (ID)                : snow (10334)
    Group Name (ID)               : crazy (100)
    Account Name (ID)             : snow (10334)
    Job ID                        : 1093
    Report Starts                 : 02/19/98 16:09:04
    Report Ends                   : 02/19/98 16:09:50
    Elapsed Time                  : 46 Seconds
    User CPU Time                 : 0.2327 Seconds
    System CPU Time               : 0.0791 Seconds
    I/O Wait Time (Locked)        : 0.0391 Seconds
    I/O Wait Time (Unlocked)      : 0.1485 Seconds
    CPU Time Memory Integral      : 0.0223 Mword-seconds
    SDS Time Memory Integral      : 0.0000 Mword-seconds
    I/O Wait Time Memory Integral : 0.0027 Mword-seconds
    Data Transferred              : 0.1786 MWords
    Maximum memory used           : 0.2051 MWords
    Logical I/O Requests          : 87
    Physical I/O Requests         : 108
    Number of Commands            : 6
    Billing Units                 : 0.4781
    logout
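Because the summary report is plain text in the standard output file, individual fields can be extracted with ordinary text tools. The following sketch totals the user and system CPU times with awk. It assumes the "label : value" layout shown in the sample report, and the three input lines are copied from that sample rather than read from a real output file.

```shell
# Sample lines copied from the report above; a real report has more fields.
report='User CPU Time : 0.2327 Seconds
System CPU Time : 0.0791 Seconds
CPU Time Memory Integral : 0.0223 Mword-seconds'

# Split on ":" and add the numeric part of the User and System CPU lines.
# The memory integral line is skipped because it does not start with
# "User" or "System".
total_cpu=$(printf '%s\n' "$report" |
    awk -F: '/^ *(User|System) CPU Time/ { total += $2 } END { printf "%.4f", total }')

echo "total CPU seconds: $total_cpu"
```

To apply the same filter to a real request, the awk command could read the request's standard output file instead of the sample text.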
If you are running UNICOS or UNICOS/mk, you may receive an NQS error message while you are using NQS. The NQS user error messages have a group code of nqs. To display the explanation of a message, use the explain(1) command. For more information, see the explain(1) man page. The following example shows NQS error messages and the use of the explain(1) command:
    % qmsg -j -f msg_in 10
    nqs-272 qmsg: WARNING Error opening file <abc/u0/snow/msg_in>, errno = <13>.
    nqs-20 qmsg: WARNING Errno <13> = <Permission denied>.
    % explain nqs272
    Error opening file <file>, errno = <errno>.
    NQS tried to use the open(2) system function to open file <file>, but failed. The reason for the failure is identified by <errno>.
    % explain nqs20
    Errno <errno> = <error_desc>.
    The specified error <errno> has an associated description of <error_desc>.
Note: This functionality is available only on UNICOS, UNICOS/mk, and IRIX systems.
If the operating system is shut down or crashes before the request completes execution, you do not necessarily have to resubmit a batch request because NQS has job recovery capabilities. NQS uses the operating system checkpoint facilities, as follows:
On UNICOS and UNICOS/mk systems, the chkpnt(2) and restart(2) facilities are used.
On IRIX systems, the cpr(2) facility is used.
When the operating system or NQS is shut down, checkpoint images of all executing requests can be written automatically to a restart file on disk. When the system becomes available, NQS uses the checkpoint image to try to restart each of the requests from the point they had reached in their execution.
When the operating system or NQS crashes, checkpoint images cannot be written for the executing requests. However, you can include the qchkpnt(1) command within a request to cause NQS to write a checkpoint image of the request at particular points in its execution. When the system becomes available after a crash, NQS tries to restart the request from the latest checkpoint image.
If a request has not yet begun execution at the time of the shutdown or crash, or if no checkpoint image is available, the request remains in its NQS queue and is executed from the start after the system becomes available.
The following two sections describe how you can ensure that your requests take advantage of these recovery facilities, or how to ensure that your request does not use the facilities.
A checkpoint image of a request is a copy of the request as it was executing at a particular point in time. A request can be restarted from its checkpoint image, rather than restarting the request from the beginning. If the MLS feature on UNICOS is enabled on your system, however, a checkpointed job cannot be restarted if the job execution label is no longer in the valid range for the job owner and the host of origin.
Checkpoint images can be produced in the following ways:
NQS automatically checkpoints NQS requests when an operator brings down NQS or the operating system in an orderly manner.
To protect against unscheduled interrupts, you can use the qchkpnt(1) command within a request to write an image of the request to disk when it is executing. This action is particularly appropriate if your request will execute for a long time and you are concerned about request recovery across unscheduled interrupts (such as a power outage or a system crash).
In either case, when NQS resumes normal operation, it automatically uses the latest checkpoint image to restart the request if the criteria listed in “Criteria for Batch Request Recovery” are met.
To prevent a checkpoint image from being written in either of the preceding situations, use the -nc option of the cqsub or qsub command, or select the NQE GUI Submit -> Configure -> General Options -> Do not checkpoint option.
To restart the request from a checkpoint image, even if the files have been modified since the checkpoint image was created, use the -Rf option of the cqsub or qsub command.
To force a checkpoint image of a request to be made, include the qchkpnt command within the request at suitable places. When the request is executing, a checkpoint image of the request is written whenever the qchkpnt command is executed.
The command has no options; the syntax is as follows:
    qchkpnt
You can use the qchkpnt command only from within a batch request to checkpoint that request.
The following example shows how you can use the qchkpnt command in a request:
    ja                  # Enable job accounting
    date
    cd loopdir          # Move to directory where
                        # loop.f is stored
    f90 -Zu loop.f      # Compile program loop.f
    qchkpnt
    date
    ./a.out             # Execute program a.out
    date
    rm loop.o a.out     # Remove files
    echo job complete
    ja -csft            # Report job accounting
                        # information and disable
                        # job accounting
If an unscheduled interrupt in the NQS system occurs after the compilation of the program, the request is restarted automatically by using the checkpoint image at the line following the qchkpnt command. If the unscheduled interrupt occurs during or before the compilation, the request is restarted from the beginning.
If the system administrator shuts down the NQS system in an orderly way when the request is executing, a checkpoint image is written automatically at that time, and it is this image that is used for restarting the request, rather than the one written by qchkpnt.
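A longer request may benefit from a checkpoint after each expensive stage, so that a restart resumes after the most recently completed stage. The following sketch shows one way to arrange this. The checkpoint wrapper is hypothetical, and it calls qchkpnt only where that command is installed so that the script can also be tried outside NQS. The stage names are placeholders.

```shell
# checkpoint is a hypothetical wrapper around qchkpnt. Outside an NQS
# batch request the command is normally absent, so the call is skipped.
checkpoint() {
    if command -v qchkpnt >/dev/null 2>&1; then
        qchkpnt
    else
        echo "qchkpnt not available; no checkpoint written after $1"
    fi
}

stages_done=0
for stage in compile run; do
    # Real work for the stage would go here.
    echo "stage $stage finished"
    checkpoint "$stage"
    stages_done=$((stages_done + 1))
done
echo "$stages_done stages completed"
```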
Before NQS can create and use a checkpoint image to restart a request automatically, the request must meet the checkpoint criteria established by the system checkpoint facility.
For more information on the UNICOS and UNICOS/mk checkpoint facilities, see the chkpnt(2) and restart(2) man pages and General UNICOS System Administration, publication SG-2301.
For more information on the IRIX checkpoint facility, see the cpr(1) man page and the Checkpoint and Restart Operation Guide, Silicon Graphics publication 007-3236-xxx.
If these criteria are not met, in some cases, NQS reruns the request from the start; a mail message is sent to you about the problem. In other cases (for example, a process identifier or job identifier of the request has been reused), NQS puts the request in a WAIT state and tries the restart at short intervals for a period defined by the NQS administrator; in such cases, no mail messages are sent. If the restart still fails, the request is rerun from the start.
If a request is terminated through receipt of a SIGRPE, a SIGUME, or a SIGPEFAILURE signal, NQS will requeue the request instead of deleting it, if one of the following attributes applies to your request:
The request is rerunnable.
The request is restartable and has a restart file.
By default, each NQS request is both rerunnable and restartable. These defaults can be changed by using the cqsub -nc or qsub -nc command option, the cqsub -nr or qsub -nr command option, or the qalter -r n and qalter -c n command options. The owner of the request can specify the cqsub(1) or qsub(1) command options and use the qalter(1) command to modify the request rerun and/or restart attributes. An NQS administrator can also use the qalter(1) command to modify any request's rerun and/or restart attributes.
If NQS requeues your request because it was terminated by a SIGRPE, SIGUME, or SIGPEFAILURE signal, one of the following messages is written into the system log, the NQS log file, and the job log file:
    Request <1.subzero>: Request received SIGRPE or SIGUME signal; request requeued.

    Request <1.subzero>: Request received SIGPEFAILURE signal; request requeued.
A requeued request is reinitiated after it is selected by the NQS scheduler. An NQS administrator can use the qmgr schedule request now command to force the request to be reinitiated immediately.
To force NQS to restart a request from the beginning rather than from a checkpoint image, submit the request by using the cqsub -nc or qsub -nc command option, or select the NQE GUI Submit -> Configure -> General Options -> Do not checkpoint option. This option prevents NQS from taking a checkpoint image if the administrator shuts down NQS; it also prevents any qchkpnt commands within the request from writing any checkpoint images.
You may want to use the -nc option, for example, if you know that your request has some inherent limitation that prevents job recovery. For a list of possible limitations, see “Criteria for Batch Request Recovery”. Alternatively, you may want to use the -nc option to ensure that your request starts and runs to completion with no interrupts.
If you want to prevent NQS from rerunning from the beginning a request that had already started execution when NQS was interrupted, you must include the -nr option of the cqsub or qsub command, or select the NQE GUI Submit -> Configure -> General Options -> Do not rerun option when the request is submitted. If this option is used and if either the request could not be checkpointed (for example, at shutdown time) or could not be restarted (for example, when the system is rebooted), the request is terminated, thereby preserving any partial output generated by the request.
You also can use the -nr option to preserve output from a job when an unscheduled interrupt occurs, rather than have the job restart from the beginning.
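The -nc and -nr options are independent, so a request can disable checkpointing, rerunning, or both. The following sketch builds the matching qsub command line from two yes/no choices and prints it. The function recovery_opts and the script name job.sh are hypothetical, introduced only for illustration.

```shell
# recovery_opts CHECKPOINT RERUN
# Prints a qsub command line; each argument is "yes" or "no".
# Hypothetical helper; job.sh is a placeholder script name.
recovery_opts() {
    opts=""
    if [ "$1" = no ]; then opts="$opts -nc"; fi   # do not checkpoint
    if [ "$2" = no ]; then opts="$opts -nr"; fi   # do not rerun
    echo "qsub$opts job.sh"
}

recovery_opts yes yes   # default: restartable and rerunnable
recovery_opts no yes    # always rerun from the start
recovery_opts yes no    # keep partial output instead of rerunning
```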
If a request was not executing when the NQS system was interrupted (either in a scheduled way by the system administrator or because of a system crash), the request is retained in its queue and is automatically available for execution when NQS becomes available.
A temporary directory is available to batch requests for the duration of their execution. Each batch request has its own unique temporary directory. When the batch request completes execution, or aborts with an error, NQS deletes the directory, along with any files contained in the directory.
Typically, temporary directories are created within the /tmp directory, although this depends on your site configuration. The name takes the form nqs.xxxxx (xxxxx is any unique string). The name of the directory is available to a batch request in the TMPDIR environment variable.
You can use the temporary directory to hold intermediate files generated by your request that are not required when the request ends.
In the following example script file, the source program and executable file created by the cc compilation line are written to $TMPDIR. After the request completes execution, these files are deleted automatically, along with the temporary directory.
    cd $TMPDIR
    pwd
    cat <<EOF > test.c
    main ()
    {
    .
    program
    .
    }
    EOF
    cc -o test test.c
    ls -la
    ./test
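When you test a request script interactively before submitting it, TMPDIR may not be set. The following sketch falls back to /tmp in that case and removes its intermediate file explicitly. The fallback and the file name are assumptions made for illustration. Under NQS the explicit removal is unnecessary, because the temporary directory is deleted when the request ends.

```shell
# Use the NQS temporary directory if set; otherwise fall back to /tmp
# (the fallback is an assumption for interactive testing only).
work=${TMPDIR:-/tmp}
scratch="$work/intermediate.$$"

# Write intermediate results that are not needed after the request ends.
printf 'step 1 output\nstep 2 output\n' > "$scratch"
lines=$(grep -c 'output' "$scratch")
echo "intermediate file has $lines lines"

# Explicit cleanup for interactive runs; NQS removes $TMPDIR by itself.
rm -f "$scratch"
```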