Appendix B. Command Line Interface Tutorial

This tutorial introduces you to the NQS commands that are used to submit, monitor, and control an NQS batch request. Before using this tutorial, read through Chapter 1, “Seeing the Big Picture”.

The tutorial covers the following basic tasks:

You can find a complete description of the tasks you can perform in the other chapters of this publication.

You may find it helpful to read this tutorial at your workstation because it contains several exercises that you can perform. For a list of the commands and options in the order in which they appear in this tutorial, see “Summary”.

Before you can submit batch requests, you must ensure that you have a valid user name and password to enable you to log in to the operating system.


Note: The exercises in this tutorial create several files in your current directory. You can create a new directory from which to perform the exercises to make it easier to remove the unwanted files at the end of the tutorial. A list of the files created is given at the end of the tutorial in “Removing Files from Your Directory after the Tutorial”.
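The suggestion in the preceding note can be sketched as follows. The directory name nqs_tutorial is only an illustration; any unused name works equally well.

```shell
# Create a scratch directory for the tutorial files and work inside it.
# "nqs_tutorial" is an arbitrary, illustrative name.
tutorial_dir=nqs_tutorial
mkdir -p "$tutorial_dir"
cd "$tutorial_dir" || exit 1
pwd    # confirm where the tutorial files will be created
```

When you finish the tutorial, you can remove the whole directory instead of deleting the files one by one.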


Creating the Batch Request

Before you can submit a batch request, you must first create a script file that contains the commands that make up the request. You can create this file by using any text editor (such as vi(1)).

A batch request can be one command, such as the following example:

who           # list users of the CRI system

Usually, however, it contains several commands. For example, the following batch request compiles and runs a CF90 Fortran program:

set -x             # Echo commands into standard output
ja                 # Enable job accounting
date               # Print current date and time
f90 loop.f         # Compile a program called loop.f
segldr loop.o      # Load replaceable loop.o as  a.out
date               # Print date and time
./a.out            # Execute the program a.out
date               # Print date and time
rm loop.o a.out    # Remove files
echo job complete
ja -csft           # Report job accounting information
                   # and disable job accounting

If you enter the required commands interactively, line-by-line, from standard input (stdin), you do not have to create a file. This is done by issuing a cqsub or qsub command without a batch request file name.

% qsub
ls
who
CONTROL-d
nqs-181 qsub: INFO
  Request <365.coal>: Submitted to queue <express> by <snow(123)>.
%

When this form of cqsub or qsub is used, the request is given the name STDIN, which is used to form the names of the standard output and standard error files produced by the request. See “Examining Output from a Batch Request”.

However, if you make an error halfway through entering a request and do not find the error until after you have started the next line, you cannot return and correct the error. You must press INTERRUPT (usually CONTROL-c) to terminate the cqsub or qsub command and start again.

Exercise 1

Log on interactively to your system and create a file called testjob that contains the following lines:

date
pwd
ls

Save this file, because it is used in exercises later in this tutorial. (The commands in this file simply display the current date and time, display the path to the current directory, and list the contents of the current directory.)
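If you prefer not to use a text editor, a shell here-document is one possible way to create the same file. This is only a sketch of an alternative; the file name and contents are the ones given in this exercise.

```shell
# Write the three commands from this exercise into a file named testjob.
cat > testjob <<'EOF'
date
pwd
ls
EOF

# Display the file to confirm its contents.
cat testjob
```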

The shell used to execute NQS batch requests may not be the same as your login shell; therefore, the results of batch request execution may differ from interactive execution of the commands in the request file. See “Discovering the Shell to Be Used for Your Requests”.

Submitting a Batch Request for Execution

To submit a batch request for execution, use the cqsub or qsub command, which has many options. For a complete list of the options, see the cqsub(1) or qsub(1) man page. Simplified forms of the cqsub and qsub commands are as follows:

cqsub [-q queue] file

qsub [-q queue] file

The file argument is the name of the script file that will be submitted for execution.

The -q queue option indicates the name of an NQS queue to which the request will be initially sent (it does not apply to requests sent to the NQE database). The NQS administrator may define several NQS queues that are available to users; the definition includes a list of one or more destination queues to be associated with each NQS queue. Usually, you would submit a request to a pipe queue, which would then route your request to a suitable batch queue for execution (although some systems may let you submit a request directly to a batch queue). To display a list of the NQS queues, you can use the cqstatl(1) or qstat(1) command.

To submit a file called testjob to an NQS queue called std, you can use the following command line:

% qsub -q std testjob

You will receive one of the following messages:

nqs-181 qsub: INFO
  Request <366.coal>: Submitted to queue <std> by <snow(123)>.

nqs-4517 qsub: CAUTION
  No such queue <std> at local host.

If you omit the -q option, the batch request is sent to the default NQS batch request queue (if one has been defined).

In the following example, the default queue is called express:

% qsub testjob
nqs-181 qsub: INFO
  Request <367.coal>: Submitted to queue <express> by <snow(123)>.
%

You can specify the default queue in the QSUB_QUEUE environment variable. If you have not defined that variable, the request is sent to a system default queue (if the NQS administrator has defined such a queue). In some configurations, the administrator may not have defined any system default queue; therefore, you must specify a queue by using the -q option.
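The order in which a queue is chosen can be summarized in a short shell sketch. The function choose_queue is hypothetical and is not part of NQS; it only restates the rules described above, with the system default queue passed in as an argument because it is defined by the administrator.

```shell
# choose_queue [explicit_queue] [system_default]
# Hypothetical helper that mirrors the order described above:
#   1. a queue given with -q
#   2. the QSUB_QUEUE environment variable
#   3. the system default queue, if the administrator defined one
choose_queue() {
    explicit=$1
    system_default=$2
    if [ -n "$explicit" ]; then
        echo "$explicit"
    elif [ -n "${QSUB_QUEUE:-}" ]; then
        echo "$QSUB_QUEUE"
    elif [ -n "$system_default" ]; then
        echo "$system_default"
    else
        echo "no queue available; specify one with -q" >&2
        return 1
    fi
}

QSUB_QUEUE=big
choose_queue "" express      # prints: big
choose_queue std express     # prints: std
```

The last branch corresponds to a configuration with no system default queue, in which you must name a queue yourself.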

You can display the name of the system default queue as defined by the NQS administrator only by using the qmgr(8) command. Many of the qmgr subcommands can be executed only by the NQS administrator, but any user can use the show parameters subcommand to display the default batch request queue.

In the following example, which is a partial display, the default batch request queue has the name batch.

% qmgr
Qmgr: show parameters
show parameters

  Checkpoint directory = /usr/spool/nqs/private/root/chkpnt
  Debug level = 0
  Default batch_request queue = batch
  Default destination_retry time = 72 hours
  Default destination_retry wait = 5 minutes
  Default return of a request's job log is OFF
  Global batch group run-limit: 40
  Global batch run-limit:      1
  Global batch user run-limit: 40
  Global MPP Barrier limit:    unspecified
  Global MPP Processor Element limit:    unspecified
  Global memory limit:         unlimited
  Global pipe limit:          20
  Global quick-file limit:     543227904w
  Global tape-drive a limit:   unlimited
  Global tape-drive b limit:   16
  Global tape-drive c limit:   16
  Global tape-drive d limit:   16
  Global tape-drive e limit:   16
  Global tape-drive f limit:   16
  Global tape-drive g limit:   16
  Global tape-drive h limit:   16
Press <RETURN> to continue.
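If you have saved a display like the preceding one as text, the default queue name can be picked out with sed. The following is a minimal sketch that works on a few sample lines copied from the display; how you capture the qmgr output on your system is not shown and is left as an assumption.

```shell
# Sample lines copied from the partial qmgr display above.
params='  Debug level = 0
  Default batch_request queue = batch
  Default destination_retry time = 72 hours'

# Print the text after "=" on the "Default batch_request queue" line.
default_queue=$(printf '%s\n' "$params" |
    sed -n 's/^ *Default batch_request queue = *//p')
echo "$default_queue"    # prints: batch
```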

Discovering the Shell to Be Used for Your Requests

The shell used to execute your NQS batch requests may not be the same as your login shell. When you submit a batch request to NQS, you can specify the shell to be used for that request by using the cqsub -s or qsub -s command. You can specify the csh, ksh, or sh shell names. The -s option overrides any shell strategy that is configured on the execution machine.

When NQS initiates the request, it spawns a second shell. If your login shell is sh or ksh, the request is run under that shell. If your login shell is csh and you want the request to run under csh, you must include the #!/bin/csh line as the first line of the shell script itself (before any #QSUB directives). If you omit this line, NQS runs the batch request under sh. This is normal csh behavior, and it is described in the csh(1) man page.

To ensure that a batch request always uses a certain shell, embed the following command line in the script file:

#QSUB -s shell_name

The shell_name argument is the full path name of the shell you want to use to interpret the batch request shell script.
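As an illustration, the following sketch creates a request file that selects the C shell in both of the ways described in this section. The file name cshjob and the path /bin/csh are assumptions; use the path that is valid on your execution host.

```shell
# Create a request file that asks for the C shell in both ways described
# above. /bin/csh is an assumed path; cshjob is an illustrative file name.
cat > cshjob <<'EOF'
#!/bin/csh
#QSUB -s /bin/csh
set echo
date
pwd
ls
EOF

# The #! line must be the first line, before any #QSUB directive.
head -n 1 cshjob    # prints: #!/bin/csh
```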

For a description of the passing of environment variables to a batch request, see “Environment Variables Automatically Set” in Chapter 9. You also can set individual shell variables within a batch request.

Exercise 2

This exercise uses a script file called testjob that contains the following lines:

date
pwd
ls

Before proceeding with the exercise, you should check that this file exists and has the preceding contents.

Next, find the name of an NQS pipe queue to which you can submit a request by issuing the cqstatl or qstat command with the -p option, as shown in the following example (this option does not apply if you are sending the request to the NQE database). The pipe queue names are listed under the column QUEUE NAME.

% qstat -p
-----------------------
NQS PIPE QUEUE SUMMARY
-----------------------
QUEUE NAME              LIM TOT ENA STS QUE ROU  WAI HLD ARR DEP  DESTINATIONS
----------------------- --- --- --- --- --- ---  --- --- --- ---  -------------
express                    2   1 yes  on   0   1    0   0   0   0 b_600_96
                                                                  b_1800_96
                                                                  b_3600_96
                                                                  b_14400_96
                                                                  b_86400_96
                                                                  b_max_96
big                       1   0 yes  on   0   0    0   0   0   0  b_large
cray02                    1   0 yes  off  0   0    0   0   0   0  batch@cray02
cray03                    1   0 yes  off  0   0    0   0   0   0  [email protected]
----------------------- --- --- --- --- --- ---  --- --- --- ---  -------------
coal                      5   1           0   1    0   0   0   0
----------------------- --- --- --- --- --- ---  --- --- --- ---  -------------

The display produced by this command shows all of the NQS pipe queues that are available and the destinations of each queue. If the display is so big that part of it scrolls off the screen, you can use the more(1) command to page the display, as follows:

qstat -p | more

If you are unsure which queue to use for your requests, ask your system administrator. Whichever queue you choose to use, check that the entry in the ENA column is yes and that the entry in the STS column is on. If you see some other value for the queue, ask your administrator to enable and start the queue.
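The check of the ENA and STS columns can also be done with awk on saved output. This sketch uses rows adapted from the sample display above (the destination column is left out) and assumes that the queue name, LIM, TOT, ENA, and STS values appear as the first five whitespace-separated fields, as they do in that display.

```shell
# Rows adapted from the sample qstat -p display above (destinations omitted).
sample='express                    2   1 yes  on   0   1    0   0   0   0
big                        1   0 yes  on   0   0    0   0   0   0
cray02                     1   0 yes  off  0   0    0   0   0   0
cray03                     1   0 yes  off  0   0    0   0   0   0'

# Field 4 is the ENA column and field 5 is the STS column.
usable=$(printf '%s\n' "$sample" |
    awk '$4 == "yes" && $5 == "on" { print $1 }')
echo "$usable"
```

With the sample rows, only express and big are listed, because cray02 and cray03 are enabled but not started.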

To define the queue name you want to use as your default queue, set the QSUB_QUEUE environment variable, depending on the type of login shell that you are currently using, as follows.

For C shell users:

setenv QSUB_QUEUE queue-name

For standard or Korn shell users:

QSUB_QUEUE=queue-name
export QSUB_QUEUE

The following example shows how to set the default queue to big when you are using the C shell:

% setenv QSUB_QUEUE big
%

Now send a batch request for the script file you created in exercise 1 without specifying a queue name so that it is sent to the default queue.

% qsub testjob
nqs-181 qsub: INFO
  Request <368.coal>: Submitted to queue <big> by <snow(123)>.
%

To unset an environment variable, you can use the unsetenv command (see setenv(3)) if you are using the C shell, or the unset command if you are using the standard or Korn shell.

Confirmation of a Successful Submission

If your batch request is submitted successfully, you will receive a message that displays an identifier for the request. You can use this identifier in other commands, for example, to monitor the progress of the request or to delete the request. The message also contains the name of the queue in which the request was initially placed.

For example, when you complete the preceding exercise, you should get an NQS message similar to the following:

nqs-181 qsub: INFO
  Request <368.coal>: Submitted to queue <express> by <snow(123)>.

The identifier for this request is 368.coal, which indicates that the request has a sequence number of 368 and was submitted to an NQS server with host name coal. In this example, the request is initially sent by user snow to a queue called express.
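If you want to reuse the identifier in a script, it can be extracted from the confirmation text. The following sketch works on the message shown above, stored in a shell variable; the variable names are illustrative only.

```shell
# The confirmation text as shown above.
msg='nqs-181 qsub: INFO
  Request <368.coal>: Submitted to queue <express> by <snow(123)>.'

# Pull out the text between "Request <" and the next ">".
request_id=$(printf '%s\n' "$msg" |
    sed -n 's/.*Request <\([^>]*\)>.*/\1/p')

seq_number=${request_id%%.*}   # text before the first dot
host_name=${request_id#*.}     # text after the first dot

echo "$request_id $seq_number $host_name"    # prints: 368.coal 368 coal
```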

To display the current status of your submitted requests, use the cqstatl -a or qstat -a command. For more details about the cqstatl or qstat command, see the cqstatl(1) or qstat(1) man page.

Exercise 3

Record the request identifier displayed when you performed exercise 2 (for use in subsequent exercises). Check that the displayed queue is the same as the default queue you set in the QSUB_QUEUE environment variable.

Examining Output from a Batch Request

After a batch request is executed, the standard output, standard error, and job log files produced by the request are written. The standard output file contains the messages that would have been displayed if you had issued the commands contained in the batch request in an interactive session. The standard error file contains any error messages that are produced during the execution of the script. The job log file contains explanatory messages for request-related events.

By default, these files are written to the directory you were in when you issued the cqsub or qsub command and are given the following names:

Name             Description
testjob.enumber  The standard error file produced by the batch request
testjob.onumber  The standard output file produced by the batch request
testjob.lnumber  The job log file produced by the batch request

The number argument is the sequence number as displayed when you submitted the request. If you submit requests that have file names longer than 7 characters, only the first 7 characters are used in forming the output file names.
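The naming rule can be expressed as a small shell function. The function output_names is hypothetical; it only applies the rule stated above to a plain script file name and a sequence number.

```shell
# output_names script_name sequence_number
# Hypothetical helper: prints the default error, output, and job log file
# names, keeping only the first 7 characters of the script name.
output_names() {
    base=$(printf '%.7s' "$1")
    for kind in e o l; do
        echo "$base.$kind$2"
    done
}

output_names testjob 368
output_names longscriptname 12
```

For the second call, the names begin with longscr, because the script name is longer than 7 characters.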

To merge the messages that would be sent to the standard error file into the standard output file, use the cqsub -eo or qsub -eo command.

The command lines that are executed are not written to the standard output or error files unless you include an appropriate UNIX set command at the start of the batch request (see “Exercise 5”).

Exercise 4

If you submitted the batch request as detailed in exercise 2, after a short time (the exact time depends on the amount of other work being done on the system, but it could be just a few seconds), the batch request completes execution and the following files are placed in the directory that you were in when you issued the cqsub or qsub command:

testjob.enumber
testjob.onumber
testjob.lnumber

Check your current directory for these files, and check that number is the same as the sequence number displayed when you first submitted the batch request.

Examine the contents of the standard output file (testjob.onumber) by using a command such as more(1). The file should contain output similar to that shown in the following screen:

% more testjob.o368

             Cray Research, Inc. CRAY C90 UNICOS - abc

                           UNICOS 9.0.10bm
                        CRAY C9016E/16256-4
                              1gw SSD
                              Model-E
                       4.167 nanosecond clock
               16 - cpus 256 megawords of real memory
            and a Kernel with 32 bit full memory address

Tue Jan 6 05:07:47 CST 1998
/sub/snow
fred
anoldfile
temp1
testjob
% 

The pwd and ls commands indicate that the request executed with your home directory as its current directory, even if you submitted the request from a directory other than your home directory.

The UNIX command lines that the batch request executed are not echoed to the standard output file.

Exercise 5

This exercise uses a script file called testjob that contains the following lines:

date
pwd
ls

Before proceeding with the exercise, you should check that this file exists and has the preceding contents.

To echo the command lines in the script to the standard error file as they are executed, edit the testjob file so that it includes one of the following lines at the start (depending on the shell under which the script will execute).

To discover which shell your requests will use, see “Discovering the Shell to Be Used for Your Requests”.

For the C shell:

set echo

For the standard or Korn shell:

set -x

Save the changes and then resubmit the request by using the -eo option as follows, specifying that any standard error messages produced by the request will be placed in the standard output file:

% qsub -eo testjob
nqs-181 qsub: INFO
  Request <369.coal>: Submitted to queue <express> by <snow(123)>.
%

When the request executes, a new standard output file is created in your current directory; its file name includes the sequence number for this request. No separate standard error file is created because the -eo option was used. Because of the set command you included in the script file, the output file contains the actual UNIX commands executed, intermingled with the output.
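You can preview this behavior without NQS by running a small script locally under sh. The sketch below is only an analogy: the file names are illustrative, and the redirection 2>&1 stands in for the merging that the -eo option requests for a batch request.

```shell
# A small script that uses set -x, as in this exercise.
cat > tracejob <<'EOF'
set -x
pwd
EOF

# Run it locally with sh: command output goes to tracejob.out and the
# "+ pwd" trace line goes to tracejob.err.
sh tracejob > tracejob.out 2> tracejob.err

# Merge the two streams into one file, which is the effect that the
# -eo option requests for a batch request.
sh tracejob > tracejob.merged 2>&1
cat tracejob.merged
```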

Examine the contents of the standard output file (testjob.onumber) by using a command such as more(1). The file should contain output similar to that shown in the following screen:

% more testjob.o369

Warning: no access to tty; thus no job control in this shell...
             Cray Research, Inc. CRAY C90 UNICOS - abc

                           UNICOS 9.0.10bm
                        CRAY C9016E/16256-4
                              1gw SSD
                              Model-E
                       4.167 nanosecond clock
               16 - cpus 256 megawords of real memory
            and a Kernel with 32 bit full memory address

+ date
Tue Jan 6 05:28:14 CST 1998
+ pwd
/sub/snow
+ ls
fred
anoldfile
temp1
testjob
%

When a request has executed under the C shell, NQS always displays the message that begins Warning: no access to tty; at the start of the standard output file. The message does not indicate that an error has occurred; it is simply a warning that the usual C shell job control options are not available because this is a batch request. Job control is a means of controlling multiple shells or processes interactively, and it is not available with a batch job because no interactive session (referred to as tty in the message) is associated with the job.

Specifying Resource Limitations for a Batch Request

When you submit a batch request to NQS, you can include options to specify the system resources that the request requires. If you do not specify any options, resource restrictions are assigned based on the user database (UDB) limits and the limits set by the system administrator for the batch queue in which the request executes.

You can specify many different limits to cqsub and qsub (see “Using Resource Limits” in Chapter 5, for a complete list of limits). Some of the more common options are as follows:

Option     Description
-a date    The earliest time at which the request can be executed.
-lT time   The maximum number of CPU seconds the job can use when executing at the UNICOS system.
-lM size   The maximum amount of memory the request is allowed to use when it executes. The size is in units of bytes unless it is suffixed by some other unit (such as Kb or Mb); see the cqsub(1) or qsub(1) man page.

You can specify these options on the cqsub or qsub command line.

The following command submits a batch request called testjob that has the following characteristics:

  • The request is sent to the default NQS queue

  • The request cannot be executed before 11 P.M. on the first Friday following the day of submission

  • The request requires a maximum of 5 CPU seconds at the UNICOS system

    qsub -a "23:00 Friday" -lT 5 testjob

The batch request file name is always the last item on the cqsub or qsub command line following any options.


Note: When specifying resource limits, be careful not to specify more resources than the request will use, because these limits are used to schedule work. Requests that have large resource limits (for example, a large CPU time or memory) are generally executed less often than those with smaller resource limits. Therefore, the larger the limits you specify, the longer it may take to complete your work.

After a request has been submitted, you cannot change the resource limits specified for the request. However, the system administrator can change the limits by using the qmgr(8) command.

Exercise 6

This exercise uses a script file called testjob that contains the following lines:

set -x (standard or Korn shell only)
set echo (C shell only)
date
pwd
ls

Before proceeding with the exercise, you should check that this file exists and has the preceding contents.

Submit a batch request specifying that it will be executed 2 minutes later than the current time.

If it is now 9:45 A.M., you can enter the following command:

% qsub -a "9:47" testjob
nqs-181 qsub: INFO
  Request <370.coal>: Submitted to queue <express> by <snow(123)>.
%

The -a option can take several different formats of date and/or time. For this exercise, the format for the time used is hours:minutes (using a 24-hour clock).

If you do not specify a day, the request executes the next time the specified time occurs (that is, today if the specified time is later than the time the command was issued, or tomorrow if the specified time is earlier than the time the command was issued).
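The time arithmetic in this exercise can be sketched in plain shell. The functions to_minutes, add_minutes, and run_day are hypothetical helpers, not NQS commands; they work on hours:minutes values in 24-hour form. The case in which both times are equal is not covered by the rule above, and the sketch treats it as tomorrow as an assumption.

```shell
# to_minutes H:MM
# Prints the number of minutes since midnight.
to_minutes() {
    h=${1%%:*}; m=${1##*:}
    h=${h#0}; m=${m#0}      # drop one leading zero so 08 and 09 are not read as octal
    echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# add_minutes H:MM minutes
# Prints the clock time that is the given number of minutes later.
add_minutes() {
    total=$(( ($(to_minutes "$1") + $2) % 1440 ))
    printf '%d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

# run_day now target
# Prints "today" if target is later than now, otherwise "tomorrow"
# (equal times are treated as tomorrow; this is an assumption).
run_day() {
    if [ "$(to_minutes "$2")" -gt "$(to_minutes "$1")" ]; then
        echo today
    else
        echo tomorrow
    fi
}

add_minutes 9:45 2      # prints: 9:47
run_day 9:45 9:47       # prints: today
run_day 14:05 9:47      # prints: tomorrow
```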

The -a option does not delay the entering of a request into the initial queue, or its routing through to the destination batch queue. This routing occurs as normal. However, after the request reaches the final destination queue, it waits in that queue until the time and date specified by the -a option.

You should not see in your current directory any of the output files produced by this request for at least 2 minutes.

To display the status of the request, use the cqstatl -a or qstat -a command; the letter W in the column headed ST indicates that the request is waiting until the time specified by the -a option, as follows:

sn1633% qstat -a
--------------------------
NQS BATCH REQUEST SUMMARY
--------------------------
IDENTIFIER    NAME    USER     QUEUE                 JID  PRTY REQMEM REQTIM ST
------------- ------- -------- --------------------- ---- ---- ------ ------ ---
370.coal      testjob snow     [email protected]               ---  5120       10 W
no pipe queue entries
no device queue entries
sn1633% 

NQS can notify you when a request starts execution and stops execution by using the cqsub or qsub command options -mb and -me. See “Using NQS Mail” in Chapter 4.

Specifying Options within the Script File

An alternative method of specifying cqsub or qsub options is to supply them at the start of the batch request file. The options must come before the first executable line of the script (that is, before any lines that are not comments). Each cqsub or qsub option must be on a separate line and each line must be prefixed by the string #QSUB.

The following modified version of the shell script testjob specifies identical executing conditions to those in the cqsub or qsub line shown in “Specifying Resource Limitations for a Batch Request”:

#QSUB -a "23:00 Friday"
#QSUB -lT 5
date
pwd
ls

If you specify an option both in the script file and on the cqsub or qsub command line, the option on the command line is used. It is useful to specify in the script file the conditions that do not change (such as the CPU time limit for the request), and to specify on the cqsub or qsub command line the options that may change with every submission of the script file (such as the time at which the request will be executed).
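The placement rule for #QSUB lines can be illustrated with awk. This sketch is not how NQS itself reads a script; the function embedded_options and the file name limitsjob are hypothetical. It prints the options on #QSUB lines that appear before the first line that is not a comment, and it ignores a #QSUB line that appears later.

```shell
# A sample request file; the last #QSUB line comes after an executable
# line, so it is not treated as an embedded option.
cat > limitsjob <<'EOF'
#QSUB -a "23:00 Friday"
#QSUB -lT 5
date
#QSUB -eo
pwd
EOF

# Print the options from #QSUB lines that come before the first
# executable line; stop at the first line that is not a comment.
embedded_options() {
    awk '/^#QSUB/ { sub(/^#QSUB[ \t]*/, ""); print; next }
         /^#/     { next }
                  { exit }' "$1"
}

embedded_options limitsjob
```

For the sample file, the function prints the -a and -lT options only.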

Exercise 7

This exercise uses a script file called testjob that contains the following lines:

set -x (standard or Korn shell only)
set echo (C shell only)
date
pwd
ls

Before proceeding with the exercise, you should check that this file exists and has the preceding contents.

Submit a batch request that will not be executed until 3 minutes later than the current time, but specify this time limitation within the script file. To do this, edit the script file and include an appropriate #QSUB line.

If it is now 11:59 A.M., you would edit the script file to include the following line as the first line in the file:

#QSUB -a "12:02"
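If you would rather make this change from the command line than in an editor, the following sketch shows one possible way. It recreates the standard or Korn shell form of the script first so that the example is complete; the temporary file name testjob.new is illustrative.

```shell
# Start from the script used in this exercise (standard or Korn shell form).
cat > testjob <<'EOF'
set -x
date
pwd
ls
EOF

# Write the directive first, then the old contents, and replace the file.
{ echo '#QSUB -a "12:02"'; cat testjob; } > testjob.new &&
    mv testjob.new testjob

head -n 2 testjob
```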

To display the status of the request, use the cqstatl -a or qstat -a command; the letter W in the column headed ST indicates that the request is waiting until the time specified by the -a option, as follows:

sn1633% qstat -a
------------------------------
NQS 3.1 BATCH REQUEST SUMMARY
------------------------------
IDENTIFIER    NAME    USER     QUEUE                 JID  PRTY REQMEM REQTIM ST
------------- ------- -------- --------------------- ---- ---- ------ ------ ---
370.coal      testjob snow     [email protected]               ---  5120       10 W
no pipe queue entries
no device queue entries
sn1633% 

NQS can notify you when a request starts execution and stops execution by using the cqsub or qsub command options -mb and -me. See “Using NQS Mail” in Chapter 4.

Sending a Message to an Executing Request

Occasionally, you may want to write a message to the output files produced by a request. For example, you may want to send a message that mentions that you are about to abort the request.

To write a message to a request's output files, use the qmsg(1) command. The request must be executing for the command to succeed because the output files are not created until the request begins to execute.

Using options to qmsg, you can specify whether the message is written to the standard output file or to the standard error file produced by the request.

% qsub sleeper
nqs-181 qsub: INFO
  Request <1109.coal>: Submitted to queue <express> by <snow(123)>.
% qmsg 1109.coal
Hello request 1109.
I am about to abort you.
CONTROL-d
%

After entering the qmsg command line, you can enter lines of text, terminating each line by pressing RETURN. When you have entered all the lines, press CONTROL-d.

The message is written to the standard error file of the request, unless you use the qmsg -o command to write the message to the standard output file.

Exercise 8

Edit the testjob script to do the following:

  • Remove the #QSUB line at the start of the script if you inserted this line in exercise 7

  • Include the following line at the end of the script:

    sleep 120

    This line keeps the request executing for a couple of minutes to give you time to write a message to it.

The testjob script file should then contain the following lines:

set -x (standard or Korn shell only)
set echo (C shell only)
date
pwd
ls
sleep 120

Before proceeding with the exercise, you should check that the script has the preceding contents.

Submit the request by using the cqsub or qsub command. After a few seconds, try using the qmsg command to write a message to the request's standard error file.

% qsub testjob
nqs-181 qsub: INFO
  Request <1110.coal>: Submitted to queue <express> by <snow(123)>.
% qmsg 1110.coal
This is a test message from qmsg.
CONTROL-d
%

After you successfully enter the qmsg command, wait about 2 minutes to allow the request to complete execution, and then examine the standard error file of the request. This file should be returned to your current directory with the name testjob.exxx, where xxx is the sequence number of the request as displayed when you submitted it. The message you sent with qmsg should appear at the end of the file.

If you receive the following message when you enter the qmsg command, your request has not yet started to execute:

Request's stderr file does not exist

Wait a few seconds and try again.

If you get the following message when you enter the qmsg command, the request has completed execution; therefore, you cannot write to the request's output files:

Request request-number does not exist

Resubmit the request and, after a few seconds, try using the qmsg command.

Submitting a Request to a Remote Host

Sometimes you may want to execute a request at a system other than the one to which you are logged on. You can do this by submitting your request to a local pipe queue that has a queue at a remote system as its destination. For example, the screen shown in “Exercise 2” shows a local pipe queue called cray02 that has the destination batch@cray02. This destination refers to a queue called batch at a remote system called cray02. For further information, see “Submitting Batch Requests” in Chapter 4.

To execute requests at a remote NQS host, you may have to create some validation files, depending on how the NQS system has been configured. For more information, see “NQS Validation Requirements” in Chapter 2. If you try to execute a request at a remote host, and you do not have the correct validation files, NQS sends a mail message to inform you of the problem.

When you enter the cqstatl -p or qstat -p command in “Exercise 10”, you can display pipe queues on your local system that have remote destinations.

Monitoring NQS

The cqstatl or qstat command lets you monitor your batch requests and the status of the NQS queues. This section introduces you to the following commonly used options of cqstatl and qstat:

  • Checking the status of your batch requests

  • Checking the status of NQS pipe queues

  • Checking the status of NQS batch queues

Checking the Status of Your Batch Requests

After your request has been submitted, you can check its status by using the cqstatl or qstat command. You can display information only about requests that you have submitted. You cannot display information about requests submitted by other users unless you are an NQE administrator.

To display a summary of all your requests (irrespective of which type of queue they are currently in), use the cqstatl -a or qstat -a command.

The following example shows a request in an NQS pipe queue:

% qstat -a
no batch queue entries
-------------------------
NQS PIPE REQUEST SUMMARY
-------------------------
IDENTIFIER    NAME    OWNER    USER     QUEUE                 PRTY ST
------------- ------- -------- -------- --------------------- ---- ---
372.coal      testjob 1889     snow     [email protected]          31   Q
no device queue entries
%

Unless the system is heavily loaded, your batch requests are transferred to a batch queue within a few seconds.

The following display shows the information displayed when your request is in an NQS batch queue:

% qstat -a
--------------------------
NQS BATCH REQUEST SUMMARY
--------------------------
IDENTIFIER    NAME    USER     QUEUE                 JID  PRTY REQMEM REQTIM ST
------------- ------- -------- --------------------- ---- ---- ------ ------ ---
372.coal      testjob snow     [email protected]          155   30   1024     30 Qgr
no pipe queue entries
no device queue entries
%

One of the most important columns in these displays is the ST column, which gives a brief description of the current status of your request. The first letter displayed in this column can be one of the following:

Status  Description
A       Arriving.
C       Checkpointed. (Available only on UNICOS, UNICOS/mk, and IRIX systems.)
D       Departing.
E       Exiting.
H       Held.
Q       Queued. (The request has not yet begun execution because the limit for the maximum possible number of executing requests has been reached.)
R       Running.
S       Suspended.
U       Unknown state.
W       Waiting for the date/time specified by the cqsub -a or qsub -a command.

The other letters in the ST column provide more detailed information about the status of the request. For a full description of these additional characters, see the cqstatl(1) or qstat(1) man page.
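When you read the ST column in a script, the first letter can be translated with a case statement. The function describe_status is a hypothetical helper that uses only the first-letter meanings listed above; it does not interpret the additional letters.

```shell
# describe_status ST
# Maps the first letter of an ST column entry to its meaning.
describe_status() {
    case $1 in
        A*) echo "Arriving" ;;
        C*) echo "Checkpointed" ;;
        D*) echo "Departing" ;;
        E*) echo "Exiting" ;;
        H*) echo "Held" ;;
        Q*) echo "Queued" ;;
        R*) echo "Running" ;;
        S*) echo "Suspended" ;;
        U*) echo "Unknown state" ;;
        W*) echo "Waiting" ;;
        *)  echo "unrecognized status: $1" >&2; return 1 ;;
    esac
}

describe_status Qgr   # prints: Queued
describe_status W     # prints: Waiting
```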

Exercise 9

Edit the testjob script file to remove the last line (sleep 120). The file should then contain the following lines:

set -x (standard or Korn shell only)
set echo (C shell only)
date
pwd
ls

Before proceeding with the exercise, you should check that the script has the preceding contents.

Submit the testjob script and use the cqstatl -a or qstat -a command to check its status. Usually, a request remains in a pipe queue for only a short period of time.

To see the request in an NQS pipe queue, you must enter the cqstatl or qstat command immediately after the request is submitted, or you could include the cqstatl or qstat command on the cqsub or qsub command line, as follows:

% qsub testjob; qstat -a
nqs-181 qsub: INFO
  Request <373.coal>: Submitted to queue <express> by <snow(123)>.
no batch queue entries
-------------------------
NQS PIPE REQUEST SUMMARY
-------------------------
IDENTIFIER    NAME    OWNER    USER     QUEUE                 PRTY ST
------------- ------- -------- -------- --------------------- ---- ---
373.coal      testjob 1889     snow     express@coal          31   Q
no device queue entries
%

Wait a few seconds (fewer than 10), and then reissue the cqstatl or qstat command to see whether the request has been transferred to a batch queue. If your system is lightly loaded, the request may have completed execution before you enter the command, in which case you see the following display:

% qstat -a
no batch queue entries
no pipe queue entries
%

If the system you are using is very heavily loaded, you may also see the preceding display; in that case, it indicates that the request has not yet been placed in any queue.

If the request completed before you could monitor it, you can keep it in a batch queue long enough to monitor by using the cqsub -a or qsub -a command to specify that the request should not be executed for another 10 minutes. For example, if the current time is 2:05 P.M., you could enter the following command:

% qsub -a "14:15" testjob
nqs-181 qsub: INFO
  Request <374.coal>: Submitted to queue <express> by <snow(123)>.
% qstat -a
--------------------------
NQS BATCH REQUEST SUMMARY
--------------------------
IDENTIFIER    NAME    USER     QUEUE                 JID  PRTY REQMEM REQTIM ST
------------- ------- -------- --------------------- ---- ---- ------ ------ ---
374.coal      testjob snow     [email protected]          ---   30   1024     30 W
no pipe queue entries
no device queue entries
%

The W in the ST column indicates that the request is waiting until the time specified in the -a option.
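
If you prefer to compute the delayed start time rather than type it by hand, the arithmetic can be sketched as a small shell function. This is only a sketch, assuming the HH:MM time form used in the preceding example; the function name add_minutes is hypothetical and is not part of NQS. The result wraps past midnight, and how the -a option treats a time that is earlier than the current time is not covered here.

```shell
# Add a number of minutes to a 24-hour HH:MM time and print the result
# as HH:MM. Hypothetical helper for building a -a argument.
add_minutes() {
    h=${1%%:*}
    m=${1##*:}
    # Strip one leading zero so that 08 and 09 are not read as octal.
    h=${h#0}; m=${m#0}
    h=${h:-0}; m=${m:-0}
    total=$(( (h * 60 + m + $2) % 1440 ))
    printf '%02d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

add_minutes 14:05 10    # prints 14:15

# Possible use on a system with NQS installed:
#   qsub -a "$(add_minutes "$(date +%H:%M)" 10)" testjob
```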

Checking the Status of Queues

To display a summary of the status of all NQS queues, use the cqstatl or qstat command without arguments. (However, if the NQE_DEST_TYPE environment variable is set to nqedb and you use the cqstatl command without options or arguments, the output is a summary of all of your requests in the NQE database, excluding terminated requests. For additional information about monitoring queues, see Chapter 11, “Monitoring Queues”.) Because this display is often large, you may want to limit it to a particular type of queue by using one of the following cqstatl or qstat command options:

Option  Description

-b      A summary of all batch queues. (This option does not apply if you are sending the request to the NQE database.)
-p      A summary of all pipe queues. (This option does not apply if you are sending the request to the NQE database.)

Pipe Queues

To display summary information about pipe queues, use the cqstatl -p or qstat -p command (the -p option does not apply if you are sending a request to the NQE database). The following display shows that the system called coal has four NQS pipe queues:

% qstat -p
-----------------------
NQS PIPE QUEUE SUMMARY
-----------------------
QUEUE NAME              LIM TOT ENA STS QUE ROU  WAI HLD ARR DEP  DESTINATIONS
----------------------- --- --- --- --- --- ---  --- --- --- ---  -------------
express                   2   1 yes  on   0   1    0   0   0   0  b_600_96
                                                                  b_1800_96
                                                                  b_3600_96
                                                                  b_14400_96
                                                                  b_86400_96
                                                                  b_max_96
big                       1   0 yes  on   0   0    0   0   0   0  b_large
cray02                    1   0 yes  off  0   0    0   0   0   0  [email protected]
cray03                    1   0 yes  off  0   0    0   0   0   0  [email protected]
----------------------- --- --- --- --- --- ---  --- --- --- ---  -------------
coal                      5   1           0   1    0   0   0   0
----------------------- --- --- --- --- --- ---  --- --- --- ---  -------------

All of the queues are enabled (shown in the ENA column); therefore, you can send requests to any of them. However, only the express and big queues are started (shown in the STS column); therefore, only these queues will route requests to a destination queue. The other two queues can accept requests, but they do not route them until the queues are started (the NQE administrator must do this).

In this example, one request is currently in the express queue. This request is being routed (shown in the ROU column) to one of the batch queues listed in the DESTINATIONS column; NQS decides which queue to use.

The destinations of the two queues, cray02 and cray03, are pipe queues at other systems.

The final line of the display shows total figures for the system to which you are logged in (the system is called coal in the preceding example).
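
To pick out the queues that accept requests but do not route them, you can filter the pipe queue summary on the ENA and STS columns. The following is a minimal sketch that assumes the column layout shown in the preceding display (queue name in the first field, ENA in the fourth, STS in the fifth); the function name stopped_pipe_queues is hypothetical.

```shell
# Print the names of pipe queues that are enabled (ENA = yes) but not
# started (STS = off). Reads "qstat -p" style output on standard input.
# Header, rule, destination-only, and totals lines do not match the test.
stopped_pipe_queues() {
    awk '$4 == "yes" && $5 == "off" { print $1 }'
}

# Possible use on a system with NQS installed:
#   qstat -p | stopped_pipe_queues
```

For the preceding display, this filter would report cray02 and cray03.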

Exercise 10

To see an entry for your request in the pipe queue summary display, submit the testjob batch request and then issue the cqstatl -p or qstat -p command. Usually, a request remains in a pipe queue for only a short period of time. Therefore, if the system you are using is lightly loaded, you may have to include the cqstatl or qstat command on the same command line as the cqsub or qsub command to catch the request before it is transferred to a batch queue.

% qsub testjob; qstat -p
nqs-181 qsub: INFO
  Request <375.coal>: Submitted to queue <express> by <snow(123)>.
-----------------------
NQS PIPE QUEUE SUMMARY
-----------------------
QUEUE NAME              LIM TOT ENA STS QUE ROU  WAI HLD ARR DEP  DESTINATIONS
----------------------- --- --- --- --- --- ---  --- --- --- ---  -------------
express                   2   1 yes  on   0   1    0   0   0   0  b_600_96
                                                                  b_1800_96
                                                                  b_3600_96
                                                                  b_14400_96
                                                                  b_86400_96
                                                                  b_max_96
big                       1   0 yes  on   0   0    0   0   0   0  b_large
cray02                    1   0 yes  off  0   0    0   0   0   0  [email protected]
cray03                    1   0 yes  off  0   0    0   0   0   0  [email protected]
----------------------- --- --- --- --- --- ---  --- --- --- ---  -------------
coal                      5   1           0   1    0   0   0   0
----------------------- --- --- --- --- --- ---  --- --- --- ---  -------------
%

Your request is displayed as an entry under the ROU column for the queue to which you submitted the request.

If the system you are using is heavily loaded, you may not see an entry in this display for your request because the request has not yet been placed in any queue.

Batch Queues

To display summary information about batch queues, use the cqstatl -b or qstat -b command (the -b option does not apply if you are sending a request to the NQE database). The following display contains information about the total NQS workload on the system:

% qstat -b
------------------------
NQS BATCH QUEUE SUMMARY
------------------------
QUEUE NAME              LIM TOT ENA STS QUE RUN  WAI HLD ARR EXI
----------------------- --- --- --- --- --- ---  --- --- --- ---
b_600_96                  2   2 yes  on   0   1    1   0   0   0
b_1800_96                 1   1 yes  on   1   0    0   0   0   0
b_3600_96                 1   0 yes  on   0   0    0   0   0   0
b_14400_96                1   1 yes  on   1   0    0   0   0   0
b_86400_96                1   1 yes  on   0   0    1   0   0   0
b_max_96                  1   0 yes  on   0   0    0   0   0   0

----------------------- --- --- --- --- --- ---  --- --- --- ---
coal                      5   5           2   1    2   0   0   0
----------------------- --- --- --- --- --- ---  --- --- --- ---
%

The batch queue summary display is useful if you want to check the status of one of your requests to see why it is taking so long to execute.

The final line of the display shows total figures for the system to which you are logged in (the system is called coal in the preceding example).

In this display, some of the most important columns are those labeled LIM, QUE, RUN, and WAI; these columns are described as follows:

Column  Description

LIM     The maximum number of requests that can be executed concurrently in this queue. This limit is called the queue run limit.
QUE     The number of requests in the queue that are queued and ready to be executed. They have not yet begun execution because the limit for the maximum possible number of executing requests has been reached.
RUN     The number of requests in the queue that are currently executing.
WAI     The number of requests in the queue that are waiting to be executed at a specific time (as specified by the cqsub -a or qsub -a command).

The preceding example shows that, for the batch queue called b_600_96, one request is currently executing and one request is waiting for a specific time to execute (that is, it was submitted with the -a option and the specified time has not yet arrived).

The batch queue summary display can give you an idea of the current batch work load on the system.
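
To see at a glance which batch queues hold requests that are not executing, you can filter the display on the QUE and WAI columns. The following is a minimal sketch that assumes the column layout of the preceding display (LIM, TOT, ENA, STS, QUE, RUN, WAI in fields 2 through 8 of each queue row); the function name pending_by_queue is hypothetical.

```shell
# List batch queues that hold queued or waiting requests, with their QUE
# and WAI counts. Reads "qstat -b" style output on standard input.
# Queue rows have 11 fields and a numeric LIM value; header, rule, and
# totals lines fail one of those two checks and are skipped.
pending_by_queue() {
    awk 'NF == 11 && $2 ~ /^[0-9]+$/ && ($6 + $8) > 0 {
        printf "%s QUE=%d WAI=%d\n", $1, $6, $8
    }'
}

# Possible use on a system with NQS installed:
#   qstat -b | pending_by_queue
```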

Exercise 11

To see the current status of the queues at the system you are using, use the cqstatl -b or qstat -b command, as in the following example:

qstat -b

To see whether your system is heavily loaded, look at the bottom line of the display and compare the figure under LIM with the figure under RUN; the smaller the difference, the closer the system is to its run limit. Also check the figures under the QUE and WAI columns, which indicate how many requests are queued or waiting but not yet executing.
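
The comparison described in this exercise can be sketched as a filter that reads only the totals line. The sketch assumes the layout of the preceding batch queue display, in which the totals line has no ENA or STS values and therefore has nine fields; the function name batch_load_summary is hypothetical.

```shell
# Summarize the host totals line of "qstat -b" style output read from
# standard input. Field order on that line, as shown in this tutorial:
#   name LIM TOT QUE RUN WAI HLD ARR EXI
batch_load_summary() {
    awk 'NF == 9 && $2 ~ /^[0-9]+$/ {
        printf "%s: run limit %d, running %d, queued %d, waiting %d\n",
               $1, $2, $5, $4, $6
    }'
}

# Possible use on a system with NQS installed:
#   qstat -b | batch_load_summary
```

For the preceding example display, the totals line would be reported as a run limit of 5 with 1 request running, 2 queued, and 2 waiting.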

Deleting a Batch Request

To delete batch requests that you have submitted, but no longer want to execute, use the cqdel(1) or qdel(1) command.

% qdel 317.coal
nqs-98 qdel: INFO
  Request <317.coal>: Deleted by <snow(123)>.
%

The preceding form of the qdel command deletes requests that have not started executing.

If a request has started executing, the following message is displayed when you issue a cqdel or qdel command without any options:

% qdel 317.coal
nqs-462 qdel: WARNING
  Request <317.coal>: is running on local host.
%

The request remains unaffected by the qdel command.

You can delete an executing request by sending it a signal with the cqdel -k or qdel -k command. You can send several signals to a request; one of the most common is the SIGKILL signal, which aborts a running process. Signals are associated with a number; the number for the SIGKILL signal is 9.

The number of the signal to send to a batch request is specified as an option in the cqdel or qdel command line. You can use the letter k as an alternative for signal number 9.

Standard output, standard error, and job log files are still produced for an executing request that is deleted by a signal. These files record the execution of the request up to the moment that the signal is received.

% qdel -k 380.coal
nqs-98 qdel: INFO
  Request <380.coal>: Deleted by <snow(123)>.
%

You can also use cqdel or qdel to send other signals to batch requests. You can write the request so that it traps the signal and takes some appropriate action rather than aborting. For an example of a request that is written to trap a signal, see “Signaling Your Requests” in Chapter 13.

You can use the qdel -f command to delete both a request and the job output.
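
The request identifier that you pass to qdel is the one reported when the request was submitted. If you want to capture it in a script, the following minimal sketch extracts it from a confirmation message of the form shown in this tutorial. The function name request_id is hypothetical, and the sketch does not assume whether qsub writes the message to standard output or standard error, which is why the usage comment merges the two.

```shell
# Extract the request identifier from a qsub confirmation message such as:
#   Request <373.coal>: Submitted to queue <express> by <snow(123)>.
# Reads the message on standard input and prints only the identifier.
request_id() {
    sed -n 's/^[[:space:]]*Request <\([^>]*\)>: Submitted.*/\1/p'
}

# Possible use on a system with NQS installed:
#   id=$(qsub testjob 2>&1 | request_id)
#   qdel "$id"
```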

Exercise 12

Edit the testjob script to add the following line at the end:

sleep 120

This keeps the request executing for a couple of minutes to give you time to delete it. The file should then contain the following lines:

set -x (standard or Korn shell only)
set echo (C shell only)
date
pwd
ls
sleep 120

Before proceeding with the exercise, you should check that the script has the preceding contents.

Submit the request, and try to delete it by using the cqdel or qdel command, as in the following example:

qdel requestids

If the request has already begun execution, you will receive the following message:

Request requestids is running.

In this case, reenter the cqdel or qdel command by using the -k option to send a SIGKILL signal to the request, as follows:

qdel -k requestids

If the request is still executing when you issue this command, you will receive a message, as follows:

Request requestids has been deleted.

To check that the request has been deleted, use the cqstatl -a or qstat -a command, as in the following example:

% qstat -a
no batch queue entries
no pipe queue entries

Removing Files from Your Directory after the Tutorial

The exercises in this tutorial create several files in the directory from which you submitted the requests. You may want to remove the unwanted files from that directory. The following files were created:

File name   Description

testjob     One of the scripts used in the exercises
testjob.o*  Standard output files produced by the testjob script
testjob.e*  Standard error files produced by the testjob script
testjob.l*  Job log files produced by the testjob script
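
The cleanup can be sketched as a small shell function that removes only the file names listed above. This is a minimal sketch; the function name remove_tutorial_files is hypothetical, and it assumes that you no longer need the testjob script itself.

```shell
# Remove the files that the tutorial exercises create in directory $1
# (default: the current directory). Only the testjob names listed in
# this section are matched; anything else is left alone.
remove_tutorial_files() {
    dir=${1:-.}
    rm -f "$dir"/testjob "$dir"/testjob.o* "$dir"/testjob.e* "$dir"/testjob.l*
}

# Possible use from the directory in which you ran the exercises:
#   remove_tutorial_files .
```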

Summary

If you have completed this tutorial, you should be able to submit batch requests, to send a message to a request, to monitor the progress of your requests, and, if needed, to delete a batch request. The following list summarizes the commands and features that were introduced in this tutorial:

Command line
    Description

cqdel requestids or qdel requestids
    Deletes the specified request if it has not begun execution.

cqdel -k requestids or qdel -k requestids
    Sends a kill signal to the specified batch request. This signal aborts the request if it is executing; otherwise, the request is deleted.

qmsg
    Writes a message to the output file of an executing request.

cqstatl -a or qstat -a
    Displays summary information about all of your requests that are currently residing in queues.

cqstatl -b or qstat -b
    Displays a summary of NQS batch queues.

cqstatl -p or qstat -p
    Displays a summary of NQS pipe queues.

cqsub or qsub
    (This form of the cqsub or qsub command has no options or file names in the command line.) Lets you enter commands that are then submitted as a batch request. The batch request is sent to the default queue. The batch request is assigned a name of STDIN, which is used in the names of the standard output and standard error files produced by the request.

cqsub batch-request-file or qsub batch-request-file
    Submits the named script file as a batch request to the default queue.

cqsub -a date -lT time -lM size or qsub -a date -lT time -lM size
    These cqsub and qsub options specify the earliest execution time and the resource limits for the request:

    -a date     Earliest time at which the request can be executed
    -lT time    Maximum CPU time for the request
    -lM size    Maximum memory available to the request

cqsub -q queue batch-request-file or qsub -q queue batch-request-file
    Submits the specified file into the specified NQS queue.

#QSUB qsub-option
    Options to the cqsub or qsub command can be placed within a script file before any other commands.

QSUB_QUEUE environment variable
    Sets the default queue for your session.

Other options to the commands are described in detail in the corresponding man pages.