Chapter 10. Monitoring Requests

This chapter describes how to use the NQE GUI Status window and the cqstatl or qstat command to display information about requests.


Note: If you do not have an NQE license, you cannot access the NQE GUI or the cqstatl command. You can access only the qstat command from an NQS server.

If the UNICOS multilevel security (MLS) feature or the UNICOS/mk security enhancements are enabled on your system and NQS is configured to enforce mandatory access control (MAC), your active label must dominate the job submission label in order for you to receive status information. To display the job submission and execution label information for a specific job, use the qstat -f command. NQS managers and operators bypass the MAC checks.
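The dominance requirement can be pictured with a short Python sketch. It assumes the conventional MLS definition, in which one label dominates another when its level is at least as high and its compartments include all of the other label's compartments; this guide does not spell out the rule, and the Label type and dominates function are hypothetical names used only for illustration.

```python
from typing import FrozenSet, NamedTuple


class Label(NamedTuple):
    """Hypothetical security label: a numeric level plus a set of compartments."""
    level: int
    compartments: FrozenSet[str]


def dominates(active: Label, submission: Label) -> bool:
    """Return True if 'active' dominates 'submission'.

    Assumes the conventional MLS rule: the active level is greater than or
    equal to the submission level, and the active compartments are a
    superset of the submission compartments.
    """
    return (active.level >= submission.level
            and active.compartments >= submission.compartments)
```

Under this assumption, a user whose active label is level 2 with compartments {a, b} would receive status for a job submitted at level 1 with compartment {a}, but not for a job submitted with compartment {c}.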

Using the NQE GUI Status Window

The NQE GUI Status window provides a refreshed summary of request status. By default, you can see all of the requests in the group of execution nodes in the NQE cluster; your NQE administrator cannot disable this display. However, your NQE administrator can enable or disable the display that provides the full details of the requests that you submit.

Using the NQE GUI Status window lets you do the following:

  • Monitor the status of all your requests. You do not have to know the location of a request before you ask for its status. Request status is updated (refreshed) at configurable intervals.

  • Tailor the display. You can specify how you want your display to look and what information is displayed.

To open the NQE GUI Status window, access the NQE GUI by keying in the nqe command and, using the left mouse button, click once on the Status button of the initial NQE GUI button bar.


Note: The mouse button settings described in this guide are the default settings.

Figure 10-1 shows the Status window.

Figure 10-1. NQE GUI Status Window

The following data about requests is displayed by default:

Column name 

Description

Location 

The request's location, which can be either a queue or the NQE database.

Job Identifier 

The job identifier; possible identifiers are as follows:

  • NQS request ID (for example, 5703.fog).

  • NQE database ID, known as the task ID or tid (for example, t1).

  • NQE database ID with the NQS request ID; the copy of the request that is executed receives an NQS request ID (for example, t4(61178.rain)).

Job Name 

Name of the request.

Run User 

User name with which the request was submitted.

Job Status 

Status of the request. For details of the abbreviations used, see “Status Codes”.

SubStatus 

Substatus of the request. For details of the abbreviations used, see “Substatus Codes”. Some states do not have an associated substatus.

CPU Used 

CPU usage (in seconds) for the request. On some platforms, a display of the amount of CPU that the request consumes is not available, and a 0 appears in this column.

Memory Used 

Memory usage (in words) for the request. On some platforms, a display of the amount of memory that the request consumes is not available, and a 0 appears in this column.

FTA Used 

FTA usage for the request; usage setting can be Yes or No.
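The three Job Identifier forms listed in the table above can be told apart mechanically. The following Python sketch is illustrative only: the function name, the tuple type, and the regular expressions are assumptions inferred from the three examples shown (5703.fog, t1, and t4(61178.rain)), not a grammar documented by NQE.

```python
import re
from typing import NamedTuple, Optional


class JobIdentifier(NamedTuple):
    tid: Optional[str]         # NQE database task ID, for example "t4"
    request_id: Optional[str]  # NQS request ID, for example "61178.rain"


# Assumed shapes, inferred from the examples in this chapter.
_TID = r"t\d+"
_REQ = r"\d+\.[\w.-]+"
_PATTERN = re.compile(
    rf"^(?:(?P<tid>{_TID})(?:\((?P<inner>{_REQ})\))?|(?P<req>{_REQ}))$"
)


def parse_job_identifier(text: str) -> JobIdentifier:
    """Split a Job Identifier cell into its tid and NQS request ID parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized job identifier: {text!r}")
    if match.group("tid"):
        # Either a bare tid or a tid followed by the NQS request ID in parentheses.
        return JobIdentifier(match.group("tid"), match.group("inner"))
    return JobIdentifier(None, match.group("req"))
```

A result with both fields set corresponds to a request that was sent to the NQE database and now has a copy executing in NQS.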

You can get the following online help by using the NQE GUI Status window:

  • Context-sensitive help area. This area is located in the lower left of the Status window. It shows one-line informational messages about the part of the Status window that is directly under your mouse pointer.

  • Help menu button. The Help menu button is located in the upper right of the window. It lets you open a window that displays help topics that you can select and view. Use the left mouse button to select a topic.

From the main Status window, you can select specific requests and receive a detailed display. To view a detailed display, do one of the following:

  • Double-click on a request in the main window.

  • Click once on the request in the main window, pull down the Actions menu, and select Detailed Status.

This action can take a long time to complete, depending on network traffic.


Note: If you cannot perform this operation, you also cannot display detailed status by using the cqstatl -f command or the qstat -f command. Your privileges may not be set correctly.

The Filter menu lets you control the number of requests displayed. Table 10-1 explains how you can use these filters. Figure 10-2 shows a sample Originating Host submenu.


Note: To select an item, click on the box next to it, and then click the Apply button. The Set All button selects all available options in the display. The Unset All button eliminates all selected items. If you click on the Unset All button, you can select other items you want to appear.


Table 10-1. NQE GUI Status Window Filter Options

Filter

Description

Destination Host

Displays only specified destination hosts, including the NQE database

Run User

Displays only requests currently owned by the specified user name

Originating User

Displays only requests originally submitted by the specified user name

Originating Host

Displays only requests submitted from the specified host, including the NQE database

Location

Displays only requests that are at the specified location

Job Identifier

Displays only requests that have the specified NQS request identifier or NQE database task identifier

Clear Filters

Resets all filters to a cleared state

List Filters

Lists a summary of all filters in use

Save Filters

Saves all filters in use


Figure 10-2. Sample Originating Host Filter Submenu


Using the cqstatl and qstat Commands

The cqstatl command and the qstat command provide request status information in an ASCII-based, static display.

For a summary of the cqstatl and qstat command options, see the cqstatl(1) and qstat(1) man pages.


Displaying Summaries

You can display a summary of requests that are in batch queues, pipe queues, and the NQE database (pipe queues do not apply to requests sent to the NQE database).

Summary of Particular Requests

To display summary information for particular requests sent to the NQE database, use the following command (the qstat command cannot be used for requests sent to the NQE database):

cqstatl -d nqedb tids

The tids argument is the task identifier displayed when you submitted the request to the NQE database. You can specify more than one task identifier. Separate task identifiers with a space. (The tid is also displayed on the NQE GUI Status window.)

To display summary information for particular NQS requests, you can use the following commands:

cqstatl -d nqs requestids

qstat requestids

The requestids argument is one of the following:

  • If you submitted a request to NQS, requestid is the request identifier displayed when you submitted the request to NQS.

  • If you submitted a request to the NQE database, requestid is the request identifier of the copy of the request executing in NQS. The requestid is displayed on the NQE GUI Status window in parentheses after the tid (for example, t4(61178.rain)).

You can specify more than one request. Separate request identifiers with a space.
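The choice between the commands above can be sketched as a small Python helper that builds the argument list for a summary query. The function name and its parameters are hypothetical; only the cqstatl -d nqedb, cqstatl -d nqs, and qstat forms shown in this section are used.

```python
from typing import List, Sequence


def summary_command(ids: Sequence[str], dest: str,
                    use_qstat: bool = False) -> List[str]:
    """Build the argument list for a request summary.

    dest is "nqedb" (ids are task identifiers) or "nqs" (ids are NQS
    request identifiers). Identifiers are passed as separate arguments,
    which corresponds to separating them with spaces on the command line.
    """
    if not ids:
        raise ValueError("at least one identifier is required")
    if dest == "nqedb":
        if use_qstat:
            # qstat cannot be used for requests sent to the NQE database.
            raise ValueError("use cqstatl for NQE database requests")
        return ["cqstatl", "-d", "nqedb", *ids]
    if dest == "nqs":
        if use_qstat:
            return ["qstat", *ids]
        return ["cqstatl", "-d", "nqs", *ids]
    raise ValueError(f"unknown destination type: {dest!r}")
```

The resulting list could be passed to subprocess.run on a host where the NQE commands are installed.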

Summary of All Your Requests

To display summary details of all your requests in the NQE database, use the following command (the qstat command cannot be used for requests sent to the NQE database):

cqstatl -d nqedb -a


Note: If you have the NQE_DEST_TYPE environment variable set to be nqedb, omit the -d nqedb option.

The following display shows a summary of all requests that belong to the user who issued the cqstatl -d nqedb -a command.

carob$
carob$ cqstatl -d nqedb -a
--------------------------------
NQE Database Request Summary
--------------------------------
IDENTIFIER  NAME   SYSTEM-OWNER    USER     LOCATION/QUEUE ST
---------- ------- --------------- -------- -------------- ---
t1         STDIN   monitor.main    shelley  NQE Database   NComp
t3         STDIN   monitor.main    shelley  NQE Database   NTerm
t4         STDIN   monitor.main    shelley  NQE Database   NTerm
t5         STDIN   scheduler.main  shelley  NQE Database   NPend


Note: By default, if you use the cqstatl command without options or arguments, the output is a summary of each NQS queue on the NQS server. However, if you have the NQE_DEST_TYPE environment variable set to be nqedb, and you use the cqstatl command without options or arguments, the output is a summary of all your requests in the NQE database minus all terminated requests. (For additional information about monitoring queues, see Chapter 11, “Monitoring Queues”.)

The columns in the request summary displays have the following meanings:

Column name 

Description

IDENTIFIER 

The task identifier, as displayed when you first submitted the request. The tid is also displayed on the NQE GUI Status window.

NAME 

The name of the request. If you did not specify a name when submitting the request, NAME is the name of the script file, or STDIN if the request was created from standard input.

SYSTEM-OWNER 

The NQE database component currently owning the request.

USER 

The name under which the request will be executed at the NQS system (either the name of the user who submitted the request or the name of the user specified when the request was submitted).

LOCATION/QUEUE 

The request resides in the NQE database.

ST 

An indication of the current state of the request. This can be composed of a state value and a substate value (similar to major and minor status values for requests sent to NQS). For a description of state codes, see “Status Codes”. For a description of substate codes, see “Substatus Codes”.
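Because a value such as "NQE Database" contains a space, splitting summary rows on whitespace is unreliable. One possible approach, sketched below in Python, is to take column positions from the row of dashes under the column headings. The function names are hypothetical, and the sketch assumes that the display keeps the fixed-column layout shown in the sample above.

```python
import re
from typing import Dict, List, Optional

# A column ruler is two or more runs of dashes separated by spaces.
# A banner line is a single run of dashes, so it does not match.
_RULER = re.compile(r"^-+(?: +-+)+\s*$")


def _cut(line: str, starts: List[int]) -> List[str]:
    """Slice a line at the given column starts; the last field runs to end of line."""
    ends: List[Optional[int]] = [*starts[1:], None]
    return [line[start:end].strip() for start, end in zip(starts, ends)]


def parse_summary(text: str) -> List[Dict[str, str]]:
    """Return one dict per request row, keyed by the column headings."""
    lines = text.splitlines()
    rows: List[Dict[str, str]] = []
    for index, line in enumerate(lines):
        if index == 0 or not _RULER.match(line):
            continue
        starts = [match.start() for match in re.finditer(r"-+", line)]
        names = _cut(lines[index - 1], starts)  # headings sit just above the ruler
        for data in lines[index + 1:]:
            stripped = data.strip()
            if not stripped or set(stripped) == {"-"}:
                break  # a blank line or the next banner ends this table
            rows.append(dict(zip(names, _cut(data, starts))))
    return rows
```

Letting the last field run to the end of the line keeps an ST value such as NComp intact even though it is wider than the three dashes under the ST heading.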

You cannot use the cqstatl or qstat command to display details about the requests of other users unless you are an NQE administrator. For more information, see “Specifying Another User Name”.

To display summary details of all your requests on your NQS server (as defined by NQS_SERVER), use the following command:

cqstatl -a


Note: If you have the NQE_DEST_TYPE environment variable set to be nqedb, the preceding command displays the output shown in “Summary of All Your Requests”.

The following is a summary of all NQS requests that belong to the user who issued the cqstatl -a command:

% cqstatl -a
-------------------------------
NQS BATCH REQUEST SUMMARY
-------------------------------
IDENTIFIER   NAME    USER     LOCATION/QUEUE        JID  PRTY REQMEM REQTIM ST
------------ ------- -------- -------------------- ---- ---- ------ ------ ---
1108.coal    testjob us1      [email protected]         3494  --- 262144    600 R
------------------------------
NQS PIPE REQUEST SUMMARY
------------------------------
IDENTIFIER    NAME    OWNER    USER     LOCATION/QUEUE         PRTY ST
------------- ------- -------- -------- --------------------- ---- ---
1049.coal     test2   1201     us1      [email protected]                1 R  

The columns in the request summary displays have the following meanings:

Column name 

Description

IDENTIFIER 

The request identifier (as displayed when you first submitted the request). The request identifier is also displayed on the NQE GUI Status window.

NAME 

The name of the request. If you did not specify a name when submitting the request, NAME is the name of the script file, or STDIN if the request was created from standard input.

OWNER 

(Pipe queue displays only) The user ID under which you were logged in when you submitted the request.

USER 

The user name under which the request will be executed at the NQS system.

LOCATION/QUEUE 

The NQS queue in which the request currently resides.

JID 

(Batch queue displays only) NQS job identifier.

PRTY 

For a request awaiting execution, its intraqueue priority; for an executing request, its nice value.

The priority value is not available until the NQS scheduler has examined the queue; in the meantime, the priority field displays only dashes (---). After the scheduler examines the queue, the requests are sorted in order of priority. The intraqueue priority of a queued request is a value from 1 through 999. For an executing request that is scheduled by using the qmgr schedule request first or qmgr schedule request next command, the priority is displayed as FRST or NEXT, respectively. For an executing request that is scheduled by using the qmgr schedule request now command, the priority is displayed as NOW.

REQMEM 

(Batch queue displays only) If the request has not started to execute, the maximum amount of memory (in kilowords) that the request is allowed to use. If the request is executing, the current amount of memory allocated to the request.

REQTIM 

(Batch queue displays only) The number of seconds of CPU time remaining for the request. You can monitor this column to determine how your request is progressing.

ST 

An indication of the current state of the request. This can be composed of a major and a minor status value. For a description of major status codes, see “Status Codes”. For a description of minor status codes, see “Substatus Codes”.
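The PRTY rules described in the table above can be summarized as a small decision function. The Python sketch below is illustrative only; the function name and the wording of the returned strings are hypothetical, and whether the request is executing must be supplied by the caller (for example, from the ST column).

```python
def describe_prty(prty: str, executing: bool) -> str:
    """Interpret a PRTY cell from an NQS request summary."""
    value = prty.strip()
    if value and set(value) == {"-"}:
        # Only dashes: the scheduler has not yet examined the queue.
        return "not yet available"
    scheduled = {
        "FRST": "scheduled with qmgr schedule request first",
        "NEXT": "scheduled with qmgr schedule request next",
        "NOW": "scheduled with qmgr schedule request now",
    }
    if value in scheduled:
        return scheduled[value]
    number = int(value)  # raises ValueError for anything else
    if executing:
        return f"nice value {number}"
    if not 1 <= number <= 999:
        raise ValueError(f"intraqueue priority out of range: {number}")
    return f"intraqueue priority {number}"
```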

You cannot use the cqstatl command or the qstat command to display details about the requests and NQS activity of other users unless you are an NQE administrator or unless you are authorized to execute NQS requests under another user name. For more information, see “Specifying Another User Name”.

Displaying Details

To display the full details of your requests, use one of the following commands:

  • If you submitted a request to NQS, you can use one of the following commands:

    cqstatl -d nqs -f requestids

    qstat -f requestids

    The requestids argument is the request identifier displayed when you submitted the request to NQS. You can specify more than one request. Separate request identifiers with a space.

  • If you submitted a request to the NQE database, use the following command:

    cqstatl -d nqedb -f tids


    Note: If you have the NQE_DEST_TYPE environment variable set to be nqedb, omit the -d nqedb option.

    For requests sent to the NQE database, the tids argument is the task identifier of the request in the NQE database. You can specify more than one task. Separate task identifiers with a space. When the request is in NQS, it receives a requestid, which is displayed on the NQE GUI Status window in parentheses after the tid (for example, t4(61178.rain)).

The following sample display shows the output if you specified request 155 in an NQS batch queue. Some of the resource limits shown in the display are enclosed in the < and > symbols, which indicate that you did not explicitly specify the limit. Instead, NQS has taken the limit from either the resource limits associated with the queue or the user database (UDB) limits associated with the user under whom the request is executing (whichever limit is more restrictive).

Per-process and per-request limits are associated with each request. These are shown in the PROCESS LIMIT and REQUEST LIMIT columns in the display. For a discussion of per-process and per-request limits, see “Types of Limits” in Chapter 5.


Note: If a request that was sent to the NQE database is executing, cqstatl obtains status from NQS. If the request that was sent to the NQE database is not executing, status information is obtained from the NQE database. The detailed display of an NQE database request will be similar to the following sample display; it also will include the request's NQE task identifier (tid).

For more information on the display fields, see the cqstatl(1) or the qstat(1) man page.

% cqstatl -d nqs -f 155 | more
----------------------------------
NQS BATCH REQUEST: job.latte            Status:          RUNNING
----------------------------------                             Processes
                                                               Active
        NQE Task ID: - -
        NQS Identifier: 155.latte              Target User:    jane
                                               Group:          pubs
        Account/Project: <[1201]>
        Priority:       ---
        User Priority/URM Priority Increment: 1
        Job Identifier: 16802
        Local Scheduler: Requested = OS default, Using = OS default

        Created:        Wed Mar 18 1998        Queued:         Wed Mar 18 1998
                        12:34:04 CST                           12:34:08 CST
<LOCATION/QUEUE>
        Name:           [email protected]          Priority:       30
<RESOURCES>
                                PROCESS LIMIT   REQUEST LIMIT
        CPU Time Limit           <unlimited>     <unlimited>
        Memory Size                 <256mw>         <256mw>
        Permanent File Space        <100mb>           <0>
        Quick File Space              <0>             <0>
        Type a Tape Drives                            <0>
        Type b Tape Drives                            <0>
        Type c Tape Drives                            <0>
        Type d Tape Drives                            <0>
        Type e Tape Drives                            <0>
        Type f Tape Drives                            <0>
        Type g Tape Drives                            <0>
        Type h Tape Drives                            <0>
        Nice Increment                <0>
        Temporary File Space          <0>             <0>
        Core File Size              <256mw>
        Data Size                   <256mw>
        Stack Size                  <256mw>
        Working Set Limit           <256mw>
        MPP Processor Elements                        <0>
        MPP Time Limit               <10sec>         <10sec>
        Shared Memory Limit                           <0>
        Shared Memory Segments                        <0>
        MPP Memory Size             <256mw>         <256mw>
<FILES>                 MODE            NAME
        Stdout:         spool           [email protected]:/home/ice34/jane/job.o155
        Stderr:         spool           [email protected]:/home/ice34/jane/job.e155
        Job log:        spool           [email protected]:/home/ice34/jane/job.l155
        Restart:                        <UNAVAILABLE>
<MAIL>
        Address:                [email protected]     When:
<PERIODIC CHECKPOINT>
        System:         off                     Request:        System Default
        Cpu time:       on     60 Min           Cpu time:       def <Default>
        Wall clock:     off   180 Min           Wall clock:     def <Default>
        Last checkpoint:None
<SECURITY>
        Submission level:               N/A
        Submission compartments:        N/A
        Execution level:                N/A
        Execution compartments:         N/A
<MISC>
        Rerunnable      yes                     User Mask:      027
        Restartable     yes                     Exported Vars: basic
        Shell:          DEFAULT
        Orig. Owner:    [email protected]
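The < and > convention in the RESOURCES section of the display can be checked programmatically. The following Python sketch is a minimal illustration: the names Limit, parse_limit, and limits_in_line are hypothetical, and splitting a line on runs of two or more spaces is an assumption based on the sample display. It does not determine whether a single value on a line belongs to the PROCESS LIMIT or the REQUEST LIMIT column.

```python
import re
from typing import List, NamedTuple, Tuple


class Limit(NamedTuple):
    value: str
    explicit: bool  # False when the value was shown inside < and >


def parse_limit(token: str) -> Limit:
    """Classify one limit value from the RESOURCES section."""
    token = token.strip()
    if token.startswith("<") and token.endswith(">"):
        # Not specified by the user; taken from queue or UDB limits.
        return Limit(token[1:-1], False)
    return Limit(token, True)


def limits_in_line(line: str) -> Tuple[str, List[Limit]]:
    """Split a RESOURCES line into its label and the limit values that follow."""
    label, *tokens = re.split(r"\s{2,}", line.strip())
    return label, [parse_limit(token) for token in tokens]
```

For the Memory Size line in the sample display, both values are reported as not explicit, which matches the explanation that the user did not specify them.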

Displaying Requests on Other Servers


Note: You do not need the cqstatl command to view requests that were submitted to the NQE database and routed to other servers. The NQE GUI Status window displays all requests submitted to the NQE database that are routed to any location in the group of execution nodes in the NQE cluster.

If your requests are routed to queues at a remote NQS server, you can specify the name of the remote system in one of the following ways to display details about the requests:

  • Use the cqstatl -h command or the qstat -h command and specify the network host name of the NQS server. For example, the following command displays a summary status of all your requests at an NQS host called sun1:

    cqstatl -a -h sun1

  • Include the host name when you specify a specific request identifier to cqstatl or qstat, as follows:

    request_identifier@target_host

The cqstatl command in the following example displays summary information about a request called 1060.coal at an NQS server called green1:

% cqstatl 1060.coal@green1
-------------------------------
NQS BATCH REQUEST SUMMARY
-------------------------------
IDENTIFIER    NAME     USER     QUEUE              JID  PRTY REQMEM REQTIM ST
------------- -------  -------- ------------------ ---- ---- ------ ------ ---
1060.coal     testjob  us1      [email protected]       3494  --- 262144    600 R  
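The two ways of naming a remote server can be sketched in Python as follows. The helper names are hypothetical; the sketch uses only the -a and -h options and the request_identifier@target_host form described in this section.

```python
from typing import List


def qualify_request_id(request_id: str, target_host: str) -> str:
    """Return the request_identifier@target_host form of a request ID."""
    if "@" in request_id:
        raise ValueError(f"request ID already names a host: {request_id!r}")
    return f"{request_id}@{target_host}"


def remote_summary_argv(host: str) -> List[str]:
    """Arguments for a summary of all your requests at a remote NQS server."""
    return ["cqstatl", "-a", "-h", host]
```

For example, qualify_request_id("1060.coal", "green1") yields the argument used in the example above, and remote_summary_argv("sun1") corresponds to cqstatl -a -h sun1.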

If password validation is in force, you must include the cqstatl -P option or set the NQS_PASSWORD_NEEDED environment variable to ensure that you are prompted for a password. The password requested is for the user name at the remote NQS server on which the request executes.

If both password validation and validation files are in force at the remote system, omit the -P option on the cqstatl command line. The validation file is then checked.

If validation files are checked, the cqstatl command is successful only if your user name and a host are included in a validation file at the remote system. For more information about passwords and validation files, see Chapter 2, “Preparing to Use NQE”.

If the UNICOS MLS feature or the UNICOS/mk security enhancements are enabled on a remote host, you cannot display information from that remote host if the host has a workstation access list (WAL) entry for the host of origin that restricts your access to NQS services.

Specifying Another User Name

To display information about requests submitted under another user name, use one of the following commands:

cqstatl -u username

qstat -u username


Note: For NQE database requests, you must use the following command: cqstatl -d nqedb -u dbuser=dbusername.

If password validation is in force, you must include the cqstatl -P option or set the NQS_PASSWORD_NEEDED environment variable to ensure that you are prompted for a password.

If both password and file validation are in force, you do not have to set the environment variable or specify the cqstatl -P option. The validation file for username is checked as described in “Validation File Examples” in Chapter 2.

The following example displays summary information about the requests submitted by user name sandy that are executing at remote server sun1:

cqstatl -a -h sun1 -u sandy

If the UNICOS MLS feature or the UNICOS/mk security enhancements are enabled on your system and you submit a remote request, the system might be configured to require that the /etc/hosts.equiv and .rhosts files each contain a match for the remote host and that the remote and local user names match (that is, the -u option is not allowed).

Displaying Cray MPP Information

To display information related to the Cray MPP systems, use the cqstatl -m command or the qstat -m command. For more information about this command option, see the cqstatl(1) or the qstat(1) man page.

Request Status

The request status is expressed in two parts: the major status and the minor status if the request was sent to NQS, or the state and substate if the request was sent to the NQE database. The status codes are described in the following sections.

Status Codes

The major status or state of a request can be one of the following codes:

Status/State

Description

A

ARRIVING. The request is arriving in a queue.

C

CHECKPOINTED. (UNICOS, UNICOS/mk, and IRIX systems only.) The request in a batch queue was checkpointed and is no longer running.

D

DEPARTING. The request left a pipe queue before its arrival at a destination queue.

E

EXITING. The request in a batch queue completed execution and is currently leaving the system.

H

HELD. The request was prevented from entering another state by operator action. If the request had already been running, a restart file was created.

N

NQE Database. The request is in the NQE database.

P

PREEMPTED. The request was preempted. When a request is preempted, a restart file is created.

Q

QUEUED. The request is in a queue and is eligible for routing or running.

R

ROUTING. The request is being routed to another queue (no minor status is associated with this status).

R

RUNNING. The request is in a batch queue and is currently being processed.

S

SUSPENDED. The request is executing in a batch queue, but its execution was suspended.

U

UNKNOWN. The state of the request cannot be determined.

W

WAITING. The request is prevented from proceeding by a date and/or time constraint imposed at the time of submission (by the cqsub -a or qsub -a command), by the inaccessibility of a pipe queue destination, or because a license cannot be obtained.

no entry

<CHANGING STATE>. The status of the request is changing. This request status can also be displayed if the request was moved into the running subqueue but the associated session has not yet been created, if the NQS daemon aborted or hung while the request was running, or if the shepherd process is taking a long time to process the request exit.
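The status codes above map naturally onto a lookup table. The Python sketch below is illustrative only; the names are hypothetical. Because R stands for both ROUTING and RUNNING, the sketch accepts an optional queue type to choose between them. Treating R in a pipe queue as ROUTING is an assumption; the table itself states only that RUNNING applies to batch queues.

```python
from typing import Dict, Optional, Tuple

STATUS_NAMES: Dict[str, Tuple[str, ...]] = {
    "A": ("ARRIVING",),
    "C": ("CHECKPOINTED",),
    "D": ("DEPARTING",),
    "E": ("EXITING",),
    "H": ("HELD",),
    "N": ("NQE Database",),
    "P": ("PREEMPTED",),
    "Q": ("QUEUED",),
    "R": ("ROUTING", "RUNNING"),
    "S": ("SUSPENDED",),
    "U": ("UNKNOWN",),
    "W": ("WAITING",),
    "": ("<CHANGING STATE>",),  # no entry in the status column
}


def status_name(code: str, queue_type: Optional[str] = None) -> str:
    """Expand a one-letter major status or state code.

    queue_type may be "batch" or "pipe" to resolve the ambiguous R code.
    """
    names = STATUS_NAMES.get(code.strip())
    if names is None:
        raise ValueError(f"unknown status code: {code!r}")
    if len(names) == 1:
        return names[0]
    if queue_type == "batch":
        return "RUNNING"
    if queue_type == "pipe":
        return "ROUTING"  # assumption: routing happens in pipe queues
    return " or ".join(names)
```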

Substatus Codes

The minor status or substate of a request can be one of the following codes:

Substatus/Substate

Description

number

The number of currently active processes started by the request.

ce

(Cray MPP systems only) The complex Cray MPP processing element (PE) limit was reached.

cg

The complex group run limit was reached.

cm

The complex memory limit was reached.

Comp

The request that was submitted to the NQE database has completed processing.

cq

The complex quickfile (SDS) limit was reached.

cr

The complex run limit was reached.

cu

The complex user run limit was reached.

du

The pipe queue destination is currently unavailable.

ge

(Cray MPP systems only) The global Cray MPP PE limit was reached.

gg

The global group run limit was reached.

gm

The global memory limit was reached.

gq

The global quickfile (SDS) limit was reached.

gr

The global run limit was reached.

gt

The global tape drive limit was reached.

gu

The global user run limit was reached.

lm

A license could not be obtained for the request.

md

(Cray MPP systems only) The CRAY T3D system is not accessible.

mp

(Cray MPP systems only) The CRAY T3D system is accessible but insufficient PEs are available.

New

The request is in the NQE database.

nu

The NLB server is not available.

op

The current major status of the request occurred through operator action.

Pend

The request in the NQE database is awaiting scheduling (pending).

qe

(Cray MPP systems only) The queue Cray MPP PE limit was reached.

qg

The queue group run limit was reached.

qm

The queue memory limit was reached.

qq

The queue quickfile (SDS) limit was reached.

qr

The queue run limit was reached.

qs

The queue in which the request resides was stopped.

qu

The queue user run limit was reached.

rj

(UNICOS systems only) The Unified Resource Manager (URM) rejected the request.

Sche

The request is in the NQE database and has been scheduled by the NQE scheduler.

sh

The system was shut down.

Subm

A copy of the request has been submitted for processing from the NQE database.

td

(UNICOS systems only) The UNICOS tape daemon is unavailable and the request asks for tape resources.

Term

The copy of the request that was submitted for processing from the NQE database has terminated.

us

(UNICOS systems only) The request is in the URM scheduling pool.

??

The current status of the request is unknown.
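An ST value such as NComp combines a state with a substate. The Python sketch below splits such a value and looks up the substatus; it is illustrative only. The function names are hypothetical, and the assumption that the first character is the status and the remainder is the substatus is inferred from the sample displays in this chapter (NComp, NTerm, NPend, and R).

```python
from typing import Dict, Optional, Tuple

SUBSTATUS: Dict[str, str] = {
    "ce": "complex Cray MPP PE limit reached",
    "cg": "complex group run limit reached",
    "cm": "complex memory limit reached",
    "Comp": "NQE database request completed processing",
    "cq": "complex quickfile (SDS) limit reached",
    "cr": "complex run limit reached",
    "cu": "complex user run limit reached",
    "du": "pipe queue destination unavailable",
    "ge": "global Cray MPP PE limit reached",
    "gg": "global group run limit reached",
    "gm": "global memory limit reached",
    "gq": "global quickfile (SDS) limit reached",
    "gr": "global run limit reached",
    "gt": "global tape drive limit reached",
    "gu": "global user run limit reached",
    "lm": "license could not be obtained",
    "md": "CRAY T3D system not accessible",
    "mp": "CRAY T3D accessible but insufficient PEs available",
    "New": "request is in the NQE database",
    "nu": "NLB server not available",
    "op": "status set by operator action",
    "Pend": "awaiting scheduling in the NQE database",
    "qe": "queue Cray MPP PE limit reached",
    "qg": "queue group run limit reached",
    "qm": "queue memory limit reached",
    "qq": "queue quickfile (SDS) limit reached",
    "qr": "queue run limit reached",
    "qs": "queue stopped",
    "qu": "queue user run limit reached",
    "rj": "rejected by URM",
    "Sche": "scheduled by the NQE scheduler",
    "sh": "system shut down",
    "Subm": "copy submitted for processing from the NQE database",
    "td": "tape daemon unavailable",
    "Term": "submitted copy has terminated",
    "us": "in the URM scheduling pool",
    "??": "status unknown",
}


def split_st(st: str) -> Tuple[str, Optional[str]]:
    """Split an ST cell into (status, substatus); substatus may be None."""
    st = st.strip()
    if not st:
        return "", None
    return st[0], st[1:] or None


def describe_substatus(substatus: Optional[str]) -> str:
    """Describe a substatus code; a number is a count of active processes."""
    if substatus is None:
        return "no substatus"
    if substatus.isdigit():
        return f"{int(substatus)} active process(es)"
    return SUBSTATUS.get(substatus, "unrecognized substatus")
```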