This chapter describes compatibilities and differences that users may see when upgrading to this NQE release. It covers the following:
Compatibilities and differences between NQE 3.2 and NQE 3.3
Additional compatibilities and differences for users when upgrading directly to NQE 3.3 from a version of NQE prior to NQE 3.2
For feature descriptions, see Chapter 2, “New Features”.
The following sections describe compatibilities and differences between the NQE 3.2 and NQE 3.3 releases.
If you are upgrading directly to NQE 3.3 from a version of NQE prior to NQE 3.2, you should also read “Compatibilities and Differences Between NQE 3.3 and a Version Prior to NQE 3.2” for additional compatibilities and differences that your users may experience.
The NQE_DEFAULT_COMPLIST configuration variable in the nqeinfo file, which defines the list of NQE components to be started or stopped, has replaced the NQE_TYPE configuration variable. The NQE 3.3 release is shipped with the NQE_DEFAULT_COMPLIST variable set to the following components:
NQS, COLLECTOR, NLB
In NQE 3.2, the memory that was used by all processes in a request (per-request memory) was computed by adding the virtual address space sizes that were allocated for each process. This value did not accurately reflect the actual amount of memory used by a request, especially when the processes in the request accessed shared memory. In NQE 3.3, the per-request memory computation method is based on the amount of memory that is actually referenced by each process. The memory that is shared by different processes in the request is counted only once. This method results in a total memory value that can be much lower than the memory value that was computed in NQE 3.2. To adjust to this new memory computation method, users and administrators can modify their per-request limit values on queues and individual jobs.
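The difference between the two computation methods can be illustrated with a minimal sketch. The data model here (a `Process` record with a virtual size, a private referenced size, and a table of shared segments keyed by a segment ID) is hypothetical and is used only for illustration; it does not represent how NQS or the operating system stores this information.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    # All fields are hypothetical and exist only for this illustration.
    virtual_size: int        # allocated virtual address space, in KB
    private_referenced: int  # private memory actually referenced, in KB
    shared_segments: dict = field(default_factory=dict)  # segment ID -> referenced KB


def per_request_memory_3_2(processes):
    """NQE 3.2 style: add the virtual address space sizes of all processes."""
    return sum(p.virtual_size for p in processes)


def per_request_memory_3_3(processes):
    """NQE 3.3 style: count referenced memory, and each shared segment once."""
    total = sum(p.private_referenced for p in processes)
    shared = {}
    for p in processes:
        for seg_id, size in p.shared_segments.items():
            # A segment seen in several processes contributes a single time.
            shared[seg_id] = max(shared.get(seg_id, 0), size)
    return total + sum(shared.values())


# Two processes that both attach the same 500 KB shared segment.
request = [
    Process(virtual_size=1000, private_referenced=200, shared_segments={"A": 500}),
    Process(virtual_size=1000, private_referenced=200, shared_segments={"A": 500}),
]
old_value = per_request_memory_3_2(request)
new_value = per_request_memory_3_3(request)
```

In this example the older method reports 2000 KB and the newer method reports 900 KB, which shows why per-request limits that were tuned for NQE 3.2 may need to be revisited.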
The following sections describe additional compatibilities and differences that users may see when upgrading directly to NQE 3.3 from a version of NQE prior to NQE 3.2.
For a complete list of compatibilities and differences that may affect users, you should also read “Compatibilities and Differences Between NQE 3.2 and NQE 3.3”, which describes the changes that were introduced between the NQE 3.2 and NQE 3.3 releases.
Users may notice a slight difference in the default output file names from NQE database jobs. For jobs going through the NQE database, the number portion of the default name is now the NQE database task ID, not the Network Queuing System (NQS) request ID. For jobs going through NQS only, the number portion of the default name remains the request ID. This change ensures unique default output file names within an NQE database cluster for jobs submitted to the NQE database.
The NQEDB_TRACE_LEVEL environment variable was removed. For a description of the related new feature, see “Features Added in the NQE 3.2 Release” in Chapter 2.
NQS now recovers jobs that are terminated because of hardware problems. If a job is terminated through receipt of a SIGRPE or SIGUME signal, NQS requeues the job rather than deleting it, provided that the job is rerunnable or that it is restartable and has a restart file. The following actions can be expected when a job is terminated by either a SIGRPE or SIGUME signal:
For a job that has default rerun and restart attributes, the job is requeued and rerun.
For a job that has default rerun and restart attributes and has a restart file associated with it, the job is requeued and restarted from the restart file.
For a job that has the no-rerun attribute and has no restart file, the job is deleted.
For a job that has the no-rerun attribute but does have a restart file, the job is requeued and restarted from the restart file.
For a job that has the no-restart attribute and uses the default rerun attribute, the job is requeued and rerun.
For a job that has the no-rerun and no-restart attributes, the job is deleted.
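The cases listed above reduce to a short decision rule, sketched below. The function name, its parameters, and the returned strings are hypothetical and are chosen only to mirror the list; this is not NQS code.

```python
def action_after_hardware_signal(rerunnable=True, restartable=True,
                                 has_restart_file=False):
    """Return the expected outcome for a job ended by SIGRPE or SIGUME.

    The defaults model a job with default rerun and restart attributes
    and no restart file.
    """
    if restartable and has_restart_file:
        return "requeue and restart from restart file"
    if rerunnable:
        return "requeue and rerun"
    return "delete"


# A no-rerun job that has a restart file is still recovered.
outcome = action_after_hardware_signal(rerunnable=False, has_restart_file=True)
```

The restart check comes first because, in the list above, a usable restart file takes precedence over a plain rerun whenever the job is restartable.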
The cload(1) command was removed as of the NQE 3.1 release; the functionality is provided through the NQE GUI Load window.
The cqstat(1) command was removed as of the NQE 3.0 release; the functionality is provided through the NQE GUI Status window.
The NQE GUI is invoked by using the nqe(1) command.
Previous releases of NQS and the Network Queuing EXtensions (NQX) on Cray Research systems were bundled with the UNICOS operating system and installed in the /etc, /usr/bin, /usr/lib, /usr/include, and /usr/man directories. Beginning with the NQE 3.1 release, NQE on UNICOS and UNICOS/mk platforms is released as an asynchronous product; NQE is installed in the /nqebase directory (that is, /opt/craysoft/nqe for UNICOS and UNICOS/mk operating systems). For a description of the NQE directory structure, see NQE Installation, publication SG-5236.
This path change affects both administrators and end users of NQS, FTA, and NQE (NQX). Users must be notified of the new command location so that their user environment can be changed to access the commands from /opt/craysoft/nqe/bin. For example, this command path change will affect user cron jobs, job submission scripts, and any user programs that reference NQS, FTA, or NQE (NQX) commands.
The system files that set up user environments can be modified to add /opt/craysoft/nqe/bin to the default path. The modules package can be used to set up the appropriate path to the NQE commands. For more information about the modules package, see NQE Installation, publication SG-5236.
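As an illustration of the path change, the following minimal sketch computes a search path that includes the NQE command directory. The helper name is hypothetical, and placing the directory at the front of the path is an assumption; sites may prefer a different position or may rely on the modules package instead.

```python
NQE_BIN = "/opt/craysoft/nqe/bin"  # NQE command directory named above


def with_nqe_bin(path_value, nqe_bin=NQE_BIN):
    """Return a colon-separated path that contains the NQE command directory.

    The path is returned unchanged if the directory is already present.
    """
    if nqe_bin in path_value.split(":"):
        return path_value
    # Assumption: prepend so that the NQE commands are found first.
    return nqe_bin + ":" + path_value if path_value else nqe_bin


new_path = with_nqe_bin("/usr/bin:/bin")
```

A script that launches NQS, FTA, or NQE commands from cron could apply the same check to its own environment before it invokes them.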
On Cray PVP systems running only the NQE subset (NQS and FTA components), users have limited NQE functionality. New functionality of the NQE product, such as DCE support and the NQE database and its scheduler, is not accessible on Cray PVP systems running only the NQE subset.
For UNICOS and UNICOS/mk systems using the modules interface, frequent users of NQE should modify their .cshrc or .profile file in order to use NQE. Users who occasionally use NQE can use a command entered at the command line. Instructions are included in NQE Installation, publication SG-5236, for system administrators to pass along to their users.
Users will notice the following differences when they use NQE on CRAY T3E systems.
The qsub(1) options listed here have the following meanings for CRAY T3E systems:
The -lt and -lT options specify the maximum CPU time on command PEs for any process of the request (-lt) and the cumulative CPU time of all processes of a request on command PEs (-lT).
The -l p_mpp_t and -l mpp_t options specify the maximum CPU time on application PEs for any process of the request (p_mpp_t) and the cumulative CPU time of all processes of a request on application PEs (mpp_t).
The -l mpp_p option specifies the maximum number of PEs required for the request (the default is 1 command PE).
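The options above can be combined on one qsub(1) command line. The following sketch only builds the argument list and does not run qsub. The helper name and its parameters are hypothetical, and the `name=value` form for the `-l` limits and the use of plain seconds for time values are assumptions; see the qsub(1) man page for the exact syntax.

```python
def t3e_qsub_args(script, cmd_pe_proc_time=None, cmd_pe_req_time=None,
                  app_pe_proc_time=None, app_pe_req_time=None, pes=None):
    """Build a qsub argument list for a CRAY T3E request (illustrative only)."""
    args = ["qsub"]
    if cmd_pe_proc_time is not None:
        # Per-process CPU time on command PEs
        args += ["-lt", str(cmd_pe_proc_time)]
    if cmd_pe_req_time is not None:
        # Per-request CPU time on command PEs
        args += ["-lT", str(cmd_pe_req_time)]
    if app_pe_proc_time is not None:
        # Per-process CPU time on application PEs (name=value form is assumed)
        args += ["-l", "p_mpp_t=%s" % app_pe_proc_time]
    if app_pe_req_time is not None:
        # Per-request CPU time on application PEs
        args += ["-l", "mpp_t=%s" % app_pe_req_time]
    if pes is not None:
        # Maximum number of PEs for the request
        args += ["-l", "mpp_p=%s" % pes]
    args.append(script)
    return args


# Hypothetical request: 8 PEs, 600 s on command PEs, 3600 s on application PEs.
example_args = t3e_qsub_args("job.sh", cmd_pe_req_time=600,
                             app_pe_req_time=3600, pes=8)
```

Keeping the command-PE limits (`-lt`, `-lT`) separate from the application-PE limits (`p_mpp_t`, `mpp_t`) reflects the distinction that the options make on CRAY T3E systems.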
On CRAY T3E systems, the qstat(1) status displays show both the CPU time limits and usage for command PEs and application PEs that are associated with a request, as follows:
The qstat -m display shows queue information for CPU time limits of application PEs.
The qstat -m req-id display shows the CPU limits and usage for application PEs associated with the specified request, both per-process and per-request.
The qstat -a display shows the CPU time remaining for the command PEs associated with a request.
The qstat -f req-id display shows CPU information for the command PEs on the CPU Time Limit line and shows CPU information for the application PEs on the MPP Time Limit line.
NQS continues to set resource limits before initiating jobs. On CRAY T3E systems and Cray PVP systems, the operating system enforces the resource limits and accumulates resource usage. Because UNICOS/mk resource limit enforcement and resource usage accumulation are not available for all resource types, the qstat displays show zero values for some resources.
The following documentation changes have occurred.
All NQE 3.3 publications are available from the Cray Research Online Software Publications Library, which is available publicly at the following URL:
http://www.cray.com/swpubs/
The NQE online publications are provided through the Cray DynaWeb server. The Cray DynaWeb server allows users to access information using a World Wide Web browser, such as Netscape.
All NQE 3.3 publications are also provided on the Cray DynaWeb CD-ROM that is included with your NQE 3.3 release package.
PostScript files of NQE 3.3 publications are also provided through the Cray Research Online Software Publications Library, on the Cray DynaWeb CD-ROM, and through anonymous FTP.
Man pages are still accessed by using the man command.
CrayDoc is no longer installed as part of the NQE release.
For more information about NQE documentation, see “NQE Documentation” in Chapter 4.
Beginning with the NQE 3.1 release, the NQE User's Guide, publication SG-2148, is available online through the Cray DynaWeb server and is also available as a PostScript file on the Cray DynaWeb CD-ROM that is included with your NQE release package. A PostScript file of this guide is also available through anonymous FTP. This guide is no longer available through the NQE GUI help facility. For more information about NQE documentation, see “NQE Documentation” in Chapter 4.
Beginning with the NQE 3.1 release, NQE is released asynchronously from the UNICOS operating system, and NQS and FTA are packaged in the NQE product for all platforms. As a result, documentation that was previously provided in the Network Queuing System (NQS) User's Guide, publication SG-2105, the UNICOS NQS and NQE Administrator's Guide, publication SG-2305, and the FTA User and Administrator Manual, publication SG-2144, has been incorporated into the NQE documentation set. For more information about NQE documentation, see “NQE Documentation” in Chapter 4.