This revised NQE Administration, publication SG-2150, supports the 3.3 release of the Network Queuing Environment (NQE).
NQE system administration documentation was revised to support the following NQE 3.3 features:
On CRAY T3E systems, NQE now supports checkpointing and restarting of jobs. This feature was initially supported in the NQE 3.2.1 release. This feature requires the UNICOS/mk 2.0 release or later.
On CRAY T3E systems, NQE now supports the Political Scheduling feature. This feature was initially supported in the NQE 3.2.1 release. This feature requires the UNICOS/mk 2.0 release or later.
On CRAY T3E systems, NQE now supports mixed mode scheduling.
Applications running on a CRAY T3E system are killed when a PE assigned to the application goes down. NQS is now notified when a job is terminated by a SIGPEFAILURE signal (UNICOS/mk systems only). NQE will requeue the job and either restart or rerun the job, as applicable.
For CRAY T3E systems, this release adds CPU and memory scheduling weighting factors for application PEs. The NQS scheduling weighting factors are used with the NQS priority formula to calculate the intra-queue job initiation priority for NQS runnable jobs. This release also restores user-specified priority scheduling functionality.
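As a rough illustration of how weighting factors can influence intra-queue ordering, the following sketch ranks jobs by a generic weighted sum. It is not the NQS priority formula, which this section does not reproduce. The `Job` fields, the weight names, and the assumption that every input is normalized to the range 0 to 1 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pe_cpu: float         # hypothetical: normalized application-PE CPU request (0..1)
    pe_memory: float      # hypothetical: normalized application-PE memory request (0..1)
    user_priority: float  # hypothetical: normalized user-specified priority (0..1)


def initiation_priority(job, w_cpu, w_mem, w_user):
    """Generic weighted sum; a stand-in, not the actual NQS formula."""
    return (w_cpu * job.pe_cpu
            + w_mem * job.pe_memory
            + w_user * job.user_priority)


def order_queue(jobs, w_cpu, w_mem, w_user):
    """Return jobs sorted so the highest computed priority comes first."""
    return sorted(
        jobs,
        key=lambda j: initiation_priority(j, w_cpu, w_mem, w_user),
        reverse=True,
    )
```

Setting a weight to zero removes that factor from the ranking, which is how an administrator could, in this simplified model, make the user-specified priority the only criterion.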
The multilevel security (MLS) feature on UNICOS/mk systems is supported with this NQE release.
Distributed Computing Environment (DCE) support was enhanced as follows:
Ticket forwarding and inheritance is now supported. This feature allows users to submit jobs in a DCE environment without providing passwords.
IRIX systems now support access to DCE resources for jobs submitted to NQE.
UNICOS/mk systems now support access to DCE resources for jobs submitted to NQE.
DCE is supported on all NQE platforms except SunOS. Ticket forwarding is not supported for the Digital UNIX operating system in the NQE 3.3 release.
For IRIX systems running NQE, this release introduces a new scheduler called the Miser scheduler. The Miser scheduler is a predictive scheduler that evaluates the number of CPUs and the amount of memory a batch job will require. NQE now supports the submission of jobs that specify Miser resources.
Array services support was added for UNICOS systems.
The new nqeinfo(5) man page documents all NQE configuration variables; the nqeinfo(5) man page is provided in online form only and is accessible by using the man(1) command or through the NQE configuration utility Help facility.
This release replaces the NQE_TYPE variable in the nqeinfo(5) file with a new NQE_DEFAULT_COMPLIST variable, which defines the list of NQE components to be started or stopped. The NQE 3.3 release is shipped with the NQE_DEFAULT_COMPLIST variable set to the following components: NQS, COLLECTOR, and NLB.
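A component list like this has to be split into individual component names before anything can be started or stopped. The sketch below shows one way to do that. The comma-or-whitespace delimiter and the `parse_complist` helper are assumptions for illustration only; see the nqeinfo(5) man page for the actual syntax.

```python
# Assumed textual form of the shipped default; the real delimiter may differ.
SHIPPED_DEFAULT = "NQS,COLLECTOR,NLB"


def parse_complist(value):
    """Split a component list into unique, upper-case names, keeping order.

    Hypothetical helper: accepts commas and/or whitespace as separators.
    """
    components = []
    for token in value.replace(",", " ").split():
        name = token.upper()
        if name not in components:
            components.append(name)
    return components
```

For example, `parse_complist(SHIPPED_DEFAULT)` yields the three components named above, in the order listed.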
The following NQE database enhancements were made:
The number of simultaneous connections that clients and execution servers can make to the NQE database was increased.
The MAX_SCRIPT_SIZE variable was added to the nqeinfo file, allowing an administrator to limit the size of the script file submitted to the NQE database. If the MAX_SCRIPT_SIZE variable is set to 0 or is not set, a script file of unlimited size is allowed. The script file is stored in the NQE database; if the file is bigger than MAX_SCRIPT_SIZE, it can affect the performance of the NQE database and of nqedbmgr. The nqeinfo(5) man page includes the description of this new variable.
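The MAX_SCRIPT_SIZE rule described above (0 or unset means no limit) can be sketched as a small check. The function name, the use of bytes as the unit, and the choice to accept a script exactly at the limit are assumptions, not documented NQE behavior.

```python
from typing import Optional


def script_size_allowed(script_bytes: int, max_script_size: Optional[int]) -> bool:
    """Return True if a script of the given size may be submitted.

    max_script_size of None (unset) or 0 means unlimited, as described
    for MAX_SCRIPT_SIZE. Units and the inclusive limit are assumptions.
    """
    if not max_script_size:  # covers both None and 0
        return True
    return script_bytes <= max_script_size
```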
The csuspend utility has the following two new command-line options: -l loopcount and -p period. These two new options suspend or enable batch processing based on interactive use. The amount of interactive use is determined by calls to sar. These options give the administrator greater control over how sar is used and, consequently, the frequency of checking on whether to suspend or start NQE. The csuspend(8) man page was revised to reflect this new capability.
The qstart(8) and qstop(8) commands now allow an administrator to execute programs immediately before and after the NQS daemon starts (NQE_ETC/qstart.pre and NQE_ETC/qstart.pst, where NQE_ETC is defined in the nqeinfo file) and immediately before and after the NQS daemon is shut down (NQE_ETC/qstop.pre and NQE_ETC/qstop.pst, where NQE_ETC is defined in the nqeinfo file). The administrator must create each file, and it must be executable. The nqeinit(8), nqestop(8), qstart(8), and qstop(8) man pages were revised to reflect this new capability.
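The ordering of the pre and post programs around daemon startup can be pictured with the following sketch. It only models the sequence described above and is not how qstart(8) is implemented. `run_hook`, `start_with_hooks`, and the `start_daemon` callback are hypothetical names, and skipping a missing or non-executable file silently is an assumption.

```python
import os
import subprocess


def run_hook(path):
    """Run the program at path if it exists and is executable.

    Returns True if the hook was run, False if it was skipped.
    """
    if os.path.isfile(path) and os.access(path, os.X_OK):
        subprocess.run([path], check=False)  # hook exit status is ignored here
        return True
    return False


def start_with_hooks(nqe_etc, start_daemon):
    """Model the documented order: qstart.pre, daemon start, qstart.pst."""
    run_hook(os.path.join(nqe_etc, "qstart.pre"))
    start_daemon()  # hypothetical stand-in for starting the NQS daemon
    run_hook(os.path.join(nqe_etc, "qstart.pst"))
```

The shutdown case would follow the same pattern with qstop.pre and qstop.pst.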
NQS sets several environment variables that are passed to a login shell when NQS initiates a job. One of the environment variables set is LOGNAME, which is the name of the user under whose account the job will run. Some platforms, such as IRIX, use the USER environment variable rather than LOGNAME. On those platforms, csh writes an error message into the job's stderr file, noting that the USER variable is not defined. To accommodate this difference, NQS now sets both the LOGNAME and USER environment variables to the same value before initiating a job. The ilb(1) man page was revised to include this new variable.
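The effect of this change on a job's environment can be sketched as follows. The `job_environment` helper is hypothetical and only illustrates that both variables receive the same user name; it does not represent NQS source code.

```python
def job_environment(base_env, username):
    """Return a copy of base_env with LOGNAME and USER both set to username."""
    env = dict(base_env)       # copy so the caller's mapping is not modified
    env["LOGNAME"] = username
    env["USER"] = username     # same value, so shells that expect USER find it
    return env
```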
Year 2000 support for NQE has been completed.
The chapters documenting “Preparing a Node to Run NQE” and “NQE Version Maintenance” were removed from this administration guide; they are included in NQE Installation, publication SG-5236.
The project ID is now added to the end of the current accounting records in the NQS accounting file (nqsacct) written by NQS daemon accounting.
For a complete list of new features for the NQE 3.3 release, see the NQE Release Overview, publication RO-5237.