This chapter provides basic information about the NQE database and its components. It includes the following information:
NQE database model
NLB scheduling versus NQEDB scheduling
An overview of the NQE database
NQE database components
NQE database concepts and terms
Starting the NQE database
Stopping the NQE database
Obtaining status
Security
Configuring the NQE database components
Compacting the NQE database
Chapter 10, “Writing an NQE Scheduler”, describes the NQE scheduler in detail and includes a tutorial for writing a simple NQE scheduler.
While the NQE database model works well for specific environments, it has some limitations. The database used has a limit of 40 simultaneous connections. A connection is used for each execution server and each active client command. The performance of the database makes it well suited for environments with a small number of long-running job scripts; it does not perform as well with large numbers of job scripts or very large single scripts.
Also, using the NQE database model requires that a site write its own scheduler, and the scheduler must be written in the Tcl scripting language.
This section provides a brief overview of the advantages and disadvantages of both NLB scheduling and NQE database scheduling.
Failure recovery for the NLB is possible by specifying a list of nodes for the NLB_SERVER variable in the nqeinfo(5) file. The collectors will update the NLB database on each specified NLB server so that all nodes have up-to-date information. The NLB clients query the NLB servers one by one in the configured order until they find one that is running or until they exhaust the list. The one potential problem with NLB failure recovery involves using the cevent(1) command for Job Dependency. For more information, see “Multiple NLB Servers and Job Dependency” in Chapter 12.
There is no failure recovery for the NQEDB node. If the NQEDB node goes down or loses network access, the cluster becomes a set of individual NQS servers. If jobs finish while the NQEDB node is inaccessible, the NQE database will not be updated and the jobs will still be marked as running in the NQE database.
The NQE database is limited to 36 connections. Four permanent connections are used by the NQEDB server, and one permanent connection is used by each lightweight server. Client connections are all transient and therefore limited only by performance considerations. However, if all 36 connections are used by servers, there are no connections left for clients. Therefore, the maximum number of nodes that may be in an NQE cluster is somewhere between 30 and 35, depending on the amount of expected client traffic.
Cluster rerun of jobs is not available with NLB scheduling.
Cluster rerun of jobs is available with the NQE database. However, if there are network problems between the NQEDB server and an NQS server, the NQS server will appear to be down. In this situation, the NQEDB server will rerun the job on another server; this can result in the job running twice.
Performance is better with the NLB. The NQE database and scheduler are written in Tcl, an interpreted language, and are therefore slower. Also, you can run the NLB without the NQEDB server, but you cannot run the NQEDB server without the NLB if you are interested in total cycles.
NQS queues give users a rough idea of the relative position of their jobs on that node.
In the NQE database implementation, there is no way for users to understand their job request priority relative to other job requests in queued state.
The NLB allows sites to define policies for destination selection (routing) but not for priority scheduling of job requests. Priority scheduling is strictly first in, first out (FIFO).
The NQE database as shipped uses FIFO scheduling but also allows sites to define their own schedulers using Tcl scripts. This flexibility adds a level of complexity, as well as a corresponding level of effort. However, the following types of scheduling are possible with a customized NQE scheduler (a brief Tcl sketch follows this list):
Destination selection where the job may not be sent immediately. An example of this would be sending a job to a server only if that server has an average idle CPU of greater than 50%, with the job remaining queued if no server is available.
Network-wide limits. An example of this would be maintaining a global or network-wide limit on the number of jobs that a given user may run simultaneously.
Normalizing resources across machines. For example, CPU time-limit requirements in a heterogeneous network are generally found empirically. It would be useful for the submitting user to supply the CPU requirements in an arbitrary unit that is then converted to seconds for the particular server on which the job is run.
Network-wide rerun of failed jobs. For example, if a server on the network is detected as no longer running, certain jobs that were running on that server could be rerouted and run on another server.
Scheduling on arbitrary data. Certain site-specific information may be important in scheduling. For example, a site may possess only a limited number of network-wide or server licenses for a particular application and the scheduler should not attempt to start up too many jobs using this application.
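As a rough illustration of the first two items, the following Tcl fragment sketches the core of a scheduling pass that enforces a network-wide per-user limit and an idle-CPU threshold. It is a sketch only: the procedures task_attr, user_run_count, get_idle_cpu, and assign_to_lws are hypothetical stand-ins for site code, not part of the NQE scheduler interface (Chapter 10, “Writing an NQE Scheduler”, describes the real interface).

# Illustrative sketch only; the helper procedures are hypothetical.
proc schedule_pass {pending_tasks servers} {
    foreach task $pending_tasks {
        # Network-wide limit: at most 4 running jobs per user.
        set user [task_attr $task sec.ins.dbuser]
        if {[user_run_count $user] >= 4} {
            continue    ;# limit reached; task stays in state Pending
        }
        # Destination selection: require more than 50% average idle CPU.
        foreach srv $servers {
            if {[get_idle_cpu $srv] > 50.0} {
                assign_to_lws $task $srv    ;# task moves to state Scheduled
                break
            }
        }
    }
}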
The NQE database provides the following:
A mechanism for the client user to submit requests to a central storage database.
A scheduler that analyzes the requests in the NQE database and chooses when and where a request will run.
The ability for a request to be submitted to the NQS server that the NQE scheduler has selected.
Figure 9-1 shows an example of the layout of the NQE database. Jobs are submitted using client commands from client machines. The jobs are placed in the NQE database on one machine. A scheduler program determines which machine should run the job. In the example shown in Figure 9-1, the job is sent to Node D, where it is submitted and run in NQS.
Figure 9-2 shows in greater detail the processes and components of the NQE database. The components are described in the following sections.
The NQE database server (mSQL) serves connections from clients in the cluster or locally to access data in the database. The database used is mSQL, and it has the following features:
Simple, single-threaded network database server
Relational database structure with queries issued in a simple form of SQL
The monitor system task is responsible for monitoring the state of the database and which NQE database components are connected. Specifically, it determines which system tasks are connected and alive and, optionally, it purges old requests (called tasks when they are within the database) from the database.
A system task is an application that remains constantly connected to the database and performs a special task.
The scheduler system task analyzes data in the database, making scheduling decisions. Specifically, it does the following:
Keeps track of lightweight servers (LWSs) currently available for use in destination selection
Receives new requests, verifies them for validity, and accepts or rejects them
Executes a regular scheduling pass to find NQS servers to run pending requests
Administrator-defined functions may be written in Tcl to perform site-specific scheduling (see Chapter 10, “Writing an NQE Scheduler”).
The lightweight server (LWS) system task is a bridge to NQS; it performs the following tasks:
Obtains request information from the database for requests assigned to it by the scheduler
Checks authorization files locally
Changes context and submits requests to NQS
Obtains the exit status of completed requests from NQS and updates the NQE database with this information
The following client commands give access to the NQE database; for additional information, see the man page for the specific command or see the NQE User's Guide, publication SG-2148:
cqsub -d nqedb ...
This command submits the task to the NQE database rather than to NQS directly. The NQE database, scheduler, and LWS then submit an NQS request.
Note: If you use the nqe command to invoke the NQE GUI, the NQE GUI Submit window has a Submit to NQE selection on the General Options menu that provides the same function as this command.
cqstatl -d nqedb ...
This command gets the status of tasks in the NQE database, which allows you to see the status of all tasks in the NQE database, not just requests on the local NQS system.
cqdel -d nqedb ...
This command sends a signal event to the task in the NQE database. This event is translated to signal or delete the corresponding NQS request; this action does not delete the task from the NQE database.
Note: If you use the nqe command to invoke the NQE GUI, the NQE GUI Status window has a Signal Job selection and a Delete selection on the Actions menu that provide the same functions as this command.
If you use the nqe command to invoke the NQE GUI, the NQE GUI Status window provides status information for all requests (from the NQE database as well as for requests submitted directly to NQS).
nqedbmgr
This is the NQE database administration command. The command allows administrators to connect to the NQE database, and to examine and modify data in the database. In addition, it provides commands to start and stop aspects of the NQE database. These functions are described in the following sections.
compactdb
This is an NQE database administration command. The command allows an administrator to compact the NQE database to improve its performance. For more information, see “Compacting the NQE Database”.
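Assuming a job script named myjob and a task identifier of t42 (both names are hypothetical), the three database-directed client commands might be used as follows:

cqsub -d nqedb myjob
cqstatl -d nqedb
cqdel -d nqedb t42

The first command inserts the script into the NQE database as a task, the second displays the status of all tasks in the database, and the third posts a signal event for task t42.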
This section describes the following NQE database terms and concepts:
Objects
Task owners
Task states
Events
The NQE database is made up of objects. Each object is, in turn, made up of attributes, which are simply name and value pairs. An instance of an object in the NQE database can contain any number of attributes.
Each instance of an object always contains at least the following attributes:
| Attribute name | Description |
|---|---|
| type | One-character attribute used to differentiate between object types. Object types are described below. |
| id | A unique identifier for the object. |
| timein | The UNIX time (seconds since 00:00:00 GMT, January 1, 1970) at which the object was first inserted into the NQE database. |
| timeup | The UNIX time at which the object was last updated in the NQE database. |
| sec.ins.dbuser | The NQE database user name of the user who inserted the object. |
The following object types are defined:
| Object type (and abbreviation) | Description |
|---|---|
| Task or user task (t) | The object associated with an NQS request. Whenever a request is inserted into the NQE database, a new object of type t is created. Examples of attributes in a task object are script, exit.status, and env.HOME. For a list of user task object attributes, see Table 10-5. |
| Event (e) | The object describing an event. Event objects are inserted into the NQE database in order to cause an action to occur. For a list of standard events, see Table 9-1. Posting an event to the NQE database simply means inserting an event object into the NQE database. |
| System task (s) | The object describing a system task. One object of this type is created for each system task connected to the NQE database. Examples of information stored include s.host (the host on which the system task is running) and s.pid (the process ID of the system task). |
| Global object (g) | The object containing global configuration data. There is only one instance of this object type; it contains any configuration information required by any client connecting to the NQE database. Configuration information stored here includes the rate at which system tasks query the NQE database for events, tasks, and so on. For a list of configuration attributes, see Table 9-5. |
The task owner is the system task that is currently responsible for a given user task. Only the task owner may modify the attributes of a task. The task owner is defined in the attributes t.sysclass and t.sysname.
The name of a system task is expressed in the following form:
class.name
For example, the default scheduler's name is scheduler.main.
class groups together system tasks that perform similar functions. There are three classes, which are defined as follows:
The monitor system task has a class name of monitor.
The scheduler system task has a class name of scheduler.
All LWS system tasks have a class name of lws.
name describes the system task. The name used is dependent upon the system task class. A default name of main is given to both monitor and scheduler system tasks. The default name of an LWS is the name of the host on which the LWS runs.
As administrator, you may set the name of the scheduler and LWS system tasks. This feature can be used to create, for example, a test scheduler.
Each system task regularly queries the NQE database to find any new user tasks it owns.
The scheduler chooses a host on which the user task will run and then changes ownership of the user task to the LWS system task for that host. The LWS system task then submits the task to the LWS daemon on that host. The LWS daemon issues the job request to the local NQS system.
User tasks have a state attribute (t.state). Different states signify different stages in the progress of the task through the system. There is no limit to the number of states a user task may pass through, and the name of a state is arbitrary. However, there are some well-defined states.
A task's state may be changed without changing the owner. Also, the owner of a task may be changed without changing its state.
Figure 9-3 shows some ownership changes and state changes possible for a user task.
A typical path for a user task is as follows:
The user uses the cqsub command to insert the task into the NQE database.
cqsub assigns ownership to the default scheduler system task (scheduler.main) with a state of New.
scheduler.main checks the task's validity and places it in state Pending. The Pending state is an example of an internal state; the scheduler may define many such states. (How long the task is in the Pending state depends upon the scheduler's algorithm.)
When the task is scheduled to run, scheduler.main assigns it to one of the lws system tasks with state Scheduled. There is, in general, one lws system task for each node in the cluster. The LWS system task sends the request to the LWS daemon on its node.
The lws attempts to submit the task as an NQS request to the NQS running locally. If this action succeeds, the state is changed internally to Submitted. Until NQS completes the job, the user task remains in the Submitted state.
Upon job completion, the lws assigns task ownership back to scheduler.main. The state will be one of the following:
Completed, meaning that the request completed normally.
Failed, meaning that the request was not successfully submitted to NQS.
Aborted, meaning that something went wrong with the request while it was assigned to NQS.
Terminated, meaning that the request was deleted because a signal or delete command was sent to the request.
After job completion, scheduler.main examines the task's state and either places the task back in Pending state (signifying that the task is to be rerun) or passes the task to the monitor system task (monitor.main) without changing its state.
monitor.main receives all user tasks that will never run again. It is responsible for removing the task from the NQE database.
Events form the mechanism for communication among users, administrators, user tasks, and system tasks.
Any client program connected to the NQE database may post an event by inserting an object of type e. System tasks regularly check for any new events targeted at them or at tasks they own. Once an event has been acted upon, it is marked as acknowledged and may be removed from the NQE database. (The monitor system task can be configured to automatically purge old events from the NQE database.)
Users may post only the signal event (by using either the NQE GUI menu selection or the cqdel command), and only when it is targeted at user tasks they submitted. System tasks and administrators may send events to any object (including system tasks).
To send an event, use the nqedbmgr event post or ask command. For example, to post the event xyz with a value of 123 to object s23, use the following command (the case of the event name xyz is ignored):
event post s23 xyz 123
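Similarly, to post the standard SHUTDOWN event (see Table 9-1) to a system task whose object ID is s4 (an ID of the form shown in the status examples later in this chapter):

event post s4 SHUTDOWN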
Table 9-1 lists the standard events:
| Event name | Usual target | Description |
|---|---|---|
| DUMP_VARS | System task | Request that a system task dump to its log file all global Tcl variable names and values matching the event value. The value may be a glob pattern matching the desired variable name(s), such as *config*. |
| SHUTDOWN | System task | Request that the targeted system task shut down. |
| SIGDEL | User task | Send a signal to the NQS request represented by the user task. Delete the NQS request if the job request is not running. Assign the user task to the monitor system task. Note that this does not delete the user task from the NQE database. The value is optional and may be a signal number or name. |
| TINIT | System task | Reinitialize without shutting down. This event is typically posted to the scheduler to force it to restart. |
| TPASS | Scheduler | Request that the scheduler perform a scheduling pass. |
| TRACE_LEVEL | System task | Change the tracing level of the system task. |
During installation, one NQE node may have been defined to run the mSQL database server by specifying NQEDB, SCHEDULER, and MONITOR in the list of NQE components to be started or stopped by default (set by the NQE_DEFAULT_COMPLIST variable in the nqeinfo(5) file). For more information on the NQE_DEFAULT_COMPLIST variable and on starting or stopping NQE components, see Chapter 4, “Starting and Stopping NQE”. If so, ensure that the MSQL_SERVER nqeinfo file variable is set to the name of the local machine. MSQL_SERVER should have the same value on all nodes and clients in the cluster.
The LWS component should be specified in the NQE_DEFAULT_COMPLIST variable of the nqeinfo file on each NQE node where an NQS server is configured.
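For example, the relevant nqeinfo file entries on the database server node might look like the following sketch. The component names are taken from this section, but the exact list syntax and the host name dbhost are illustrative assumptions; see Chapter 4 for the precise form:

NQE_DEFAULT_COMPLIST=NQEDB,SCHEDULER,MONITOR,NLB,NQS,COLLECTOR
MSQL_SERVER=dbhost

On an NQS execution node, the list would instead include NQS and LWS, and MSQL_SERVER would carry the same dbhost value.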
The nqeinit script is used to start all NQE components.
The nqedbmgr utility allows finer control over process startup. Table 9-2 describes commands available for this control; these commands are also described on the nqedbmgr(8) man page:
Table 9-2. nqedbmgr Process Startup Control Commands
| Command | Description | Example |
|---|---|---|
| db start | Starts the database server (msqld). This command will attempt to start the mSQL daemon. If the daemon is already running, an appropriate message is displayed. You must be root and be on the NQE database server node to perform this command. | db start |
| db create database | Creates the NQE database. This command creates all the table definitions required for NQE's database. If the tables are already created, an appropriate message is displayed. You must be root and be on the NQE database server node to perform this command. | db create database |
| create global name [tclarrayname] | Creates the NQE global object. This command inserts an object into the database containing global configuration data. This object is read by all processes connected to the database. No action is performed if the object already exists. tclarrayname is an optional name of a Tcl array containing any extra configuration attributes for the object. | create global config |
| create systask name [tclarrayname] | Creates a system task object. This command inserts an object into the database to be used by system tasks. One systask object is required for each system process that is to connect to the database. That is, one is needed for the monitor process, one is needed for the scheduler process, and one is needed for each lws process. | create systask monitor; create systask lws |
| start systask | Starts up the system task. This command starts the named system task process. On the MSQL_SERVER, nqeinit will start a monitor, a scheduler, and an lws process. On other NQE servers, nqeinit will create an lws system task object and start an lws process on the NQE server. | start monitor; start scheduler; start lws; start lws.$host |
The nqeinit startup script uses nqedbmgr Tcl input scripts to perform all the startup operations. See those input scripts for examples of how the individual components are started.
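For example, a manual startup on the NQE database server node (performed as root from within nqedbmgr) might use the Table 9-2 commands in the following order; nqeinit normally performs the equivalent steps for you:

db start
db create database
create global config
create systask monitor
create systask scheduler
create systask lws
start monitor
start scheduler
start lws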
Note: If NQS is not running on a system, the corresponding LWS system task will be marked Exited, and no tasks will be scheduled for that node.
The nqestop script stops all system tasks and shuts down all NQE components.
The nqedbmgr utility allows finer control over process shutdown. Table 9-3 describes commands available for this control:
Table 9-3. nqedbmgr Process Shutdown Control Commands
| Command | Description | Example |
|---|---|---|
| shutdown all | Shuts down all system tasks. This command sends events to all system tasks, requesting that they shut down. | shutdown all |
| db shutdown | Shuts down the database server. This command stops the mSQL database daemon. Any clients connected to the database are immediately disconnected. Any system tasks running when the mSQL daemon is shut down continue to run but enter a connection retry loop. | db shutdown |
The nqestop shutdown script uses nqedbmgr Tcl input scripts to perform all the shutdown operations. See those input scripts for examples of how the individual components are stopped.
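For example, an orderly manual shutdown from within nqedbmgr consists of the two Table 9-3 commands in sequence; nqestop performs the equivalent steps for you:

shutdown all
db shutdown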
The nqedbmgr utility provides commands for determining the status of the NQE database and examining data in the database. Table 9-4 describes commands available to obtain status; these commands are also described on the nqedbmgr(8) man page:
Table 9-4. nqedbmgr Status Commands
| Command | Description | Example |
|---|---|---|
| status [all\|db] | Provides a status of the components of the NQE database. status by itself displays which system tasks are running, their PIDs, and the host on which they are running. status db obtains version information from the mSQL server. status all displays everything. | status |
| select options | Reads (or selects) data from the database and displays it. See the nqedbmgr(8) man page for details. | select -f t |
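For example, to check which system tasks are running and then display all user task objects (type t) in full, enter the following from within nqedbmgr:

status
select -f t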
Authorization is performed at the following points:
Connection: Upon connecting to the NQE database (all client commands and system tasks must connect to the database).
Target: On the LWS machine just before a request is submitted on the target host selected by the NQE scheduler.
Events: When an event is inserted into the NQE database (a delete or signal request sent by using the cqdel command or the NQE GUI inserts an event in order to delete a job).
Status: When reading (selecting) information from the database.
Configuration requirements for each of these authorization points are described in the following sections.
Every client user (including a system task) wishing to connect to the mSQL daemon requires a valid database user entry in a file called nqedbusers, which resides in the NQE spool directory defined by NQE_SPOOL in the nqeinfo file. When connected, any object (that is, request) inserted into the database is owned by the dbuser. The dbuser name may be a string of any alphanumeric characters and does not necessarily have to be a UNIX user name (dbusers do not need /etc/passwd entries on the machine running the mSQL daemon).
The mSQL daemon reads and uses the contents of the nqedbusers file to determine whether a given client may connect to the database. If, as a result of this check, the client user's connection request is refused, a Connection disallowed message is passed back to the client and the connection is broken.
The nqedbusers file is made up of an optional %host segment followed by multiple %dbuser segments. Blank lines and lines beginning with # are ignored.
The host segment specifies an optional list of client host names explicitly allowed or disallowed access to the database. If the segment is not supplied, or the list of host names is empty, any client host may connect to the database, and the checks move on to the dbuser segments. The syntax in the nqedbusers file is as follows:
%host [clienthostspec clienthostspec ...]
A clienthostspec is a host name optionally prefixed by + (to allow access explicitly) or - (to disallow access).
The database user segment defines allowed or disallowed access for a specified database user. The syntax in the nqedbusers file is as follows:
%dbuser dbusernamespec privtype [clienthostspec clientuserspec clienthostspec clientuserspec ...]
The dbusernamespec is the dbuser name and may be optionally prefixed by - to explicitly disallow the dbuser. The following formats allow any user to connect to the NQE database:
%dbuser
%dbuser +
%dbuser + user
If the dbusernamespec is not supplied, or the list of dbuser names is empty, any user may connect to the database. A dbusernamespec of + alone has a special meaning and corresponds to a dbuser equal to the client user name. This entry allows access for many users using their existing UNIX user names without the necessity of explicitly specifying each name in the file. The %dbuser + user format allows any user to connect to the NQE database; user is the privtype.
System tasks (monitor, scheduler and LWS) connect using a dbuser of root.
privtype specifies the type of connection privilege. Two types are defined, as follows:
user, which allows a request to be submitted and deleted and allows a user to obtain the status of the request by using the cqsub, cqstatl, cqdel, and NQE GUI interfaces.
system, which allows full access to the NQE database.
clientuserspec is a UNIX user name optionally prefixed by + (to allow access explicitly) or - (to disallow access). An entry of simply + allows any client to connect as the given dbusername.
An example of an nqedbusers file follows:
# nqedbusers file
#
# Allow root access from any host
%host
        # Disallow hosts
        -badhost
        # Allow other hosts
        +
%dbuser root    system
        localhost       root
        server1         root
        server2         root
%dbuser fred    user
        client1         shirley
#
# Allow any user access from any host.
# ( db username = client username )
%dbuser +       user
        -badclient      bob
        -               amy
        +               +
In this file, access is completely denied for host badhost, but all other hosts are allowed access. A dbuser of root has been defined with full access privileges (system) for hosts localhost, server1, and server2, which allows the LWS to run from those three hosts. A dbuser of fred has been defined and may be used by shirley from client host client1. All other users are allowed to use their UNIX names to connect as the dbuser name, with the exceptions of bob from badclient (bob can connect from anywhere except from badclient) and amy, who cannot connect from any host.
The administrator may define how the LWS validates jobs from the NQE database before submitting them to NQS locally. Valid values are as follows:
FILE. The LWS checks for a file called ~user/.nqshosts or ~user/.rhosts. Within the file is a list identifying which client users from which client hosts are allowed or disallowed access. The task attributes sec.ins.clienthost and sec.ins.clientuser are used to determine which client hosts allow access by which users. These client host and client user names must be included in the .rhosts file or .nqshosts file. For information about user task attributes, see “User Task Attributes” in Chapter 10.
PASSWD. A valid UNIX password is required.
The initial value when NQE is first started is FILE. After that, you must update the global configuration object in the NQE database and post a TINIT event to the LWS to tell it to reread the configuration. For example:
% select s
Connected to coffee as DB user root (conn=3)
ID    CLASS       NAME    STATE
s2    monitor     main    Running
s3    scheduler   main    Running
s4    lws         coffee  Running
s5    lws         latte   Running
s6    lws         cappuchino  Running
% select -f s4 {run_mechanism}
Object s4:
  run_mechanism = File
% update s4 {run_mechanism Passwd}
Object s4 updated
% select -f s4 {run_mechanism}
Object s4:
  run_mechanism = Passwd
% event post s4 TINIT
Posted event e16516
Event objects are used to communicate between tasks, both system tasks and user tasks. For example, if you change the global configuration, you need to send a TINIT event to the appropriate system tasks in order to tell these tasks to re-read the configuration and get the new values.
Table 10-4 lists the predefined events you may want to send.
A dbuser connected with the privilege type of user may only send events to job tasks created by the same dbuser. Those connected with the privilege type of system may send events to any task, including system tasks.
A dbuser connected with system privilege may read any object in the database by using the nqedbmgr interface. That is, any task may be examined.
A dbuser connected with user privilege is restricted to using the cqsub, cqstatl, cqdel, and NQE GUI interfaces.
The administrator may allow the user to see summary information on all tasks in the NQE database, not just the user's requests and tasks. This is achieved with the global configuration attribute status.access.
Note: You do not need to post an event when you issue the following nqedbmgr commands.
The following nqedbmgr command allows users to see summary information on all requests and tasks (the default is all):
update config {status.access all}
The following nqedbmgr command allows a user to see only the requests and tasks the user created:
update config {status.access own}
The NQE database is an mSQL database that grows, but never shrinks. This can degrade the performance of the database. You can use the compactdb(8) script to compact the NQE database (nqedb) to improve its performance.
Warning: You must shut down the NQE database server before you invoke the compactdb(8) script to avoid corruption of the database or loss of data. You can use the nqestop(8) script to stop all system tasks and shut down the NQE database.
The MAX_SCRIPT_SIZE nqeinfo file variable lets the administrator limit the size of scripts that are submitted through nqedb. If a script contains more lines than the number of lines defined by the MAX_SCRIPT_SIZE variable, it is not allowed. If MAX_SCRIPT_SIZE is 0 or is not set, scripts of unlimited size are allowed.
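For example, a site that wants to reject job scripts longer than 5000 lines (a value chosen only for illustration) would set the following in the nqeinfo file:

MAX_SCRIPT_SIZE=5000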
You can set and/or update various global configuration attributes for the NQE database by using the following nqedbmgr command syntax:
update config_obj {attribname attribvalue...}
For example:
update g1 {status.access all}
A system task must be re-initialized or restarted in order to read any updated global configuration attributes.
Note: If you are not certain which system tasks are affected by your change, it is recommended that you re-initialize all system tasks to ensure that all tasks that read a specific variable are notified of the variable's updated attributes.
To re-initialize a system task (for example, the scheduler), enter the nqedbmgr command that inserts a TINIT event object into the database; this object is read by the scheduler and acknowledged:
ask scheduler tinit
The following sections describe the predefined global configuration attributes for the NQE database, NQE scheduler, and LWS attributes that you can use.
Table 9-5 lists the predefined global configuration attributes for the NQE database (this table is also provided on the nqedbmgr(8) man page):
Table 9-5. nqedbmgr Predefined Global NQE Database Configuration Attributes
| Attribute name | Description | Default if not specified |
|---|---|---|
| hb.systask_timeout_period | Time before the monitor system task marks a given system task as down. The value may be a mathematical expression. Units are seconds. | 90 |
| hb.update_period | Time between each update of the heartbeat. Each system task will update the database within this period. It is used by the monitor to determine whether a system task is still alive. The value may be a mathematical expression. Units are seconds. | 30 |
| purge.systask_age | Age (elapsed time since last update) of acknowledged events concerning system tasks that the monitor is to delete. The monitor will delete all tasks that are the specified age or older. The value may be a mathematical expression. Units are seconds. | 24*60*60 |
| purge.task_age | Age (elapsed time since last update) of completed user tasks and user events that the monitor is to delete. The value may be a mathematical expression. Units are seconds. | 24*60*60 |
| status.access | Type of access allowed by users when attempting to obtain summary information of jobs in the database. The value may be all (users may see summary information on all tasks in the database) or own (users may see only the tasks they created). In either case, a user may only see full information for tasks submitted by that user. | all |
| sys.ev_rate | Rate at which system tasks check for new events targeted at them or at tasks they own. This attribute may be a mathematical expression. Units are seconds. | 10 |
| sys.purge_rate | Rate at which the monitor checks the age of user tasks, user events, and system events. This attribute may be a mathematical expression. Units are seconds. | 5*60 |
| sys.st_rate | Rate at which system tasks check for new tasks, changes in task state, and so on. This attribute may be a mathematical expression. Units are seconds. | 10 |
| sys.upd_rate | Rate at which an LWS system task checks for messages from NQS. This attribute may be a mathematical expression. Units are seconds. | 10 |
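For example, to lengthen the heartbeat update period to 60 seconds (an illustrative value) and then tell the monitor and scheduler system tasks to reread the configuration, you might enter the following in nqedbmgr:

update config {hb.update_period 60}
ask monitor tinit
ask scheduler tinit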
The default scheduler file is situated in the /nqebase/bin directory and is called local_sched.tcl. The default scheduler has the following properties:
The number of tasks that may run on an LWS can be limited.
The number of tasks per user that may run on an LWS can be limited.
LWS selection can be performed by a simple round-robin or by using an NLB policy (passing memory and CPU requirements).
Tasks are ordered by the time they entered the database (the timein attribute).
The user may narrow the list of target NQS servers by passing a colon-separated list of hosts as a named request attribute, lws. For example, cqsub -la lws=hosta:hostb narrows the target server list to hosta and hostb.
A simple cluster rerun facility is implemented.
The default scheduler can interpret the following predefined attributes, which are overridden if they are set in the system scheduler object. The following example sets the scheduler passrate attribute, which overrides the default value of 60:
nqedbmgr update scheduler {passrate 70}
Table 9-6 lists these attributes (this table is also provided on the nqedbmgr(8) man page).
Table 9-6. nqedbmgr Predefined Scheduler Attributes
| Attribute name | Description | Default if not specified |
|---|---|---|
| max_fail_count | Cluster rerun attribute. Maximum number of job failures allowed (that is, the number of times an LWS returns a job to the scheduler without completing it) before the job is no longer requeued. | 5 |
| nlbpolicy | NLB policy to use to determine a destination-selection list for each task. none means use round-robin. | none |
| passrate | Rate, in seconds, at which scheduling passes are made. | 60 |
| s.clrerun | Allow cluster rerun. | No |
| s.schedfile | Name of the scheduler file. The file name may be an absolute path name or relative to the NQE bin directory as defined by NQE_BIN in the nqeinfo file. Note: This attribute is always recognized; it defines which scheduler is running. | local_sched.tcl |
| total_lws_max | Limit on the number of tasks that may be running on any one LWS. | 5 |
| total_lws_user_max | Limit on the number of tasks that any one user may have running on any one LWS. | 2 |
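For example, to enable cluster rerun and raise the per-LWS task limit to 10, then force the scheduler to reinitialize, you might enter the following in nqedbmgr (the values are illustrative, and yes/no values are assumed for s.clrerun):

update scheduler {s.clrerun yes}
update scheduler {total_lws_max 10}
ask scheduler tinit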
Table 9-7 lists the predefined LWS attributes (this table is also provided on the nqedbmgr(8) man page):
Table 9-7. nqedbmgr Predefined LWS Attributes
| Attribute name | Description | Default if not specified |
|---|---|---|
| default_queue | Name of the default NQS queue to which jobs are submitted. | nqebatch |
| qmgrnow | If set and true, requests that the LWS follow any submission to NQS with the qmgr command schedule request xyz now, where xyz is the NQS request ID. This forces NQS to run a job, even if local NQS scheduling parameters deny it. | Yes |
| run_mechanism | Defines the validation mechanism used to check the target user against the client user for local submission to NQS. The available mechanisms are File (check the user's ~/.rhosts or ~/.nqshosts file) and Passwd (require a valid UNIX password), as described in the target authorization discussion earlier in this chapter. | File |
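For example, to point the LWS on host latte (the hypothetical host from the earlier status example) at a different default queue and have it reread its configuration, you might enter the following in nqedbmgr, addressing the system task by its class.name as in the earlier scheduler example (the queue name batch is assumed to exist on that node):

update lws.latte {default_queue batch}
ask lws.latte tinit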