Chapter 12. Job Dependency Administration

This chapter discusses the administrative tasks that job dependency requires. The following topics are covered:

  • Managing the NLB Database for Job Dependency

  • Multiple NLB Servers and Job Dependency

Managing the NLB Database for Job Dependency

Job dependency event information remains in the NLB database until it is explicitly cleared, either by you or by the user who issued the cevent(1) command.

To keep the NLB database at a manageable size, clear unwanted events regularly. Use the cevent command to list and clear events.

User root can list any event from any event group by using the following command:

cevent -la

The following example shows how the output looks when you use the default delimiter:

# cevent -la
Time:                     Group:         Name:          Value:
Mon Dec 11 12:18:15 1995  g1             n1             <NONE>
Mon Dec 11 12:18:18 1995  g2             n1             <NONE>
Mon Dec 11 12:18:21 1995  g4             n1             <NONE>

You can parse and analyze the output of the cevent -la command in a script to produce a set of events to delete. For this purpose, it can be useful to redefine the delimiter character. If you use the -d option to specify ! as the delimiter, the output looks like the following example:

# cevent -la -d '!'
Mon Dec 11 12:18:15 1995!g1!n1!<NONE>
Mon Dec 11 12:18:18 1995!g2!n1!<NONE>
Mon Dec 11 12:18:21 1995!g4!n1!<NONE>
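A minimal sketch of such a script follows. It assumes a POSIX shell and awk, and it assumes that each event line has the four fields (time, group, name, value) shown in the sample output above. The function name gen_clear_cmds is hypothetical. It reads cevent -la -d '!' output on standard input and prints one cevent -c command per selected group, in the same cevent -c -g group name ... form used elsewhere in this section.

```shell
#!/bin/sh
# Sketch: turn "cevent -la -d '!'" output into cevent -c commands.
# Assumes each event line is time!group!name!value, as in the sample output.
# gen_clear_cmds is a hypothetical helper name, not part of NQE.
gen_clear_cmds() {
    # $1 (optional): awk regular expression matched against the group field;
    # the default "." selects every group.
    awk -F'!' -v pat="${1:-.}" '
        # Skip lines without at least time, group, and name (such as headers).
        NF >= 3 && $2 ~ pat {
            if (!($2 in names)) order[++n] = $2   # remember first-seen group order
            names[$2] = names[$2] " " $3          # collect event names per group
        }
        END {
            # One delete command per group, listing all of its selected events.
            for (i = 1; i <= n; i++)
                printf "cevent -c -g %s%s\n", order[i], names[order[i]]
        }
    '
}
```

For example, as root, cevent -la -d '!' | gen_clear_cmds '^g[12]$' > clear_events.sh writes delete commands for groups g1 and g2 only; after reviewing the file, run it with sh clear_events.sh. Generating the commands instead of running them directly leaves a chance to check the list before anything is deleted.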

To delete all events from all groups, use the following command:

cevent -ca

To delete all events from a group named g1, use the following command:

cevent -c -g g1

To delete the events n1 and n2 from the group g1, use the following command:

cevent -c -g g1 n1 n2

Multiple NLB Servers and Job Dependency

A potential problem exists when you use job dependency and multiple NLB servers. This section describes the problem and methods you can use to prevent it from occurring.

Multiple or redundant NLB servers work like disk mirroring. The NLB collector processes on each node configured to run as a collector send the same system and request status information to all defined NLB servers. If an NLB server goes down and then comes back up, the collectors will soon resend their data to it and make it current.

If you specify more than one NLB server in your nqeinfo file, the cevent command tries to send information to the first server listed. If that server is temporarily down (or unavailable because of network problems), cevent tries the second server, and so on. If the first server later comes back up, cevent sends information to it again, rather than to the server it used while the first server was unavailable.

For example, assume that your nqeinfo file lists two NLB servers as follows (the quotation marks are required):

# NLB_SERVER - Location of NLB server host
NLB_SERVER="host1 host2"

If a user posts job dependency information on the server that uses this nqeinfo file, the collector on the server tries to send the information to the NLB server database on host host1. If host1 does not respond, the collector tries to send the information to host2. If both fail, cevent returns an error.

If a user posts an event while host1 is unavailable, the event is posted to host2. When host1 becomes available again, subsequent cevent commands use it because it is first in the list in the nqeinfo file. A cevent command that then tries to read the event posted to host2 actually looks for it on host1, and the event is not found because it was never posted there.
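The following shell function is a simplified model of the selection order described above, not the cevent implementation. The function name pick_server, and the idea of passing in the set of hosts that are currently up, are assumptions made for illustration. It returns the first host in the NLB_SERVER list that is up.

```shell
#!/bin/sh
# Simplified model (not NQE source code) of how cevent chooses a server:
# the first host in the NLB_SERVER list that currently answers.
pick_server() {
    list=$1   # server list, e.g. "host1 host2" as in the nqeinfo example
    up=$2     # hosts that are currently reachable (simulated), space separated
    for h in $list; do
        case " $up " in
            *" $h "*) echo "$h"; return 0 ;;
        esac
    done
    return 1  # no server answered; cevent would report an error
}
```

With the example list, pick_server 'host1 host2' 'host2' prints host2, which is where an event posted during the outage lands. Once host1 is back, pick_server 'host1 host2' 'host1 host2' prints host1, which is where a later read looks, so the two operations no longer meet on the same server.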

To prevent these problems, users can set their NLB_SERVER environment variable to a single host before they invoke cevent. This forces all subsequent cevent invocations to use that server. The cevent -w option waits a specified number of seconds for a response, so a user can wait for transient network, NLB, or system problems on the server to be corrected.
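For example, a Bourne or Korn shell user could pin cevent to host1 (the first server in the nqeinfo example) as shown below; a C shell user would use setenv NLB_SERVER host1 instead. The host name is only the example value from this section.

```shell
#!/bin/sh
# Pin cevent to one NLB server for this shell and every command it starts.
# host1 is the example host name from the nqeinfo listing in this section.
NLB_SERVER=host1
export NLB_SERVER
# From here on, commands such as cevent -la run from this shell use host1.
```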

You can use the following methods to avoid the problem:

  • If the NLB server on host1 (or host1 itself) goes down, keep the NLB server on host1 down until a quiet period. Then bring down the NLB server on host2 and move its NLB data to host1 by copying all obj_* files in /usr/spool/nqe/nlbdir on host2 to /usr/spool/nqe/nlbdir on host1. Then bring both NLB servers back up.

  • This method requires a manual procedure or a script. If the NLB server on host1 (or host1 itself) goes down, log in to host2 as root during a quiet period and use nlbconfig -odump to dump all of the cevent objects for all event groups in the NLB database on host2. Then log in to host1 as root and use nlbconfig -oupdat to update all events with this data.

  • You can rename the cevent script and create a new script named cevent in the same directory. The new script sets the NLB_SERVER environment variable and then calls the original cevent under its new name. This guarantees that cevent always uses the NLB server that you specify in the script.
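The third method can be scripted. The following is a minimal sketch under stated assumptions: the function name install_cevent_wrapper and the new name cevent.real for the original script are hypothetical, and the wrapper it writes is a plain Bourne shell script. Run it as a user who can write to the directory that holds cevent, and pass that directory as an absolute path.

```shell
#!/bin/sh
# Sketch of the rename-and-wrap method. Assumptions (not from the NQE docs):
# the original script is renamed to cevent.real, and the new cevent is a
# Bourne shell wrapper. dir and server are globals (POSIX sh has no "local").
install_cevent_wrapper() {
    dir=$1      # absolute path of the directory that contains cevent
    server=$2   # the one NLB server every cevent call should use
    if [ ! -f "$dir/cevent" ]; then
        echo "install_cevent_wrapper: no cevent in $dir" >&2
        return 1
    fi
    if [ -e "$dir/cevent.real" ]; then
        # Refuse to run twice, which would overwrite the saved original.
        echo "install_cevent_wrapper: $dir/cevent.real already exists" >&2
        return 1
    fi
    # Keep the original under its new name.
    mv "$dir/cevent" "$dir/cevent.real" || return 1
    # Write the new cevent: pin NLB_SERVER, then run the original with the
    # caller's arguments. The server name must not contain a single quote.
    cat > "$dir/cevent" <<EOF
#!/bin/sh
NLB_SERVER='$server'
export NLB_SERVER
exec '$dir/cevent.real' "\$@"
EOF
    chmod 755 "$dir/cevent"
}
```

For example, install_cevent_wrapper "$(dirname "$(command -v cevent)")" host1 pins every later cevent call to host1, even for users whose own NLB_SERVER setting names a different host. The wrapper uses exec, so the exit status of the original cevent passes straight back to the caller.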