EnlightenDSM is a powerful tool for monitoring host and network problems. The Events feature collects and saves status, configuration, performance and capacity information, automatically sending warning messages to system or network managers when an alarm condition occurs. Events can also take action under specific conditions using a process you specify.
The information collected by Events is also used by the Status Map feature to graphically display the state of your systems. These two powerful tools identify which hosts or pools are having problems, or are about to have problems, allowing network managers to anticipate errors and take quick action when problems arise.
This chapter provides information about:
Monitoring systems with Events
Viewing system status with the Status Map
Monitoring logins, processes, and CPU usage
Auditing security checks
This section describes how to add, modify and delete event tests using EnlightenDSM's Events feature. For a detailed overview of Events capabilities and features, standards compliance, and the basics of building a testtab file, refer to Chapter 10, “Events,” in the EnlightenDSM Reference Manual.
The Events feature helps predict problems by reporting the event and taking corrective action defined by the user.
Data collected by Events can be used to assist in tasks such as:
system tuning
load balancing
resource planning and justification
upgrade requirement analysis
Events collects this data by monitoring the following:
memory subsystems
individual files
directory queues
filesystems
printer queues
critical processes
network statistics
hardware inventory
software inventory
user-provided data
An appropriate message can be sent to network managers, system managers, or both when an alarm condition occurs. Events can send alarms using one or more of the following methods:
SNMP (Simple Network Management Protocol) trap messages
Programmable Events Processor (PEP) messages
Events can also pass the alarm for possible corrective action to a defined process. The same process can be assigned for all tests or a separate process can be specified for each test.
There are two Events categories:
Group Tests—Also referred to as Internal tests, these tests are automatically configured with default configuration values. Refer to “Modifying an Existing Test” later in this chapter for more details. Group Tests include:
O/S Tests
File Systems Tests
Printer Tests
Process Tests
Inventory Tests
RPC Tests
VM (Virtual Memory) Tests
MBUF Tests
Ncache Tests
IP Tests
ICMP Tests
TCP Tests
UDP Tests
Item Tests—Tests configured by the administrator using the Events Configuration Add feature. Refer to “Adding an Event Test” in this chapter for more details. Item tests include:
Process Tests:
Instance of Process
Size of Process
Time Used by Process
API built-in tests
File Tests:
File Size
File accessed
File modified
File clamped
To create a new test:
Choose Configure from the Events menu. The Events Configuration window appears.
This window displays the hostname, the test group, whether the test is turned on or off, whether logging for the test is turned on or off, the severity level, and the test name for each test EnlightenDSM finds on your default host.
Click the Add button to select the test type for this new event test. A Select New Event Type window appears. Select the type of test to be created.
Enter the hostname that will contain the default settings for the new tests.
From the Test Category field, choose the test type. The options are:
Files (the default)
Processes
Directories
From the Sub-category field, specify a subcategory of the test types. If the Files option in the Test Category field was chosen, the type of file test can also be selected:
Size of File (the default)
Last Modified
Last Accessed
File Clamping
If the Processes option in the Test Category field was chosen, the type of process test can also be chosen:
Instance of Process (the default)
Size of Process
Time Used by Process
If the Directories option in the Test Category field was chosen, there are no further subcategories to choose from.
Click the Apply button to display the Add Events Test window for the selected test type. Enter the parameters for the new event test.
For more information on the formats of these fields, or to run Events from a command-line mode, refer to Chapter 10, “Events,” in the EnlightenDSM Reference Manual.
This section describes how to use each field and button in the Add Events Test window.
Type the hostname(s) for this test (or click the right arrow button to choose from a pick list of all hosts within the current pool). Leave a blank space between hostnames for multiple entries.
Enter the name of the test. This must be either the process name, or the full pathname of the file or directory to be monitored.
An optional list of command arguments is used in matching a process name. This field can be used to differentiate between two process instances by also matching the argument list used by each of the processes.
This read-only field shows what the standard units of measure are for this test.
This read-only field shows the Events-defined subcommand (if any) this test will use during its execution.
This read-only field indicates whether this test is an Events built-in test (Yes setting) or a user-defined test (No setting).
This toggle chooses the level of severity to assign this test from the following message types:
OK
Informational (the default)
Warning
Error
Severe
Choose whether this test should use PEP to report its results and/or filter any action to be taken. The default setting is Yes.
If Logging is enabled, enter a “changed by” (delta) value. EnlightenDSM will record the most recent value measured by the test if that value differs by at least this delta amount from the previously logged value.
See Chapter 10, “Events,” in the EnlightenDSM Reference Manual for more details.
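The delta rule can be illustrated with a short Python sketch. This is not EnlightenDSM code: the function names are hypothetical, and the sketch assumes that the first reading is always logged because there is no previously logged value to compare against.

```python
from typing import List, Optional


def should_log(current: float, last_logged: Optional[float], delta: float) -> bool:
    """Return True when a new reading differs enough to be logged."""
    if last_logged is None:
        # Assumption: with no previous log entry, the reading is logged.
        return True
    return abs(current - last_logged) >= delta


def log_readings(readings: List[float], delta: float) -> List[float]:
    """Return the readings that would be written to the log, in order."""
    logged = []
    last = None
    for value in readings:
        if should_log(value, last, delta):
            logged.append(value)
            last = value  # comparisons are made against the last LOGGED value
    return logged
```

With a delta of 3, the readings 10, 11, 14, 15, 20 would produce log entries for 10, 14, and 20 under these assumptions.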
Enter the mail program that should be used to deliver alarm messages. The default is /bin/mail. If another mail program is used, it must use the same syntax as the standard mail program for the target operating system.
Enter the user(s) who should receive any alarm information. The default value is root. Leave a blank space between each user name for multiple entries. If this value is set to nobody, no mail will be sent.
Enter any executable this test should run when it sets an alarm. This can be a script or a compiled executable.
Enter how much time (in minutes) must elapse before a file in a directory is considered to have “aged”. Only files more than “aged” minutes old are counted as “old”.
This value is only available if this test will monitor a directory queue.
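As a rough illustration of how an aging threshold applies to a directory queue, the following Python sketch counts the files older than a given number of minutes. The function name is hypothetical, and the sketch assumes that age is measured from the last modification time and that only regular files are counted; the product may measure age differently.

```python
import os
import time
from typing import Optional


def count_aged_files(directory: str, aged_minutes: float,
                     now: Optional[float] = None) -> int:
    """Count regular files in `directory` older than `aged_minutes`."""
    if now is None:
        now = time.time()
    cutoff = now - aged_minutes * 60
    count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # Assumption: subdirectories and symbolic links are ignored.
            if not entry.is_file(follow_symlinks=False):
                continue
            # Assumption: "age" means time since last modification.
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                count += 1
    return count
```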
Specify how long to wait in minutes before sending another new alarm about this test. The default is every hour.
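The effect of the repeat interval can be pictured with the following Python sketch. The class and method names are hypothetical, and the sketch assumes that the interval is tracked separately for each test name.

```python
class AlarmThrottle:
    """Suppress repeat alarms for a test until the interval has elapsed."""

    def __init__(self, interval_minutes: float = 60):
        self.interval = interval_minutes * 60  # stored in seconds
        self.last_sent = {}                    # test name -> time of last alarm

    def should_send(self, test_name: str, now: float) -> bool:
        last = self.last_sent.get(test_name)
        if last is not None and now - last < self.interval:
            return False  # still inside the waiting period
        self.last_sent[test_name] = now
        return True
```

With the default of 60 minutes, an alarm raised at time 0 is sent, a repeat at 30 minutes is suppressed, and a repeat at 60 minutes is sent again.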
Specify an absolute high-level alarm set point for the data you're measuring in this test. This can be an integer or floating-point value.
Specify an absolute low-level alarm set point for the data you're measuring in this test. This can be an integer or floating-point value.
Specify a positive percentage change alarm set point for the data you're measuring in this test. This threshold checks the percentage of change by comparing the current test value with the last measured value. This must be a floating-point value.
Specify a negative percentage change alarm set point for the data you're measuring in this test. This threshold checks the percentage of change by comparing the current test value with the last measured value. This must be a floating-point value.
Specify a positive incremental change alarm set point for the data you're measuring in this test. This threshold checks for the change of n points by comparing the current test value with the last measured value. This can be an integer or floating-point value.
Specify a negative incremental change alarm set point for the data you're measuring in this test. This threshold checks for the change of n points by comparing the current test value with the last measured value. This can be an integer or floating-point value.
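The six alarm set points described above can be compared side by side in a short Python sketch. It is an illustration only: the names are hypothetical, comparisons are assumed to be inclusive, the negative set points are assumed to be entered as positive magnitudes, and percentage change is computed relative to the last measured value.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thresholds:
    high: Optional[float] = None      # absolute high-level set point
    low: Optional[float] = None       # absolute low-level set point
    pct_up: Optional[float] = None    # positive percentage change
    pct_down: Optional[float] = None  # negative percentage change (magnitude)
    inc_up: Optional[float] = None    # positive incremental change
    inc_down: Optional[float] = None  # negative incremental change (magnitude)


def violated(current: float, previous: Optional[float], t: Thresholds) -> List[str]:
    """Return the names of all set points that the current value trips."""
    hits = []
    if t.high is not None and current >= t.high:
        hits.append("high")
    if t.low is not None and current <= t.low:
        hits.append("low")
    if previous is not None:
        change = current - previous
        if t.inc_up is not None and change >= t.inc_up:
            hits.append("inc_up")
        if t.inc_down is not None and -change >= t.inc_down:
            hits.append("inc_down")
        if previous != 0:  # percentage change is undefined from zero
            pct = 100.0 * change / abs(previous)
            if t.pct_up is not None and pct >= t.pct_up:
                hits.append("pct_up")
            if t.pct_down is not None and -pct >= t.pct_down:
                hits.append("pct_down")
    return hits
```

For example, a value that rises from 100 to 120 trips a high set point of 110 and a positive percentage set point of 15, but not a positive incremental set point of 25.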
If you're creating an API test, specify the full pathname of the file that will hold the values you are monitoring.
Note: See Chapter 10, “Events,” in the EnlightenDSM Reference Manual for examples and more information on creating API tests.
If an API test is being created, specify which field or column holds the data value to monitor. Use a digit prefaced by an `f' for a field number or by a `c' for a column number. The default assumes this value is a field number. If you're using a column designator, each character in any input file line/row is handled as one column.
If an API test is being created, specify which field in your file contains a descriptive word or label.
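The field and column designators can be illustrated with a small Python sketch. The function names are hypothetical. The sketch assumes that fields are separated by whitespace, that fields and columns are numbered from 1, and that a column designator marks where the value starts (the value is read up to the next whitespace); refer to the Reference Manual for the actual rules.

```python
def parse_designator(spec: str):
    """Split a designator such as 'f3', 'c12', or '3' into (kind, number)."""
    spec = spec.strip()
    kind = "f"  # a bare number defaults to a field number
    if spec[:1].lower() in ("f", "c"):
        kind, spec = spec[0].lower(), spec[1:]
    return kind, int(spec)


def extract(line: str, spec: str) -> str:
    """Pull the designated value out of one line of an API data file."""
    kind, n = parse_designator(spec)
    if kind == "f":
        # Assumption: whitespace-separated fields, numbered from 1.
        return line.split()[n - 1]
    # Assumption: one character per column, numbered from 1; the value
    # runs from that column to the next whitespace.
    return line[n - 1:].split()[0]
```

For the line `disk1 87 percent`, the designators `f2`, `2`, and `c7` all select the value `87` under these assumptions.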
Regular expressions can be used to define “types” of messages based on pattern matching. When one or more of these message types are found in a file, an alarm is sent to the agents specified in the test. Each time this test runs, it evaluates only those files added since the last occurrence of the test.
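The idea of defining message types by pattern can be sketched in Python as follows. The function name, the type names, and the patterns are hypothetical examples, and Python's regular expression syntax may differ from the syntax Events accepts.

```python
import re
from typing import Dict, Iterable, List


def find_message_types(lines: Iterable[str],
                       patterns: Dict[str, str]) -> Dict[str, List[str]]:
    """Group lines by the message type whose pattern they match."""
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    found: Dict[str, List[str]] = {}
    for line in lines:
        for name, rx in compiled.items():
            if rx.search(line):
                found.setdefault(name, []).append(line)
    return found


# Hypothetical message types and sample log lines.
patterns = {"login_failure": r"FAILED LOGIN", "disk_full": r"disk .* full"}
sample = ["FAILED LOGIN on tty1", "disk /var is full", "cron started"]
matches = find_message_types(sample, patterns)
```

In this sketch, any non-empty result would correspond to an alarm condition for the test.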
Click the Apply button to add the test configuration to the testtab files for all specified hosts. The Events process monitoring the data will also be updated.
Note: EnlightenDSM is updated immediately and the testtab file is updated two minutes later.
Click the Modify button in the Events Configuration window (Figure 5-1) to modify an existing test configuration. A window similar to the Add Events Test window will appear.
See the previous section, “Entering Test Parameters” for information about each field.
There are three differences between the Add and Modify Events Test windows:
The Test Name field in the Modify window is read-only.
The Modify button (rather than the Add button) is used to save changes.
The Next button is used to modify additional test configurations if more than one test is selected for modification from the Events Configuration list. Clicking Next does not save your changes, so be sure to click Modify before moving on to the next test configuration.
Highlight the test to be deleted from the Events Configuration window. Click the Delete button. EnlightenDSM prompts you to confirm your action.
Note: Only tests you have added can be deleted. Built-in tests can be turned off, but cannot be deleted from the Events Configuration list.
To create a new test using the parameters of an existing test, highlight the existing test and click Copy. The Add Events Test window will appear containing the original test parameters in each field. Edit this window as needed and click Apply to save the new test.
See “Entering Test Parameters” for information about each field.
Note: You can only copy tests you have added.
The Status Map uses information provided by Events to graphically display the current state of hosts and pools using color-coded icons. You can ignore the status, or query events and fix the problem.
This section describes how to interpret and navigate the Status Map, query events using the Status Map window, and clear the Status Map of Events.
To view the Status Map, choose Status Map from the Events menu.
A map similar to the one in Figure 5-5 appears.
The state of each host or pool icon is displayed according to the icon color:
Green - OK
White - Informational
Yellow - Warning
Blue - Error
Red - Severe
The color of the host icon reflects the highest priority event that has not been cleared. The color of a pool icon reflects the highest priority uncleared event for any host within that pool.
A blinking host or pool icon indicates the following:
If a host icon is blinking, an unacknowledged event has occurred for that host.
If a pool icon is blinking, an unacknowledged event has occurred for at least one host contained in the pool. An unacknowledged event is any event message that you have not yet acknowledged using the Status Map.
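The color and blinking rules can be summarized in a short Python sketch. The class and function names are hypothetical, and the sketch assumes that priority increases in the order the severity levels are listed above (OK lowest, Severe highest).

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: priority rises from OK to Severe in the documented order.
SEVERITY_ORDER = ["OK", "Informational", "Warning", "Error", "Severe"]
COLORS = {"OK": "green", "Informational": "white", "Warning": "yellow",
          "Error": "blue", "Severe": "red"}


@dataclass
class Event:
    severity: str
    cleared: bool = False
    acknowledged: bool = False


def host_color(events: List[Event]) -> str:
    """Color of the highest priority event that has not been cleared."""
    active = [e.severity for e in events if not e.cleared]
    if not active:
        return COLORS["OK"]
    return COLORS[max(active, key=SEVERITY_ORDER.index)]


def host_blinking(events: List[Event]) -> bool:
    """A host blinks while any of its events is unacknowledged."""
    return any(not e.acknowledged for e in events)


def pool_color(pool: Dict[str, List[Event]]) -> str:
    """A pool takes the highest priority uncleared event of any host."""
    return host_color([e for events in pool.values() for e in events])


def pool_blinking(pool: Dict[str, List[Event]]) -> bool:
    return any(host_blinking(events) for events in pool.values())
```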
Note: If you haven't recently viewed the Status Map and an icon is blinking green, look at the preceding activity for that host or pool. An alarm might have occurred and cleared itself between viewings. For example, if an unauthorized user logged on to your system and then exited, and an Events test had already checked for this type of alert, a green alert would appear. All EnlightenDSM system administration functions act upon any hosts selected in the Status Map.
To change the Status Map background, place the cursor arrow over the map and click the right mouse button. A list of backgrounds appears from which to choose.
To navigate the Status Map:
If a pool icon is selected, all hosts within that pool are selected and can be managed as a single unit.
To perform administration functions on a subset of hosts in a pool, select those hosts by single-clicking on the host icons.
The standard mouse “select rectangle” or “sweep” methods can also be used to define a temporary group.
To search and view Events alarms and messages logged:
Click the Query Events button in the Status Map window. The Status Map Query Events window appears.
In the Query Hosts field, choose the hosts to be searched. The options are:
Hosts In Exception Pool (the default)
Hosts In Current Pool
Specific Host(s). Specify which host(s) to check for messages in the text field to the right. Leave a blank space between hostnames for multiple entries.
Under Message Type, the options are:
Event Messages Only: alarm messages generated when the results of a test violate a predefined threshold.
Log Messages Only: informational messages logged when a test successfully runs without generating an alarm.
Both Messages (the default).
Under Message Severity Level, the options are:
Severe Message
Info Message
Error Message
Okay Message
Warning Message
Select one or more severity levels to use in the search. The default setting selects all levels of message severity.
Use the Test Name Filter field to limit the search to a few tests. Type the entire test name or just the first few letters of the test name in this field.
All test names containing all or part of the specified string will be queried. Alternatively, click the right arrow button to choose from a pick list of all pre-defined standard Events tests. Leave a blank space between test names for multiple entries. The standard regular expression wildcards `*', `[]', and `?' can also be used in this field (for example, /home/*).
In the Number of Messages per Host field, enter the number of messages per host to be searched. The most recent messages are displayed first. The counter buttons to the right also change the number displayed.
Use the Time Between fields to limit the search to messages generated within a span of time. Enter the beginning and ending dates and times of messages to be searched. For a detailed description of date/time formats allowed in this field, refer to Appendix C of the EnlightenDSM Reference Manual.
Click the Execute Query button to begin the search process. The results are displayed in the View Event Message window (Figure 5-7).
All messages matching your search criteria are displayed in the list box. Each line includes the hostname, test name, logged value, units, severity, status, and timestamp.
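The way the query fields combine can be pictured with the following Python sketch. It is an illustration only: the function names and the dictionary keys are hypothetical, wildcard matching is approximated with Python's `fnmatch` module, and an empty Test Name Filter is assumed to match every test.

```python
from collections import defaultdict
from fnmatch import fnmatchcase


def name_matches(test_name: str, filter_text: str) -> bool:
    """Match a test name against space-separated names or wildcard terms."""
    terms = filter_text.split()
    if not terms:
        return True  # Assumption: an empty filter matches every test.
    for term in terms:
        if any(ch in term for ch in "*?["):
            if fnmatchcase(test_name, term):
                return True
        elif term in test_name:  # plain text matches any part of the name
            return True
    return False


def run_query(messages, severities, name_filter, start, end, per_host):
    """Filter messages, then keep the newest `per_host` entries per host."""
    by_host = defaultdict(list)
    for msg in messages:
        if (msg["severity"] in severities
                and name_matches(msg["test"], name_filter)
                and start <= msg["time"] <= end):
            by_host[msg["host"]].append(msg)
    results = []
    for host in sorted(by_host):
        newest_first = sorted(by_host[host], key=lambda m: m["time"], reverse=True)
        results.extend(newest_first[:per_host])
    return results
```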
To modify a test, highlight the desired test and click the Reconfigure Test button. The Modify Events Test window for that test will appear. For details on using this window to modify a test, see “Modifying an Existing Test”.
An event can be cleared from the Status Map when the condition that triggered the event no longer exists. There are two ways to clear an event:
The event clears itself (for example, an Events CPU load test returns an OK result).
You correct the activity that caused the original event/problem to occur (for example, by correcting whatever is overloading the CPU).
After all events for a host or pool are cleared, the color for that icon is set to green (current status = OK).
EnlightenDSM provides a host of security features to check vital files, filesystem devices, boot and shutdown scripts, crontab contents, password integrity, group files, home directories, and break-in attempts. The findings from these security checks are placed in a logfile, allowing you to choose the functions that you want to audit.
For information on each one of the security features, refer to Chapter 6, “Security,” in the EnlightenDSM Reference Manual.
The Activity Monitor in the User menu is used to easily monitor login activity, process statuses, and CPU usage.
The rest of this section details how to use each Activity Monitor option.
To quickly see which users are currently logged into the system, go to the User menu, choose Activity Monitor and then Who Is Logged In. The Who Is Logged In window appears (Figure 5-8).
Each line in the window displays the hostname, user name, tty, login time, idle time, process ID, and the location of the tty (if available). To select one or more user accounts for further information, highlight the usernames and select one of the following options:
Click the Write button to write a new message directly to the selected user's mailbox. A window appears with a field for composing a message of any length. Press the Return key to send the message. The recipient can respond to this message, allowing for a two-way conversation.
The Message command sends a predefined or custom form letter directly to the user's screen, instead of the user's mailbox. The recipient cannot reply to this message.
Click the Logout button to immediately terminate all highlighted work sessions. This command kills the initial Shell process belonging to the marked users.
Note: Use this command with caution as it may also cause related user processes to be killed.
Click the Processes button to display a window of all processes currently running for the highlighted users. To further manipulate this information, see the next section, “Monitoring Process Status.”
To view a list of all active processes, choose Activity Monitor and then Process Status from the User menu. The Processes window appears (Figure 5-9).
Highlight a process and use the command buttons in the window to perform the following actions.
Click the Terminate button to immediately kill the highlighted process. A pop-up window will prompt you to verify your Terminate command.
Note: This command will not kill related processes, so if there are child processes running, they will become orphans. These orphans might terminate automatically, or you may have to kill them manually.
The Hangup command is similar to the Terminate command, except it provides enough time for the process to shut down properly. This means the process can close any files and terminate any child processes. A pop-up window will prompt you to verify your Hangup command.
Clicking the Suspend button will stop a process from working but will not terminate it. The process is put on hold and can be re-activated later. Click Continue to re-activate a suspended process.
Click the Priority button to change the priority of a process. This priority determines when the CPU acts on a process. It may have a value from -20 to +20; the smaller the number, the higher the priority. You can enter the desired priority or use the arrow buttons to change the value.
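On a UNIX system, actions like these are commonly carried out with signals and the process scheduling priority. The Python sketch below shows one plausible mapping. It is an assumption, not a description of what EnlightenDSM does internally: the signal chosen for each button and the function names are hypothetical, and the operating system may restrict or clamp the priority values it accepts.

```python
import os
import signal

# Assumed mapping of window buttons to UNIX signals.
ACTIONS = {
    "terminate": signal.SIGKILL,  # immediate kill, cannot be caught
    "hangup": signal.SIGHUP,      # lets the process clean up first
    "suspend": signal.SIGSTOP,    # put the process on hold
    "continue": signal.SIGCONT,   # re-activate a suspended process
}


def send_action(pid: int, action: str) -> None:
    """Send the signal associated with an action to a process."""
    os.kill(pid, ACTIONS[action])


def set_priority(pid: int, value: int) -> None:
    """Set the scheduling priority; smaller numbers mean higher priority."""
    if not -20 <= value <= 20:
        raise ValueError("priority must be between -20 and +20")
    # Lowering the value usually requires superuser privileges.
    os.setpriority(os.PRIO_PROCESS, pid, value)
```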
From the User menu, choose Activity Monitor, then CPU Summary. The Summary of Process By User window appears (Figure 5-10). This window shows all currently logged-in users, the current number of processes, and the total cumulative CPU usage for each active user.
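The kind of per-user summary shown in this window can be illustrated with a short Python sketch. The function name and the sample process data are hypothetical; the sketch simply counts processes and adds up CPU time for each user.

```python
from collections import defaultdict


def summarize_by_user(processes):
    """Return {user: (process_count, total_cpu_seconds)}.

    `processes` is an iterable of (user, pid, cpu_seconds) tuples.
    """
    summary = defaultdict(lambda: [0, 0.0])
    for user, _pid, cpu_seconds in processes:
        summary[user][0] += 1
        summary[user][1] += cpu_seconds
    return {user: (count, total) for user, (count, total) in summary.items()}


# Hypothetical sample data.
sample_processes = [("root", 1, 12.5), ("alice", 200, 3.0), ("root", 45, 0.5)]
summary = summarize_by_user(sample_processes)
```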
To graph the processes, highlight the users to be graphed and click the Graph button. A window appears displaying the highlighted items in a graphical format. Press and hold down the middle mouse button to rotate the graph in the direction you move the mouse.
To view the processes in detail, highlight the users to be viewed and click the Processes button. A window appears displaying all processes for the highlighted users. To further manipulate this information, see “Monitoring Process Status”.