Chapter 4. Monitoring Arrays and Displaying System Statistics

This chapter describes how to display information windows that provide status and statistics information about an array, SP, LUN, disk module, or caching. It also explains the role of polling and statistics logging, describes how to turn on and off the statistics log that supplies the information for these windows, and explains how to display and save the event log for an SP.

This chapter consists of the following sections:

  • “Using ssmagent Polling”

  • “Using Statistics Logging”

  • “Getting LUN and Disk Information in the Array Configuration Window”

  • “Using the Array Information Window”

  • “Using the LUN Information Windows”

  • “Using the Disk Information Windows”

  • “Using the SP Information Windows”

  • “Getting Information on Other Array Components”

  • “Displaying an SP Event Log”

  • “Using the Event Monitor”

Using ssmagent Polling

Before you use the RAID GUI to monitor the health of managed arrays, you need to make sure that it has up-to-date information on them. ssmgui updates its information for each array by polling the ssmagent on the managed servers for changes in the state of the disks, SPs, LUNs, and read and write cache.

You can automatically poll one or more arrays at intervals that you can set for each array, or you can manually poll selected arrays whenever you want. Automatic and manual polling are completely independent procedures; that is, one does not affect the other.

This section contains the following topics:

  • “ssmagent Polling Interval and Polling Requests”

  • “Automatic Polling”

ssmagent Polling Interval and Polling Requests

The ssmagent has a polling interval that you can specify in its configuration file (see its man page, ssmagent(7)). This polling interval is the minimum interval at which the ssmagent polls an array on the server on which the ssmagent is running. Each time the ssmagent polls an array, it updates its information for the array.

Regardless of how frequently a client application, such as ssmgui, polls the agent for information about an array, the ssmagent actually polls the array only in response to the first client poll of the array after the polling interval has elapsed. As a result, the polling interval prevents excessive client poll requests from overwhelming the array.

Because the ssmagent does not always poll an array every time a client application requests a poll, the client application shows only the information that the ssmagent currently has for the array. If the information for an array has changed but the polling interval has not elapsed at the time of the poll request, the ssmagent does not poll the array, and therefore cannot notify the client application of the change.

For example, suppose the ssmagent polling interval is 60 seconds and a client application sends the ssmagent a request to poll an array at 6:00:00. The ssmagent polls the array and notifies the client application of any change in the array; the client application therefore reflects the current state of the array after the poll request. The ssmagent does not poll the array again until at least 6:01:00. If a client application requests a poll of the array between 6:00:00 and 6:01:00, the application reflects only the state of the array at 6:00:00. If a disk module in the array fails at 6:00:25, no client application that requests a poll of the array between 6:00:25 and 6:01:00 is notified of the disk module failure. In response to the first client request after 6:01:00, the ssmagent polls the array, updates its information on the array, and notifies that client application of the disk module failure.
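
The following shell fragment is a minimal sketch of that agent-side rate limit, using the times from the example; the variable names and logic are illustrative only and are not part of the ssmagent implementation:

    # Does a client poll request trigger a real poll of the array?
    poll_interval=60        # minimum interval between array polls, in seconds
    last_poll=21600         # 6:00:00 in the example, as seconds since midnight
    now=21625               # 6:00:25, when a client poll request arrives
    if [ $((now - last_poll)) -ge $poll_interval ]; then
        echo "Poll the array, refresh the stored data, and notify the client"
    else
        echo "Answer the client from the data gathered at the last poll (6:00:00)"
    fi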

Automatic Polling

You can automatically poll an array at a specified interval, or you can manually poll selected arrays whenever you want. Automatic and manual polling are completely independent and do not affect each other.

By default, automatic polling is disabled for the ssmgui session and for each array because the polling process uses array resources, thus affecting performance. You enable automatic polling for a session by specifying an automatic polling interval, and for an array, by specifying an automatic polling priority for the array.

When automatic polling is enabled for the ssmgui session and a selected array, the frequency at which the ssmagent automatically polls the array equals the automatic polling interval multiplied by the automatic polling priority for the array. For example, if the automatic polling interval is 5 minutes and the automatic polling priority for the array is 3, ssmgui polls ssmagent for array information every 15 minutes.
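
As a quick check of that arithmetic, using the values from the example above:

    # Effective automatic polling frequency = polling interval x polling priority
    interval=5       # automatic polling interval, in minutes
    priority=3       # automatic polling priority for the array
    echo "Array is polled about every $((interval * priority)) minutes"    # prints 15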

Disabling and Enabling Automatic Polling

Follow these instructions to enable or disable polling:

  • To enable automatic polling for an ssmgui session, choose Automatic Polling Interval from the Options menu of the Storage System Manager window. Choose Enable from the submenu that opens.

  • To enable automatic polling for one or more arrays, select the arrays in the Storage System Manager window. Choose Automatic Polling Interval from the Options menu of the Storage System Manager window. Choose Enable from the submenu that opens.

  • To disable automatic polling for a session, choose Automatic Polling Interval from the Options menu of the Storage System Manager window. Choose Disable from the submenu that opens.

  • To disable automatic polling for one or more arrays, select the arrays in the Storage System Manager window. Choose Automatic Polling Interval from the Options menu of the Storage System Manager window. Choose Disable from the submenu that opens.

Changing the Polling Interval

Follow these instructions to change the polling interval:

  • To change the automatic polling interval, choose Automatic Polling Interval from the Options menu of the Storage System Manager window. In the submenu that opens, select the desired number of seconds or minutes, or choose Other to specify a number of minutes that is not listed.

  • To specify the automatic polling priority for one or more arrays, select the arrays whose priorities you want to change. In the Storage System Manager window, choose Automatic Polling Interval from the Options menu; select the desired priority from the submenu that opens.

Because ssmgui gets changes in the status of any array from ssmagent running on the array's server, ssmgui cannot get this information any faster than ssmagent. Thus, it is useless to set the polling interval for ssmgui to less than that for ssmagent. The default polling interval for ssmagent is 60 seconds.
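
That relationship amounts to a simple comparison; the ssmgui interval below is hypothetical, and 60 seconds is the stated ssmagent default:

    # Information in ssmgui can never be fresher than the ssmagent's own poll
    ssmgui_interval=30       # hypothetical ssmgui automatic polling interval, in seconds
    ssmagent_interval=60     # default ssmagent polling interval, in seconds
    if [ $ssmgui_interval -lt $ssmagent_interval ]; then
        echo "ssmgui polls more often than ssmagent refreshes; the extra polls return stale data"
    fi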

Using Statistics Logging

Many information windows explained in this chapter display statistics information. Each SP maintains a log of statistics for the LUNs, disk modules, and array caching. Normally, statistics information is not logged, since logging it may affect performance. For meaningful statistics information, you must enable statistics logging as explained in this chapter.

The statistics log uses a 32-bit counter to maintain the statistics numbers. When the counter is full, the statistics numbers restart at zero. As a result, you see a sudden decrease in a statistics number if you view it shortly before the counter is full and again shortly after the counter restarts at zero. If you want to keep the log turned on for more than two weeks, reset the log about every two weeks, so that you know when the numbers start at zero.
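
The wraparound is ordinary modulo-2^32 arithmetic; the counter values below are made up, simply to show why a statistic can suddenly appear much smaller:

    # A 32-bit counter wraps at 2^32 = 4294967296
    counter=4294967290               # value read shortly before the counter is full
    increment=100                    # events logged since that reading
    echo $(( (counter + increment) % 4294967296 ))   # prints 94, far below the earlier value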

Follow these steps to turn statistics logging on or off:

  1. In the Storage System Manager window, select the arrays whose logging state you want to change.

  2. Click the configure button near the left end of the array toolbar, or choose Configure from the Array menu.

    An Array Configuration window opens for each array selected.

  3. In the Array Configuration window, choose Statistics Logging State from the Array menu.

  4. Select the SP whose statistics logging state you want to change.

  5. In the confirmation window that opens, click Yes to change the setting or No to leave it unchanged.

Getting LUN and Disk Information in the Array Configuration Window

The Array Configuration window shows the LUNs in an array and their ownership: each LUN appears in the field of the SP that owns it, as shown in Figure 4-1.

You can display the location of disks in an array enclosure. Click the LUN to select it; a colored outline surrounds the LUN. In the disk field below the LUN selection area, the disks in the LUN are highlighted in the same color in the enclosure in which they are located. Each LUN is outlined in a different color in the display (see Figure 4-1).

Figure 4-1. Array Configuration Window: LUN and Disk Location Information


This section contains the following topics:

  • “LUN Icons and IDs”

  • “Disk Field”

  • “LUN and Disk Module Icon Color”

LUN Icons and IDs

LUN icons indicate the LUN type, as summarized in Table 4-1.

Table 4-1. LUN Icons and Types

LUN Icon    LUN Type
(icon)      RAID 5
(icon)      RAID 3
(icon)      RAID 1
(icon)      RAID 1/0
(icon)      RAID 0
(icon)      Individual disk
(icon)      Hot spare
(icon)      Faulted LUN

LUN IDs are two-digit numbers. To display them, choose Show LUN IDs from the View menu.

Disk Field

The disk field near the bottom of the Array Configuration window shows the disk modules in the array enclosures. It can also show LUN ownership of the disks, and disk module IDs. A letter E in the disk slot indicates an empty slot.

LUN Ownership of Disk Modules

To determine the location of disk modules in a LUN, click the LUN to select it; a colored outline surrounds the LUN.

In the disk field, the disks in the LUN are highlighted in the same color in the enclosure in which they are located. Each LUN is outlined in a different color in the display; Figure 4-2 shows an example.

Figure 4-2. Array Configuration Window: LUN Ownership of Disk Modules


Disk Module IDs

To display disk module IDs, in the Array Configuration window, choose Show Disk IDs from the View menu. Figure 4-3 shows an example.

Figure 4-3. Array Configuration Window: Disk Module IDs


The disk module ID is a two-digit decimal number: the first digit is the enclosure number and the second digit is the number of the slot containing the disk module (that is, the ID equals the enclosure number multiplied by 10, plus the slot number). For example, the disk module ID for the disk module in slot 3 of enclosure 2 is 23.
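
The same rule expressed as arithmetic:

    # Disk module ID = (enclosure number * 10) + slot number
    enclosure=2
    slot=3
    echo "Disk module ID: $((enclosure * 10 + slot))"    # prints 23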

LUN and Disk Module Icon Color

LUN and disk module icon color indicates status:

  • Gray indicates that the LUN or disk module is operating normally.

  • Blue indicates a transition state, such as being bound or incorporating a hot spare (LUNs) or being bound or powered on (disk modules). A small letter T in the icon also indicates a transitional state.

  • Amber indicates a fault condition, such as a failed array component or a breakdown in communication (LUN) or a failure (disk module). A small letter F in the icon also indicates a fault condition.

To get more information on arrays, LUNs, disks, and SPs, you can launch other windows, as explained in the rest of this chapter.

Using the Array Information Window

The Array Information window displays information on the selected array; Figure 4-4 shows an example.

Figure 4-4. Array Information Window


To display the Array Information window, follow these steps:

  1. In the Storage System Manager window, select the arrays whose status you want to view.

  2. Click the array information button near the middle of the array toolbar, or choose Information from the Array menu.

Use the fields in the Array Information window as follows:

  • State: Faulted or Normal.

  • Automatic Polling Priority: When automatic polling is enabled, the automatic polling priority and polling interval determine how often the software automatically polls the array. If the automatic polling priority is 1, ssmgui polls the selected array each time the polling interval elapses. If the priority is 2, ssmgui polls the selected arrays after two intervals elapse; if the priority is 3, ssmgui polls the selected array after three intervals elapse; and so on. The automatic polling interval has no effect on manual polling.

    When ssmgui polls the selected arrays for status, it retrieves information about array status changes, and updates the status information for the polled arrays in any open window. An automatic poll of an array functions the same way as a manual poll of the array.

  • Automatic Polling Count: Number of priority cycles that have elapsed since this array was last polled. For example, if the priority is 3, a count of 2 indicates that two priority cycles have elapsed; on the next (third) cycle, the array is automatically polled.

  • Automatic Disk Formatting: Indicates whether the array firmware automatically formats any disk module that has an unrecognizable format. This setting cannot be changed.

  • Disk Write Caching, RAID3 Write Buffering: Indicates whether these are disabled or enabled.

  • Host Connection: The hostname followed by the operating system device entry and the SP that communicates with ssmgui. The device entry is from the ssmagent configuration file on the array's server. The SP is the default owner of the LUN. With two SPs, two host connections are required to provide information on each SP.

Using the LUN Information Windows

The LUN Information window displays either configuration, statistics, or cache information. When you open the window, it displays LUN configuration information, as shown in Figure 4-5.

Figure 4-5. LUN Information Window: Configuration Information


To display the LUN Information window, follow these steps:

  1. In the Storage System Manager window, select the arrays whose LUN status you want to view.

  2. Click the configure button near the left end of the array toolbar, or choose Configuration from the Array menu. An Array Configuration window opens for each selected array.

  3. In the Array Configuration window, double-click the LUN you want to examine. Alternatively, select multiple LUNs and choose LUN > Display Information > Configuration. The LUN Information window opens, displaying LUN configuration information, as shown in Figure 4-5.

This section contains the following topics:

  • “LUN Configuration Information”

  • “LUN Statistics Information”

  • “LUN Cache Information”

LUN Configuration Information

To display LUN configuration information, click the leftmost radio button in the View area of the LUN Information window, if necessary. This information is displayed when you first open the LUN Information window; see Figure 4-5 for an example.

The fields have the following meanings:

  • RAID Type: Indicates the LUN's RAID type, as summarized in Table 4-2.

    Table 4-2. LUN Configuration Information: RAID Types

    RAID Type    Meaning
    RAID 5       Individual access array
    RAID 3       Parallel access array
    RAID 1       Mirrored pair
    RAID 1/0     Mirrored RAID 0 group
    RAID 0       Nonredundant individual access array
    DISK         Individual disk module
    HOT SPARE    Hot spare

    RAID type is set when the LUN is configured. To change it, see “Changing LUN RAID Type” in Chapter 3.

  • Element Size: Number of disk sectors (the stripe element size) that the array can read or write to a single disk module without requiring access to another disk module (assuming that the transfer starts at the first sector in the stripe). The stripe element size can affect the performance of a RAID 5 or RAID 1/0 LUN. A RAID 3 LUN has a fixed stripe element size of one sector. The smaller the stripe element size, the more efficient the distribution of data read or written. However, if the stripe element size is too small for a single I/O operation, the operation requires access to two stripe elements, which causes the hardware to read and/or write from two disk modules instead of one (see the example following this list).

    This number is set when the LUN is configured.

  • Rebuild Time: Time that the array allots to reconstruct the data on either a hot spare or a new disk module that replaces a failed disk module in a LUN. The rebuild time applies to all RAID LUNs except RAID 0. The time you specify determines the amount of resource the SP devotes to rebuilding instead of to normal I/O activity.

    This number is set when the LUN is configured; to change it, see “Changing LUN Bind Parameters That Do Not Require Unbinding” in Chapter 3.

  • Verify Time: Time that the array allots to checking parity. If an SP detects parity inconsistencies, it starts a background process to check all the parity sectors in the LUN. The time you specify determines the amount of resource the SP devotes to verifying instead of to normal I/O activity.

    This number is set when the LUN is configured. To change it, see “Changing LUN Bind Parameters That Do Not Require Unbinding” in Chapter 3.

  • Default SP: The SP that owned the LUN when the array was powered on. To change the default SP, see “Changing LUN Bind Parameters That Do Not Require Unbinding” in Chapter 3.

  • Auto Assignment State: Enabled or Disabled; the original default is Disabled. Auto assign controls the ownership of the LUN when one SP fails in an array with two SPs. With auto assign enabled, if the SP that owns a LUN fails and the server tries to access that LUN through the second SP, the second SP assumes ownership of the LUN so that the access can occur. The second SP continues to own the LUN until the SP's power is turned off and on again, at which point ownership of each LUN returns to its default SP. If auto assign is disabled in an array with two SPs and the SP that owns a LUN fails, the other SP does not assume ownership of the LUN, so access to the LUN does not occur.

    Auto assign is set when the LUN is configured. To change it, see “Changing LUN Bind Parameters That Do Not Require Unbinding” in Chapter 3.

  • Minimal Latency Reads State: Disabled or Enabled; applies to RAID 3 LUNs only. For information, see page 48 in Chapter 2. To change the state, see “Changing LUN Bind Parameters That Do Not Require Unbinding” in Chapter 3.

  • Disks: The disk modules that make up the LUN. For example, 00 - 04 means disk modules in slots 0 through 4 in enclosure 0.

  • Percent Rebuilt: If nonzero, this number indicates that a disk in the LUN is being rebuilt; the number shows the percentage of the rebuild completed.

  • Percent Bound: If nonzero, this number indicates that a disk in the LUN is being bound; the number shows the percentage of the bind completed.
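
The boundary effect described under Element Size can be checked with simple arithmetic; the element size and I/O values below are hypothetical:

    # Does a single I/O stay within one stripe element?
    element_size=128     # stripe element size, in sectors (set when the LUN is bound)
    start_offset=100     # starting sector offset within the stripe element
    io_length=64         # length of the I/O, in sectors
    if [ $((start_offset + io_length)) -gt $element_size ]; then
        echo "The I/O crosses a stripe element boundary: two disk modules are accessed"
    else
        echo "The I/O fits in one stripe element: a single disk module is accessed"
    fi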

LUN Statistics Information

To display LUN statistics information, click its radio button in the View area of the LUN Information window. Figure 4-6 shows an example.

Figure 4-6. LUN Information Window: Statistics Information



Note: Statistics logging must be enabled for entries in this window to be viewed.

The fields have the following meanings:

  • Number of Reads, Number of Writes: Total number of read and write requests made to the LUN.

  • Number of Blocks Read, Number of Blocks Written: Total number of data blocks read from and written to the LUN.

  • Number of Stripe Crossings: Number of times that a read or write crosses a stripe boundary on any disk module in a RAID 5, RAID 1/0, or RAID 0 LUN. Generally, stripe crossings are undesirable because each one requires an additional I/O. The ideal stripe element size is the smallest size that does not cause an additional I/O to another disk module. From the number of crossings, you can determine the percentage of I/Os that required a stripe boundary crossing: add the number of reads to the number of writes, and divide the number of crossings by this sum. A relatively low percentage indicates a relatively efficient stripe element size.
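
For example, with hypothetical counter values, the percentage of I/Os that crossed a stripe boundary is computed as follows:

    # Percentage of I/Os that crossed a stripe element boundary
    reads=120000
    writes=48000
    crossings=8400
    awk -v c=$crossings -v t=$((reads + writes)) \
        'BEGIN { printf "Stripe crossings: %.1f%% of I/Os\n", c * 100 / t }'    # prints 5.0%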

LUN Cache Information

To display LUN cache information, click its radio button in the View area of the LUN Information window. Figure 4-7 shows an example.

Figure 4-7. LUN Information Window: Cache Information


The fields have the following meanings:

  • Read Cache State, Write Cache State: Enabled or Disabled for this LUN. For caching to occur with a LUN, it must be enabled for the array and for the LUN. For information on how to enable an SP's read cache, see “Disabling or Enabling Array Write Caching” in Chapter 3. For information on how to enable an array's read cache, see “Enabling or Disabling Array Caching” in Chapter 3.

  • Read Cache Hit Ratio, Write Cache Hit Ratio: The percentage of read cache and write cache hits for the LUN:

    • A read cache hit occurs when the SP finds a sought page in read cache memory, and thus does not need to read the page from disk.

    • A write cache hit occurs when the SP finds and modifies data in cache memory, which usually saves a write operation. For example, with a RAID 5 LUN, a write hit eliminates the need to read, modify, and write the data.

    High hit ratios are desirable because each hit indicates at least one disk access that was not needed. You may want to compare the read and write hit ratios for the LUN with the read and write hit ratio for the entire array in an SP Cache window (see “SP Cache Information”).

    For a LUN to have the best performance, the hit ratios should be higher than those for the array. A very low read or write hit ratio for a busy LUN may mean that caching is not helping the LUN's performance.

  • Number of Blocks Prefetched: Number of disk blocks (512 bytes each) prefetched for the LUN. This entry and the next indicate how well (how often) prefetching is working for this LUN.

  • Number of Unused Prefetched Blocks: Number of disk blocks that were prefetched but not used. A higher number might mean that prefetching is not an efficient choice for this LUN (see the example following this list).

  • Number of Forced Flushes: Number of times the write cache was flushed because the cache filled to its high water mark. A high number is not desirable because a forced flush suspends all other I/O.
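
As a rough gauge of prefetch efficiency (see Number of Unused Prefetched Blocks above), you can compute the fraction of prefetched blocks that went unused; the counts below are hypothetical:

    # Fraction of prefetched blocks that were never used
    prefetched=500000
    unused=150000
    awk -v u=$unused -v p=$prefetched \
        'BEGIN { printf "Unused prefetched blocks: %.1f%%\n", u * 100 / p }'    # prints 30.0%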

Using the Disk Information Windows

The Disk Information window displays either configuration, statistics, or error information. When you open the window, it displays disk module configuration information, as shown in Figure 4-8.

Figure 4-8. Disk Information Window: Configuration Information


To display the Disk Information window, follow these steps:

  1. In the Storage System Manager window, select the arrays whose disk module status you want to view.

  2. Click the configure button near the left end of the array toolbar, or choose Configuration from the Array menu. An Array Configuration window opens for each selected array.

  3. In the disk field near the bottom of the window, double-click the disk module you want to examine. The Disk Information window opens, displaying disk module configuration information, as shown in Figure 4-8.

This section contains the following topics:

  • “Disk Module Configuration Information”

  • “Disk Module Statistics Information”

  • “Disk Module Error Information”

Disk Module Configuration Information

To display disk module configuration information, click the leftmost radio button in the View area of the Disk Information window, if necessary. This information is displayed when you first open the Disk Information window; see Figure 4-8 for an example.

The fields have the following meanings:

  • Capacity: Storage capacity of the disk module in MB.

  • State: Operational state of the disk module, as summarized in Table 4-3.

    Table 4-3. Disk Module Configuration Information: State

    Disk Module State   Meaning
    Binding             Being bound into a LUN.
    Enabled             Either a hot spare on standby or part of a bound LUN that is assigned to (owned by) the SP you are using as the communication channel to the enclosure. If the array has another SP, this module's status is Ready when you use the other SP as the communication channel to the array.
    Equalizing          Data from a hot spare is being copied onto a replacement disk module.
    Failed              Powered off or inaccessible.
    Not Present         No disk module is in the slot.
    Off                 Powered off by the SP, which can happen if a disk module of the wrong capacity is inserted.
    Powering Up         Power is being applied to the disk module.
    Ready               Disk module is part of a broken LUN or of a LUN that is bound but unassigned. This state can mean that the disk module is part of a LUN that is not owned by the SP you are using as the communication channel to the enclosure; in that case, the module's status is Enabled when you use the other SP as the communication channel to the array.
    Rebuilding          Disk module is either a hot spare or a replacement disk module for a failed module in a LUN; the data is being rebuilt on the hot spare or replacement module.
    Removed             Removed from the enclosure; applies only to a disk module that is part of a LUN.
    Standby             Disk module is either a hot spare or a replacement disk module for a failed module in a LUN; the data is being rebuilt on the hot spare or replacement module.
    Synchronizing       The write cache on the disk module is being synchronized (during a rebuild).
    Unbound             Ready to be bound into a LUN.


  • LUN ID: Hexadecimal number identifying the LUN to which the disk module is bound. This number is specified when the LUN is bound; see page 53 in Chapter 2 for more information on the LUN ID.

  • LUN Type: Indicates the LUN's RAID type, as summarized in Table 4-2 on page 95.

  • User Sectors: Number of user-accessible sectors (512-byte blocks) on the disk module.

Disk Module Statistics Information

To display disk module statistics information, click its radio button in the View area of the Disk Information window. Figure 4-9 shows an example.

Figure 4-9. Disk Information Window: Statistics Information



Note: Statistics logging must be enabled for entries in this window to be viewed.

The fields have the following meanings:

  • Average Disk Request Service Time: Average time in milliseconds that the disk module required to execute an I/O request after the request reached the top of the queue.

  • Number of Reads, Number of Writes: Total number of read and write requests made to the disk module. The LUN read or write information displayed in the LUN Statistics window (“LUN Statistics Information”) can be more useful because it is for the entire LUN, and not just for one of the disk modules in the LUN.

  • Number of Blocks Read, Number of Blocks Written: Number of data blocks read from and written to the disk module.

  • Number of Read Retries, Number of Write Retries: Number of times read and write requests to the disk module were retried because of soft or hard errors. See “Disk Module Error Information.”

Disk Module Error Information

To display disk module error information, click its radio button in the View area of the Disk Information window. Figure 4-10 shows an example.

Figure 4-10. Disk Information Window: Error Information


The fields have the following meanings:

  • Number of Hard Read Errors, Number of Hard Write Errors: Number of read or write errors for all the disk modules in the LUN that persisted through all the retries. An increasing number of hard errors can mean that one or more of the LUN's disk modules is nearing the end of its useful life.

  • Number of Soft Read Errors, Number of Soft Write Errors: Number of read or write errors for all the disk modules in the LUN that disappeared before all the retries. An increasing number of soft errors can indicate that one of the LUN's disk modules is nearing the end of its useful life.

  • Remapped Sectors: Number of disk sectors on all the disk modules in the LUN that were faulty when written to, and thus were remapped to different parts of the disk modules.

Using the SP Information Windows

The SP Information window displays either configuration, statistics, or cache information. When you open the window, it displays SP configuration information, as shown in Figure 4-11.

Figure 4-11. SP Information Window: Configuration Information


To display the SP Information window, follow these steps:

  1. In the Storage System Manager window, select the arrays whose SP status you want to view.

  2. Click the Configure button near the left of the array toolbar, or choose Configuration from the Array menu. An Array Configuration window opens for each selected array.

  3. In the Array Configuration window, click the SP A or SP B information button in the array toolbar. Alternatively, choose SP Information from the Array menu, and select the SP for which you want information. The SP Information window opens and displays SP configuration information, as shown in Figure 4-11.

This section contains the following topics:

  • “SP Configuration Information”

  • “SP Statistics Information”

  • “SP Cache Information”

SP Configuration Information

To display SP configuration information, click the top radio button in the View area (upper right) of the SP Information window, if necessary. This information is displayed when you first open the SP Information window; see Figure 4-11 for an example.

The fields have the following meanings:

  • State: The operational state of the SP, as summarized in Table 4-4.

    Table 4-4. SP Configuration Information: State

    SP State        Meaning
    Present         This SP is the communication channel you are using to communicate with the array.
    Not Connected   The agent cannot talk to the SP because a communication channel specifying the SP is not in the ssmagent configuration file for the selected host; for example, the SP is connected to a different host from the one named in the communication channel for the array.
    Not Present     The SP was not present in the enclosure when the ssmagent started.
    SP Removed      The SP was present in the enclosure when the ssmagent started and has since been removed.


  • Firmware Revision: Revision level of the Licensed Internal Code (LIC) that the SP is running. Each SP in the array runs the same LIC revision level.

    If the firmware revision or the PROM revision entries are zero or blank, the host connection is missing or invalid.

  • PROM Revision: Revision level of SP's PROM code. Each SP in the array runs the same revision level. PROM code is updated automatically when an SP's LIC is updated.

  • Model: Product model number for the SP.

  • Total Read Cache Memory: Number of megabytes allocated for read caching. This figure can be different from that for a second SP in the enclosure.

  • Total Write Cache Memory: Number of megabytes allocated for write caching. This figure is the same as that for a second SP in the enclosure.

  • Total Memory: Number of megabytes of SP memory, which includes memory needed for cache buffers, RAID 3 usage, and the LIC. Each SP in the array must have the same amount of memory to make full use of the memory.


    Note: Total Memory is not the same as user free memory; see “Setting Up Memory Partitions” in Chapter 2 for more information.


  • Data Loop Fail-Over, Command Loop Fail-Over, Illegal Cross Loop: Displays whether any of these conditions exist.

  • Enclosure Cabling Order: Displays the order in which the enclosures are cabled.

SP Statistics Information

To display SP statistics information, click its radio button in the View area (upper right) of the SP Information window. Figure 4-12 shows an example.

Figure 4-12. SP Information Window: Statistics Information



Note: Statistics logging must be enabled for values in this window to be viewed.

The fields have the following meanings:

  • Statistics Logging: Enabled or Disabled, depending on the setting in the Array Configuration window, as explained on page 87 in “Using Statistics Logging.”

  • Number of Reads, Number of Writes: Total number of read and write requests made to the SP.

  • Number of Blocks Read, Number of Blocks Written: Total number of data blocks read from and written to the SP.

  • Percent Busy: Percentage of time that the SP was busy processing requests, and not idle. The number shows the relative load on the SP.

SP Cache Information

To display SP cache information, click its radio button in the View area (upper right) of the SP Information window. Figure 4-13 shows an example.

Figure 4-13. SP Information Window: Cache Information


The fields have the following meanings:

  • Read Cache State, Write Cache State: Current state of the SP's read or write cache. Read cache states are Enabled, Disabling, and Disabled. Write cache states are Enabled, Disabled, and several transition states, such as Initializing, Enabling, Disabling, Dumping, and Frozen.

  • Read Cache Size, Write Cache Size: Number of MB allocated to this SP's read and write caches. Some memory is required for system buffers and other system operations as explained in “Setting Up Array Memory for Caching or RAID 3 LUNs” in Chapter 2.

  • Read Cache Hit Ratio: The percentage of read cache hits for the SP. A read cache hit occurs when the SP finds a sought page in read cache memory, and thus does not need to read the page from disk. The ratio is meaningful only if the SP's read cache is enabled. For information on how to enable an SP's read cache, see “Disabling or Enabling Array Write Caching” in Chapter 3.

  • Write Cache Hit Ratio: The percentage of write cache hits for the SP's write cache. A write cache hit occurs when the SP finds and modifies data in the write cache memory, which usually saves a write operation. For example, a hit that occurs when a page is sought in a RAID 5 LUN eliminates the need to read, modify, and write the data. High ratios are desirable because each hit indicates at least one disk access that was not needed.

  • Write Cache Type: Write caching requires two SPs and is always mirrored.

  • Page Size: Number of KB in the cache page: 2, 4, 8, or 16. The default is 2 KB.

  • Number of Pages: The total number of pages in the SP. Each page has the cache page size you selected when setting up array caching. This number equals the cache size divided by the cache page size, minus space for checksum tables (see the example following this list). If the array has two SPs and both are working, they divide the total number of pages between them. If an SP is idle for a long period or fails, the active SP can increase its share of pages.

  • Number of Unassigned Pages: Percentage of unassigned dirty pages in both SPs' write caches. Unassigned dirty pages are dirty pages belonging to a LUN that is not enabled for either SP, that is, not accessible from either SP. Unassigned pages would result, for example, if an SP fails and its write cache contains dirty pages. In such a case, the dirty pages belonging to the failed SP become unassigned pages.

    If the LUNs owned by the failed SP are transferred to the working SP, any unassigned pages for those LUNs transfer automatically to the working SP. The working SP writes these unassigned pages to the LUNs and the Unassigned Dirty Pages value returns to 0%. If the LUN that owns the dirty pages has an irrecoverable failure, you can clear the unassigned pages by unbinding the LUN, as explained in “Unbinding a LUN” in Chapter 3.

  • Percentage of Dirty Pages: Percentage of pages that have been modified in the SP's write cache, but that have not yet been written to disk. A high percentage of dirty pages means the cache is handling many write requests.
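
As an approximation of the Number of Pages value, divide the cache size by the page size; the sizes below are hypothetical, and the actual value is somewhat lower because some space is reserved for checksum tables:

    # Approximate number of cache pages, before checksum-table overhead
    cache_mb=64          # cache size, in MB
    page_kb=2            # cache page size, in KB (the default)
    echo "At most $((cache_mb * 1024 / page_kb)) pages"    # prints 32768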

Getting Information on Other Array Components

To display information on the components of an array, select the array in the Storage System Manager window, and click the monitor array button near the middle of the toolbar or choose Monitor from the Array menu. The Components For Array window opens; Figure 4-14 shows an example.

Figure 4-14. Example Components For Array Window


The icons indicate the type of the enclosure:

  • deskside Fibre Channel RAID enclosure (DPE):

  • rackmount Fibre Channel RAID enclosure (DPE):

  • deskside FibreVault with RAID disk modules (DAE):

  • rackmount FibreVault with RAID disk modules (DAE):

For information on the array components, double-click the enclosure icon in the Components For Array window. The Equipment View window opens, as shown in Figure 4-15.

Figure 4-15. Example Equipment View Window


Double-clicking the icon for a specific component displays a window with more information on the component. “Checking a Faulted Array” in Chapter 5 explains this information.

Displaying an SP Event Log

Each SP maintains a log of event messages. These events include hard errors, startups, and shutdowns involving disk modules, fans, SPs, LCCs, power supplies, and the SPS. The messages appear in the order in which the events occurred, so the most recent messages are at the end of the log. Periodically, the SP writes this log to disk to maintain it when SP power is off. The log can hold 16,800 bytes.

You can display all the events in the log or only the events that occurred from a date and time you specify up to the current time. You can also save the contents of the log to a file you specify.

Follow these steps to open an SP event log:

  1. In the Storage System Manager window, select the array whose SP event log you want to view.

  2. Click the History SP A or History SP B button near the right end of the array toolbar, or choose History from the Array menu, and then choose SP A or SP B.

The Event Log window opens. Scroll down to see the most recent events; Figure 4-16 shows an example.

Figure 4-16. Example Event Log


In this window, the columns headed Sense Key and Ext Code (extended code) are for use by service personnel.

The hexadecimal codes fall into four series:

  • informational codes (600h series), which do not indicate an error condition and require no action, but can help in establishing history

  • soft error codes (800h series), most of which require no operator action unless they occur frequently

  • error codes (900h series), which indicate a serious error condition

  • fatal error codes (A00h series), which indicate a fatal error condition

For explanations of the error codes, see the release notes.

To display only events from a certain date, follow these steps:

  1. At the Display Entries As Of: field near the top of the Event Log window, select a time interval from the pull-down menu (such as the last 24 hours), or type in a date and time.

  2. Click Refresh at the bottom of the window.

To clear the list of displayed events, click Reset. The Event Log window shows only events added to the SP log as of the time you clicked Reset.

To clear the log (destroy all contents) and reset the log's counters to zero, click Clear Log; in the confirmation window, click Yes.

To display all events in the log, click Show All.

To save the contents of the log to a file, click Save. In the window that opens, specify filename, file type, and path.

Using the Event Monitor

You can use the event monitor to receive email notice of error messages within a range that you specify. This software is installed with the agents and their user interfaces, although it is separate from them; see Figure 1-1 in Chapter 1.

The event monitor polls the SP event log at a specified interval (default 60 seconds). If the SP logs an event within the range you specify, the event monitor logs the event as a message, and saves it in /usr/ssm/etc/messages. A launch application sends the message to the email addresses you specify. The following is an example message:

machine.company.com = host, sc3d8l0(SP A) = device
Poll failed, check agent status on host machine.company.com AGENT NOT RUNNING
Event Code Number = c02
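
Because each saved message can include an Event Code Number line (produced by the optional third format line described below), you can scan the messages file for codes in a particular series. The following pattern is only a sketch based on the example message above:

    # List serious (900 series) and fatal (A00 series) events saved by the event monitor
    grep -E 'Event Code Number = [9aA][0-9a-fA-F]{2}' /usr/ssm/etc/messages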

To use the event monitor, follow these steps:

  1. Determine the range of error messages about which you want to be notified; see the information on hexadecimal codes on page 114 in “Displaying an SP Event Log.” View the messages in the release notes if necessary.

  2. In the /usr/ssm/etc directory, copy the template event monitor file ssmevent.config.proto to ssmevent.config.

  3. In a text editor, modify ssmevent.config for your needs:

    • At error, enter the range of error messages you want emailed to you, such as 0x900 0xaff (serious and fatal error codes).

    • At launch, enter the email address or IP address that is to receive the information.

    • At the three format lines, change the defaults if desired:

      The first format line displays the hostname and device name.

      The second format line displays the event error message.

      The third format line displays the event code. This line is optional; the other two are necessary to display meaningful error information.

    • At poll, enter the interval (number of seconds) at which you want the SP log polled; the default is 60.

      The default poll and refresh values keep network activity to a minimum, but allow fairly quick notification of any events that occur. Polling faster does not bring much benefit. If the management station is monitoring many large systems on several hosts, so that the aggregate time to complete a poll of all systems might take a minute, set a slower poll rate (such as every 300 to 600 seconds, that is, five to ten minutes). Use a slower rate also for a system using only the tty line.

    • At refresh, enter the interval (number of seconds) at which you want the event monitor to poll the internet connection with the host to determine whether contact with the host is still valid; the default is 15.

  4. In a text editor, open /usr/ssm/etc/.hosts and enter the hosts to be monitored.

  5. To start the ssmevent daemon and initiate the notification process, enter

    /etc/init.d/ssm_monitor start 
    

The following sample ssmevent.config file sends mail to [email protected] for SP log events 0x600 to 0xCFF (all error messages).

file /usr/ssm/etc/.hosts
monitor default 
error 0x600 0xcff
launch /usr/lib/sendmail [email protected] <
format %H = host, %D = device 
format %E
format Event Code Number = %N

poll 60

refresh 15

When the event monitor starts, it checks the syntax of the statements in the configuration file and reports any errors in /usr/ssm/log/ssmevent.log. If it finds no serious syntax errors, the event monitor tries to connect with the storage systems defined in the file, reporting the results to /usr/ssm/log/ssmevent.log. Each time the event monitor refreshes, it tries to connect to any server or storage system that it could not connect with before; if it is successful, it reports the connection to /usr/ssm/log/ssmevent.log.

As long as the ssmevent daemon is running, messages that are older than 48 hours are removed from /usr/ssm/etc/messages.


Note: To turn off the notification process, enter


/etc/init.d/ssm_monitor stop