This chapter describes how to operate the Challenge RAID storage system after you have configured it. The chapter explains
checking storage system status
shutting down the Challenge RAID storage system
restarting the Challenge RAID storage system
This chapter also covers the basics of the /usr/raid5/raidcli command (command-line interface, or CLI). Use raidcli with its parameters in an IRIX shell on Challenge systems to get the names of devices controlled by the storage-control processor (SP); to display status information on disk modules, disk module groups (LUNs), SPs, and other system components; and to display the storage processor log, in which error messages are stored.
Note: Although the directory is named raid5, the raidcli command is valid for all RAID levels.
Other chapters in this guide explain how to use the raidcli command to bind (group) physical disks into RAID units and unbind them, set up caching, and accomplish other tasks.
To check storage system status, you may find it easiest to look at the storage system cabinet to see if the amber service light is lit.
Look for two lights at the right of the disk modules (deskside storage system) or above the disk modules (chassis in rack). The green light indicates that the unit is powered on; the amber light indicates a fault. See Figure 3-1.
The amber service light comes on when
an SP is reseated
the Challenge RAID is powered off and on
the battery backup unit has not finished recharging (if a battery backup unit is present in the system)
If the service light is lit, look for a disk-module fault light that is lit. You can then explore status further, either with the raidcli command in an IRIX shell or with RAID5GUI.
This section explains
using the raidcli command
getting the device name with getagent
getting general system information
getting information about disks
getting information about other components
displaying the Challenge RAID unsolicited event log
The raidcli command sends storage management and configuration requests to an application programming interface (API) on the Challenge server. For the raidcli command to function, the agent—an interpreter between the command-line interface and the Challenge RAID storage system—must be running.
The synopsis of the raidcli command is
raidcli [-vp] [-d device] parameter [optional_arguments]
In this syntax, the variables have the following meanings:
Use the getagent parameter with raidcli to display information on devices controlled by the API:
raidcli getagent
The following is sample output for one device; normally, the output would give information on all devices:
Table 3-1 summarizes entries in the raidcli getagent output.
Table 3-1. Output of raidcli getagent
To get general system information, use
raidcli -d device getcontrol
A partial output of this command follows:
System Fault LED: OFF
Statistics Logging: ON
System Cache: ON
Max Requests: 23
Average Requests: 5
Hard errors: 0
Total Reads: 18345
Total Writes: 1304
Prct Busy: 25
Prct Idle: 75
System Date: 4/30/1995
Day of the week: Tuesday
System Time: 12:43:54
For more information on this command, see “getcontrol” in Appendix B.
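If you capture getcontrol output in a script, a small parser can turn it into name/value pairs. The following is a minimal sketch in Python, assuming the command prints one "Name: value" field per line; the function name parse_getcontrol and the abbreviated sample text are illustrative and are not part of raidcli.

```python
def parse_getcontrol(text):
    """Parse 'Name: value' lines into a dict, keeping values as strings."""
    fields = {}
    for line in text.splitlines():
        # Split at the first colon only, so times such as 12:43:54 stay intact.
        name, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon
            fields[name.strip()] = value.strip()
    return fields


# Abbreviated sample, assumed to be laid out one field per line.
SAMPLE = """\
System Fault LED: OFF
Statistics Logging: ON
Prct Busy: 25
Prct Idle: 75
System Time: 12:43:54
"""

info = parse_getcontrol(SAMPLE)
if info.get("System Fault LED") != "OFF":
    print("System fault LED is on; check the chassis fault lights")
print("Busy:", info["Prct Busy"], "percent at", info["System Time"])
```

Values are kept as strings because the fields mix numbers, dates, and ON/OFF flags; convert individual fields only where you need arithmetic.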
For information about all disk modules in the system, use this command in an IRIX shell:
/usr/raid5/raidcli -d device getdisk
For information on a particular disk module, use
/usr/raid5/raidcli -d device getdisk [diskposition]
In this command, diskposition has the format bd, where b is the bus the disk is located on (a through e; be sure to use lowercase) and d is the device number (0 through 3). Figure 3-2 diagrams disk module locations.
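Because the bus letter must be lowercase, a script that builds getdisk commands may want to check the disk position first. The following Python sketch encodes only the rule stated above (bus a through e, device 0 through 3); the helper names are hypothetical, and the function only builds the argument list without running raidcli.

```python
import re

# Bus letter a-e (lowercase) followed by device number 0-3.
DISK_POSITION = re.compile(r"[a-e][0-3]")


def is_valid_disk_position(pos):
    """Return True if pos has the form bd described in the text."""
    return DISK_POSITION.fullmatch(pos) is not None


def getdisk_command(device, pos):
    """Build the argument list for a getdisk call; does not execute it."""
    if not is_valid_disk_position(pos):
        raise ValueError("disk position must be a-e followed by 0-3: %r" % pos)
    return ["/usr/raid5/raidcli", "-d", device, "getdisk", pos]


print(getdisk_command("scsi4d210", "a2"))
print(is_valid_disk_position("A2"))  # uppercase bus letters are rejected
```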
For example, the following command gets information about disk module A2:
/usr/raid5/raidcli -d scsi4d210 getdisk a2
Sample getdisk output follows (shown here for disk module A0):
A0 Vendor Id: <manufacturer>
A0 Product Id: <part number>
A0 Lun: 0
A0 State: Bound and Not Assigned
A0 Hot Spare: NO
A0 Prct Rebuilt: 100
A0 Prct Bound: 100
A0 Serial Number: 032306
A0 Capacity: 0x000f42a8
A0 Private: 0x00009000
A0 Bind Signature: 0x1c4eb2bc
A0 Hard Read Errors: 0
A0 Hard Write Errors: 0
A0 Soft Read Errors: 0
A0 Soft Write Errors: 0
A0 Read Retries: 0
A0 Write Retries: 0
A0 Remapped Sectors: 0
A0 Number of Reads: 1007602
A0 Number of Writes: 1152057
Table 3-2 interprets items in this output.
Table 3-2. Output of raidcli getdisk
Output | Meaning |
---|---|
Vendor Id | Manufacturer of disk drive |
Product Id | Part number of disk |
Lun | Logical unit number to which this disk is bound |
State | Removed: disk is physically not present in the chassis or has been powered off; Off: disk is physically present in the chassis but is not spinning; Powering Up: disk is spinning and diagnostics are being run on it; Unbound: disk is healthy but is not part of a LUN; Bound and Not Assigned: disk is healthy, part of a LUN, but not being used by this SP; Rebuilding: disk is being rebuilt; Enabled: disk is healthy, bound, and being used by this SP; Binding: disk is in the process of being bound to a LUN; Formatting: disk is being formatted |
Hot Spare | YES or NO |
Prct Rebuilt | Percentage of disk that has been rebuilt |
Prct Bound | Percentage of disk that has been bound |
Serial Number | Serial number from disk inquiry command |
Capacity | Actual disk capacity in blocks |
Private | Amount of physical disk reserved for private space |
Bind Signature | Unique value assigned to each disk in a logical unit at bind time |
Hard Read Errors | Number of hard errors encountered on reads for this disk |
Hard Write Errors | Number of hard errors encountered on writes for this disk |
Soft Read Errors | Number of soft errors encountered on reads for this disk |
Soft Write Errors | Number of soft errors encountered on writes for this disk |
Read Retries | Number of retries occurring during reads |
Write Retries | Number of retries occurring during writes |
Remapped Sectors | Number of sectors that have been remapped |
Number of Reads | Number of reads this disk has seen |
Number of Writes | Number of writes this disk has seen |
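The fields in Table 3-2 lend themselves to simple scripted checks. The following Python sketch groups getdisk fields by disk ID and lists disks that report hard errors. It assumes one field per line, each prefixed with the disk ID as in the sample output; the B1 lines and the function names are made up for illustration.

```python
HARD_ERROR_FIELDS = ("Hard Read Errors", "Hard Write Errors")


def parse_getdisk(text):
    """Return {disk_id: {field_name: value}} from getdisk-style lines."""
    disks = {}
    for line in text.splitlines():
        # The first token is the disk ID; the rest is 'Name: value'.
        disk_id, _, rest = line.strip().partition(" ")
        name, sep, value = rest.partition(":")
        if sep:
            disks.setdefault(disk_id, {})[name.strip()] = value.strip()
    return disks


def disks_with_hard_errors(disks):
    """Return sorted IDs of disks whose hard error counters are nonzero."""
    return sorted(
        disk_id
        for disk_id, fields in disks.items()
        if any(int(fields.get(name, "0")) > 0 for name in HARD_ERROR_FIELDS)
    )


# A0 values come from the sample output; the B1 values are hypothetical.
SAMPLE = """\
A0 State: Bound and Not Assigned
A0 Hard Read Errors: 0
A0 Hard Write Errors: 0
B1 State: Enabled
B1 Hard Read Errors: 3
B1 Hard Write Errors: 0
"""

disks = parse_getdisk(SAMPLE)
print(disks_with_hard_errors(disks))
```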
For state information on field-replaceable units other than disk modules in the Challenge RAID storage system, use
raidcli -d device getcrus
A sample output of this command follows:
FANA State: Present
FANB State: Present
VSCA State: Present
VSCB State: Present
VSCC State: Present
SPA State: Present
SPB State: Present
BBU State: Present
Note: In this output, information on the power supplies is shown under VSC (voltage semi-regulated converter), SP information is shown under SP, and battery backup unit information is shown under BBU.
Table 3-3 interprets items in this output.
Table 3-3. Output of raidcli getcrus
For storage systems with RAID agent 1.55 and higher, use raidcli -d <device> getsp to see an SP's firmware revision number and model number only. For a system containing many Challenge RAID chassis assemblies, this parameter is especially useful as an alternative to raidcli getagent. For more information, see “getsp” in Appendix B.
The storage-control processor maintains a log of event messages in processor memory. These events include hard errors, startups, and shutdowns involving disk modules, fans, SPs, power supplies, and the battery backup unit. Periodically, the SP writes this log to disk to maintain it when SP power is off. The log can hold over 2,000 event messages; it has a filter feature that lets you select events by device or error message.
The event messages are in chronological order, with the most recent messages at the end. To display the entire log, use
raidcli -d device getlog
To display the newest N entries in the log, starting with the oldest entry, use
raidcli -d device getlog +N
To display the oldest N entries in the log, starting with the oldest entry, use
raidcli -d device getlog -N
Output of the command raidcli -d <device> getlog +5 might be
12/17/94 09:59;51 A3: (A07) Cru Removed [0x47]
12/17/94 09:59;51 A3: (608) Cru Ready [0x0]
12/17/94 09:59;51 A3: (603) Cru Rebuild Started [0x0]
12/17/94 09:59;51 A3: (604) Cru Rebuild Complete [0x0]
12/17/94 09:59;51 A3: (602) Cru Enabled [0x0]
These entries show that a field-replaceable unit (disk module, fan unit, battery backup unit, SP) has been removed, replaced, rebuilt, and enabled.
At the tail of each log entry is an error code in brackets (for example, [0x47]) that gives diagnostic information when it is available. See “getlog” in Appendix B for explanations of these codes.
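To pick out entries that carry diagnostic information, a script can split each log line into its parts. The following Python sketch is based only on the sample entries above: it assumes one entry per line in the form date, time, unit, event code in parentheses, message, and error code in brackets. The regular expression and the LogEntry type are illustrative and do not represent a documented log format.

```python
import re
from collections import namedtuple

LogEntry = namedtuple("LogEntry", "date time unit code message error")

# The time separator is accepted as ':' or ';' because the sample shows both.
LOG_LINE = re.compile(
    r"^(?P<date>\d{1,2}/\d{1,2}/\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}[:;]\d{2})\s+"
    r"(?P<unit>\S+):\s+"
    r"\((?P<code>\w+)\)\s+"
    r"(?P<message>.*?)\s+"
    r"\[(?P<error>0x[0-9a-fA-F]+)\]$"
)


def parse_getlog(text):
    """Return a list of LogEntry tuples; lines that do not match are skipped."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            fields = match.groupdict()
            fields["error"] = int(fields["error"], 16)  # error code as integer
            entries.append(LogEntry(**fields))
    return entries


SAMPLE = """\
12/17/94 09:59;51 A3: (A07) Cru Removed [0x47]
12/17/94 09:59;51 A3: (608) Cru Ready [0x0]
12/17/94 09:59;51 A3: (602) Cru Enabled [0x0]
"""

entries = parse_getlog(SAMPLE)
for entry in entries:
    if entry.error != 0:
        print(entry.unit, entry.code, entry.message, hex(entry.error))
```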
To clear the event log, use
raidcli -d device clearlog
Note: You must be root to use the clearlog parameter.
Any user can run a RAID5GUI session from any system on which it is installed, and can use it to monitor the storage system. Only authorized users can use RAID5GUI to configure or reconfigure the storage system. A user is authorized if the server's agent configuration file contains an entry for the user.
Caution: The agent allows more than one RAID5GUI session to access the same storage system at the same time. You must make sure that two authorized users do not use RAID5GUI to configure or reconfigure the same storage-system chassis at the same time.
This section explains
starting RAID5GUI
getting general system information
getting information about disk modules
getting information about system components
viewing SP status information
viewing settings
viewing the SP event log
enabling and disabling the statistics log
using RAID5GUI automatic polling
using alarm settings
exiting RAID5GUI
To start a RAID5GUI session on a client system, enter
/usr/raid5/raid5gui
After introductory screens, the Select Hosts window appears, as shown in Figure 3-3.
Upon startup, RAID5GUI looks for a file named by the RAID_ARRAY_HOSTS environment variable. If it can access the named file, the Select Hosts window appears, listing the hosts in the named file. If RAID5GUI cannot find or access this file, or if the RAID_ARRAY_HOSTS environment variable is not set, RAID5GUI looks for the .hosts file in your home directory. If it can access this file, the Select Hosts window appears, listing the hosts in the .hosts file.
If RAID5GUI cannot find or access the .hosts file, a popup appears noting that the host file could not be loaded. In this popup, click OK. In the Select Hosts window that appears, follow these steps to add hostnames to the empty selection box:
In the Select Hosts window, click Add.
In the text box that appears, enter the name or IP address of the host you want to add.
Click OK.
The hostname you add appears in the selection box and is added to the file named by the RAID_ARRAY_HOSTS environment variable. If this variable is not set, the hostname is added to the .hosts file in your home directory.
You can also use the Select Hosts window to remove or rename hosts; click the Delete or Rename button, respectively. In either case, the change is written to the file named by the RAID_ARRAY_HOSTS environment variable or to the .hosts file in your home directory.
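The host file lookup order described above can be summarized in a few lines of Python. This is only a sketch of the documented behavior, not RAID5GUI's actual code; the function name find_hosts_file and its can_read parameter are hypothetical and exist so that the logic can be exercised without touching real files.

```python
import os


def find_hosts_file(environ=os.environ, home=None, can_read=None):
    """Return the host file RAID5GUI would load, or None if none is readable."""
    if home is None:
        home = os.path.expanduser("~")
    if can_read is None:
        can_read = lambda path: os.access(path, os.R_OK)

    # 1. The file named by RAID_ARRAY_HOSTS, if the variable is set and readable.
    named = environ.get("RAID_ARRAY_HOSTS")
    if named and can_read(named):
        return named

    # 2. Otherwise, the .hosts file in the user's home directory.
    fallback = os.path.join(home, ".hosts")
    if can_read(fallback):
        return fallback

    # 3. Neither is available: the GUI reports that no host file was loaded.
    return None


print(find_hosts_file())
```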
To get general system information, follow these steps:
In the Select Hosts window, highlight the name of the host with the storage system you want to manage.
Click Select at the bottom of the window. If RAID5GUI can communicate with the agent on the server, the Select Chassis window appears, as shown in Figure 3-4.
Select the name of the chassis connection that you want to manage and click Select.
The Select Chassis and Select Hosts windows close, and the Equipment View for the storage system appears. Figure 3-5 shows an example. This window depicts the storage system (deskside) or, if you have a rackmount, the chassis assembly you selected.
Notice the service light button in the toolbar. If the circle is amber, a storage-control processor has detected a hard or logical fault in the chassis. You can click this button to display the Chassis Status window, which is explained later in this section.
The color of the service light button usually reflects the state of the physical service light on the Challenge RAID chassis (see Figure 3-1). When the chassis service light is lit, the on-screen service light button is usually amber. When the chassis service light is not lit, the service light button is usually gray.
The host select and chassis select buttons display the Select Hosts and Select Chassis windows, respectively.
Note: If the agent or RAID5GUI was started (or restarted) after the chassis was powered on, the color of the service light button does not reflect that of the chassis service light.
The chassis drawing shows the disk modules; storage-control processors (SPs); fan module; voltage semiregulated converters (VSCs), which are power supplies; and a battery backup unit (BBU).
Each drawing of a component and an empty disk slot is a button whose color shows the health of that component:
gray: operating normally
blue: disk module is in a transition state, such as powering up or binding (becoming part of a LUN)
amber: failed (disk module); failed or removed (other components)
white: component was not present when the agent was started
In the Equipment View, each slot for a disk module has a disk ID by which you identify the disk module, such as A0, A1, A2, A3, B0, B1, B2, and so on.
To display disk IDs in the Equipment View, select “Use Disk IDs” from the Views menu. The window for monitoring LUN status is explained in “Getting Disk Group (LUN) Information” in Chapter 4.
Iconify the Equipment View by clicking the button in the upper right corner.
To use the Summary View to determine disk module information, select “Summary View” in the Views menu. Figure 3-6 shows an example.
The arrangement of disk module buttons varies, depending on the storage system format: the buttons for a deskside storage system are arranged in two vertical rows.
Click an SP button to view information in the SP Summary window, as explained in “Viewing SP Status Information,” later in this chapter.
The LUNs belonging to each SP are shown below the SP in the Summary View. Disks not owned by an SP, such as hot spares, appear in the middle of the Summary View. “Using the LUN Information Window” in Chapter 4 contains more information about viewing LUN information.
The positions of the disk module buttons in the Summary View correspond to their positions in the Challenge RAID chassis's disk module slots.
Associated with each disk slot is an identifier based on its position in the chassis. To display disk IDs for the disk modules, select “Use Disk IDs” in the Views menu. Figure 3-7 shows the button for a disk module with disk ID enabled.
Table 3-4 summarizes the disk IDs.
Internal Challenge RAID SCSI Bus | Position | Disk IDs |
---|---|---|
A | 0, 1, 2, 3 | A0, A1, A2, A3 |
B | 0, 1, 2, 3 | B0, B1, B2, B3 |
C | 0, 1, 2, 3 | C0, C1, C2, C3 |
D | 0, 1, 2, 3 | D0, D1, D2, D3 |
E | 0, 1, 2, 3 | E0, E1, E2, E3 |
The operational state of the disk module appears next to the button. If the disk module is all or part of a LUN, that LUN's hexadecimal number also appears next to the button. Table 3-5 lists disk module states, the color of the disk module's button when the disk module is in that state, and a description of the state.
Click on a disk module button to display the Disk Module Information window. Figure 3-8 shows an example.
In the Disk Module Information window,
the Description field gives the disk capacity
the State field indicates the disk module state; see Table 3-5
the LUN ID field contains the hexadecimal number identifying the LUN to which the disk module is bound; this number is the same as that specified in the Bind window (or with the raidcli bind command) when the LUN was bound
the Owner Type field shows the RAID level of the LUN
the Sectors field gives the number of user-accessible sectors on the disk module
Click Statistics in the Disk Module Information window to view read/write statistics compiled since the last time the statistics log for the SP that owns the LUN was turned on (see “Enabling and Disabling the Statistics Log” later in this chapter for more information). Figure 3-9 shows an example Disk Module Statistics window.
Table 3-6 explains the entries in this window.
Table 3-6. Disk Module Statistics Window Entries
Entry | Meaning |
---|---|
Average Disk Request Service Time | Average number of milliseconds that the disk module required to execute an I/O request after the request reached the top of the queue. |
Number of Reads; Number of Writes | Total read and write requests made to the disk module. (For read and write information for the entire LUN, see “Getting LUN Information Using RAID5GUI” in Chapter 4.) |
Number of Blocks Read; Number of Blocks Written | Total data blocks read from and written to the disk module. |
Number of Read Retries; Number of Write Retries | Total times read and write requests to the disk module were retried. |
To view number of remapped sectors and hard and soft read/write errors, click Errors in the Disk Module Information window. Figure 3-10 shows an example Disk Module Errors window.
Table 3-7 explains the entries in the window shown in Figure 3-10.
Table 3-7. Disk Module Errors Window Entries
Entry | Meaning |
---|---|
Number of Hard Read Errors; Number of Hard Write Errors | Total read or write errors that persisted through all the retries. An increasing number of hard errors might mean that one or more of the LUN's disk modules is nearing the end of its useful life. |
Number of Soft Read Errors; Number of Soft Write Errors | Total read or write errors that disappeared before all the retries were used. An increasing number of soft errors might indicate that one of the LUN's disk modules is nearing the end of its useful life. |
Remapped Sectors | Total disk sectors that were faulty when written to, and thus were remapped to a different part of the disk module. |
To get details on a specific system component, follow these steps:
To display more details on system components, select “Summary View” from the Views menu of the Equipment View. The Summary view for the selected chassis appears, as shown in Figure 3-6. (“Getting LUN Information Using RAID5GUI” in Chapter 4 explains features of this window in detail.)
Iconify a Summary View by clicking on the second button from the upper right corner. Each icon bears a diagram of a chassis, as shown in Figure 3-11.
Use the chassis icons for checking component status. Figure 3-11 shows three chassis with normal operation. If the chassis icon is blinking and shows a split chassis, one or more components have failed or have been removed; Figure 3-12 shows such an icon.
Note: To update the information in the icons, restore the view that the icon represents and click the Poll button in the toolbar. For more information on polling, see “Using RAID5GUI Automatic Polling,” later in this chapter.
If the chassis icon is blinking and shows a split chassis, restore the icon and check the service light button color, as explained in “Getting General System Information,” earlier in this chapter.
Click the service light button in the toolbar to display the Chassis Status window. Figure 3-13 shows an example.
Possibilities for each entry are Up, Down (component has failed or was removed after the agent started running), or Not Present (component failed or was removed before the agent started running).
Note: For information on how to identify a specific defective component, see Chapter 6, “Identifying Failed System Components.”
To view status information for an SP, use the SP Summary window. To display this window, click the SP's button in the Equipment View or Summary View. Figure 3-14 shows an example.
Note: To display cache information, click Cache; the window that appears is explained in “Viewing Cache Statistics” in Chapter 7. To display SP event messages, click Log. See “Viewing the SP Event Log” later in this chapter for a description of the log.
Table 3-8 lists the possibilities for the Status field.
Table 3-8. SP Summary Status Field
Status Entry | Meaning |
---|---|
SP Present | The SP is the communication channel you are using to communicate with the chassis. |
SP Not Connected | Agent cannot talk to the SP because a communication channel specifying the SP is not in the agent's configuration file for the selected host. For example, the SP is connected to a host different from that of the SP in the communications channel for the chassis. |
SP Not Present | SP that is in the communication channel to the selected chassis has failed or has been removed. |
SP Removed | SP was not present when the agent was started. |
The SP Comm Channel field contains a SCSI device address that is the same as the device entry in the agent configuration file for the host to which the Challenge RAID storage system is attached. If this communication channel is not through this SP, and the SPs in the Challenge RAID system are connected to different hosts, the word Unknown appears in this field, instead of a SCSI device address, and the values in the rest of the fields in this window are either 0 or Unknown. For more information on the SP communication channel, see “Shutting Down the Challenge RAID Storage System” later in this chapter.
The Firmware Revision field gives the revision number of the Licensed Internal Code (LIC) that the SP is running. All SPs in the system run the same LIC revision. The PROM Revision field indicates the revision number of the SP's PROM code; all SPs in the system run the same PROM revision.
The Total Memory field lists the number of megabytes (8, 16, 32, or 64) in the SP's memory. To make full use of cache memory, each SP in the system should have the same amount of memory.
The SP SCSI ID field gives the SP's SCSI ID, which is determined by switch settings on the SP. The Silicon Graphics System Service Engineer sets these when the Challenge RAID storage system is installed.
To view settings for SPs and LUNs in the system, select “View Settings” in the Options menu of either the Summary View or Equipment View. Figure 3-15 shows an example.
The View Settings window summarizes information in the SP Summary and LUN Summary windows, and includes cache and other information.
At the top left of the window, you can view settings for either SP in the system by clicking the appropriate diamond-shaped radio button. At the top right, use the option menu to select a LUN owned by the SP that is currently selected.
This window is available for firmware revision 9.0 and higher (RAID agent 1.55, SP model number 7305).
Each SP maintains a log of event messages. These events include hard errors, startups, and shutdowns involving disk modules, fans, SPs, VSCs, and the BBU. Periodically, the SP writes this log to disk to maintain it when SP power is off. The log can hold over 2,000 event messages; when that limit is reached, the oldest messages are deleted first as new messages come in.
To display the SP log, click Log in the SP Summary window. Figure 3-16 shows an example.
Event messages are in chronological order, with the most recent ones at the end of the log. To display earlier messages, use the scroll bar to move backwards through the log.
Event codes and their corresponding messages are in Appendix C, “Storage-Control Processor Event-Log Error Codes.”
Click Save... to save the log to a file.
Click Clear to clear the contents of the log and reset it.
Caution: Clearing the log can cause problems for other users who are viewing the same log.
To exit the log, click Close.
The SP maintains a log of statistics for the LUNs, disk modules, and storage system caching, which you can turn on and off.
Because the log uses a 32-bit counter to maintain the statistics numbers, the statistics numbers start over at zero when the counter is full. Thus, you see a sudden decrease in a statistics number if you view it shortly before the counter is full and again shortly after the counter restarts at zero. If you keep the log turned on for more than two weeks, reset it about every two weeks, so that you know when the numbers start at zero.
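If you record statistics readings yourself, you can compensate for a counter restart when computing the difference between two readings. The following Python sketch assumes an unsigned 32-bit counter that restarted at most once between the readings; the function name counter_delta is illustrative.

```python
COUNTER_MODULUS = 2 ** 32  # a 32-bit counter restarts at zero after 4294967295


def counter_delta(earlier, later):
    """Return how far a 32-bit counter advanced between two readings.

    Correct only if the counter restarted at most once between the readings.
    """
    return (later - earlier) % COUNTER_MODULUS


# No restart: the difference is the plain subtraction.
print(counter_delta(100, 250))
# One restart: the later reading is smaller than the earlier one.
print(counter_delta(4294967000, 200))
```

If more than one restart can occur between readings, the difference is ambiguous, which is one reason to reset the log on a regular schedule.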
When the log is on, it affects storage system performance; you may want to disable it unless you have a reason to monitor performance. To turn the SP's statistics log on and off, select “Toggle Statistics Logging” in the Options menu of the Equipment View or Summary View. In the submenu, click on the item for the SP whose log you want to turn on or off. If the SP's log is on, a box appears in front of the item for the SP.
To reset (enable) the SP's statistics log, select “Reset Statistics Log” in the Options menu of the Equipment View or Summary View. In the submenu, click on the item for the SP whose log you want to reset.
RAID5GUI automatically polls the agent on the selected host to get the current status, statistics, and event log information for each selected chassis. In the Poll Setting window, you can change the default polling interval or turn automatic polling off or on for the current RAID5GUI session.
To poll an agent manually, click the Poll button on the Equipment View or Summary View toolbar.
To change the automatic polling intervals, follow these steps:
In the Equipment or Summary View window, select “Poll Setting” from the Options menu. The Poll Settings window appears.
To change the poll interval for status and statistics, click the list button beside the Poll Interval field and select the desired number of seconds from the list box that appears.
Available poll intervals are 5, 10, 15, 30, 60, 120, 300, 600, 1800, and 3600 seconds; the default is 60 seconds.
Click Set. The new poll interval takes effect immediately.
To turn automatic polling off, follow these steps:
Select “Poll Setting” in the Options menu of the Equipment View or Summary View.
In the Poll Settings window that appears, click Manual Polling; click Set. Automatic polling is turned off immediately. To update status, statistics, and event log information, you must manually poll the agent.
To turn automatic polling back on, click Manual Polling again and click Set in the Poll Settings window.
The graphical user interface includes alarm settings that are useful for monitoring the status of system components. Alarm messages can be sent to an e-mail address, displayed in a window, or both. Figure 3-18 shows the alarm message window.
To change alarm settings, select “Alarms” in the Options menu; the Alarm Settings window appears, as shown in Figure 3-19.
Use the Alarm Settings window as follows:
To toggle display of alarm messages on or off, click Screen Alarms.
To start or stop sending alarm messages to an e-mail address, click Enable Email Alarms. Click the Email Address field, and enter the address to which the alarms should be sent.
To apply any of these settings, click Set.
Alarm settings are not saved from one session to the next. If you exit RAID5GUI, you must reset alarm settings when you restart it.
To shut down the Challenge RAID storage system, follow these steps:
If you are using storage system caching, make sure that it is disabled; use one of these methods:
Use raidcli getcache to check status.
In RAID5GUI, click an SP icon in the Equipment View; in the SP Summary window that appears, click Cache.
If necessary, disable caching as explained in Chapter 7.
Turn off the power switch on the back of the Challenge RAID storage system, as shown in Figure 3-20.
Note: You do not need to disable the power for the SP(s).
To start the Challenge RAID storage system, follow these steps:
Turn on the storage system's power; see Figure 3-20.
The green power light on the front of the storage system lights up (see Figure 3-21) and the fans rotate.
If none of the busy lights on the drive modules light up, make sure that the power for each SP is enabled. Move the fan module's latch to the UNLOCK position, as indicated in Figure 3-22.
Swing open the fan module.
Caution: To prevent thermal shutdown of the system, never leave the fan module open more than two minutes.
For the AMD-based SP, move the SP's power switch to the enable position, as shown in Figure 3-23.
Note: The PowerPC-based SP does not have a power enable/disable switch on the bezel.
Close the fan module and move the module's latch to the LOCK position.