This chapter describes how to operate the Challenge RAID storage system after you have configured it. The chapter explains
checking storage system status
shutting down the Challenge RAID storage system
restarting the Challenge RAID storage system
This chapter gives basics of the /usr/raid5/raidcli command (command-line interface, or CLI). Use raidcli with its parameters in an IRIX shell on Challenge systems to get names of devices controlled by the storage-control processor (SP); to display status information on disk modules, disk module groups (LUNs), SPs, and other system components; and to display the storage processor log, in which error messages are stored.
|Note: Although the command resides in the /usr/raid5 directory, raidcli is valid for all RAID levels.|
Other chapters in this guide explain how to use the raidcli command to bind (group) physical disks into RAID units and unbind them, set up caching, and accomplish other tasks.
Look for two lights at the right of the disk modules (deskside storage system) or above the disk modules (chassis in rack). The green light indicates that the unit is powered on; the amber light indicates a fault. See Figure 3-1.
The amber service light comes on when
an SP is reseated
the Challenge RAID is powered off and on
the battery backup unit has not finished recharging (if a battery backup unit is present in the system)
If the service light is lit, look for a disk-module fault light that is lit. You can then explore status further either with the raidcli command in an IRIX shell or with RAID5GUI.
This section explains
using the raidcli command
getting the device name with getagent
getting general system information
getting information about disks
getting information about other components
displaying the Challenge RAID unsolicited event log
The raidcli command sends storage management and configuration requests to an application programming interface (API) on the Challenge server. For the raidcli command to function, the agent—an interpreter between the command-line interface and the Challenge RAID storage system—must be running.
The synopsis of the raidcli command is
raidcli [-vp] [-d device] parameter [optional_arguments]
In this syntax, the variables have the following meanings:
-v    Enables verbose return.
-p    Parses the raidcli command without calling the API. If the string does not parse correctly, an error message is printed to stderr; otherwise there is no output.
-d device    Target RAID device. Use raidcli getagent for a list of RAID devices. This switch must be present for all raidcli management and configuration commands unless the environment variable indicates otherwise. This switch overrides an environment variable.
|Note: Appendix B is a complete alphabetical listing of raidcli parameters.|
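The synopsis above can be sketched as a small helper that assembles a raidcli argument list. This is an illustrative sketch only: the function name build_raidcli_args and its keyword arguments are hypothetical, and the allow_env_device flag stands in for the environment-variable case mentioned above, whose variable name this chapter does not give.

```python
from typing import List, Optional, Sequence

RAIDCLI = "/usr/raid5/raidcli"  # path given in this chapter


def build_raidcli_args(
    parameter: str,
    device: Optional[str] = None,
    verbose: bool = False,
    parse_only: bool = False,
    optional_arguments: Sequence[str] = (),
    allow_env_device: bool = False,
) -> List[str]:
    """Return an argument list laid out like the raidcli synopsis."""
    args = [RAIDCLI]
    # Emit -v and -p as one cluster, mirroring "[-vp]" in the synopsis.
    switches = ("v" if verbose else "") + ("p" if parse_only else "")
    if switches:
        args.append("-" + switches)
    if device is not None:
        args += ["-d", device]
    elif parameter != "getagent" and not allow_env_device:
        # The chapter requires -d unless an environment variable supplies it.
        raise ValueError("device is required unless it comes from the environment")
    args.append(parameter)
    args.extend(optional_arguments)
    return args


if __name__ == "__main__":
    print(" ".join(build_raidcli_args(
        "getdisk", device="scsi4d210", optional_arguments=["a2"])))
```

The resulting list can be handed to a process launcher such as subprocess.run on a host where raidcli and the agent are available.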
The following is sample output for one device; normally, the output would give information on all devices:
Table 3-1 summarizes entries in the raidcli getagent output.
Revision number of RAID agent
ASCII string found in the agent configuration file, which assigns a name to the node being accessed (see Node description below)
ASCII string found in the agent configuration file, which describes the node being accessed (see Node description below)
The /dev/scsi entry that the agent uses as a path to the actual SCSI device. This value must be entered by the user for every CLI command (except getagent)
Unique 32-bit identifier for the SP being accessed through Node
Unique 32-bit identifier for the other SP in the chassis; 0 if no additional SP is present
7305: PowerPC™-based SP
SCSI ID number
Current PROM revision on the SP
Amount of DRAM present on the SP
12-digit ASCII string that uniquely identifies this subsystem
A partial output of this command follows:
System Fault LED: OFF
Statistics Logging: ON
System Cache: ON
Max Requests: 23
Average Requests: 5
Hard errors: 0
Total Reads: 18345
Total Writes: 1304
Prct Busy: 25
Prct Idle: 75
System Date: 4/30/1995
Day of the week: Tuesday
System Time: 12:43:54
For more information on this command, see “getcontrol” in Appendix B.
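Output of this kind is easy to post-process in a script. The following is a minimal sketch that assumes the command prints one "Name: value" pair per line; the function name parse_fields is hypothetical, and the sample text is an excerpt of the output shown above.

```python
from typing import Dict

SAMPLE = """\
System Fault LED: OFF
Statistics Logging: ON
System Cache: ON
Prct Busy: 25
Prct Idle: 75
System Time: 12:43:54
"""


def parse_fields(text: str) -> Dict[str, str]:
    """Map each field name to its value, both kept as strings."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as 12:43:54 stay whole.
        name, separator, value = line.partition(":")
        if separator:
            fields[name.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    fields = parse_fields(SAMPLE)
    print(fields["Statistics Logging"], fields["Prct Busy"])
```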
For information on a particular disk module, use
/usr/raid5/raidcli -d device getdisk [diskposition]
In this command, diskposition has the format bd, where b is the bus the disk is located on (a through e; be sure to use lowercase) and d is the device number (0 through 3). Figure 3-2 diagrams disk module locations.
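A script that builds getdisk commands can check the diskposition format before calling raidcli. This is a sketch under the rules stated above (bus a through e, device 0 through 3, lowercase); the function name normalize_disk_position is hypothetical.

```python
import re

# One bus letter a-e followed by one device digit 0-3, as described above.
_DISK_POSITION = re.compile(r"[a-e][0-3]")


def normalize_disk_position(text: str) -> str:
    """Return the position in lowercase, or raise ValueError if it is invalid."""
    position = text.strip().lower()
    if not _DISK_POSITION.fullmatch(position):
        raise ValueError(f"invalid disk position: {text!r}")
    return position


if __name__ == "__main__":
    print(normalize_disk_position("A2"))
```

Lowercasing the input here is a convenience of the sketch; raidcli itself expects the lowercase form.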
For example, the following command gets information about disk module A2:
/usr/raid5/raidcli -d scsi4d210 getdisk a2
A sample output of this command follows:
A0 Vendor Id: <manufacturer>
A0 Product Id: <part number>
A0 Lun: 0
A0 State: Bound and Not Assigned
A0 Hot Spare: NO
A0 Prct Rebuilt: 100
A0 Prct Bound: 100
A0 Serial Number: 032306
A0 Capacity: 0x000f42a8
A0 Private: 0x00009000
A0 Bind Signature: 0x1c4eb2bc
A0 Hard Read Errors: 0
A0 Hard Write Errors: 0
A0 Soft Read Errors: 0
A0 Soft Write Errors: 0
A0 Read Retries: 0
A0 Write Retries: 0
A0 Remapped Sectors: 0
A0 Number of Reads: 1007602
A0 Number of Writes: 1152057
Table 3-2 interprets items in this output.
Vendor Id: Manufacturer of disk drive
Product Id: Part number of disk
Lun: Logical unit number to which this disk is bound
State: One of the following:
    Removed: disk is physically not present in the chassis or has been powered off
    Off: disk is physically present in the chassis but is not spinning
    Powering Up: disk is spinning and diagnostics are being run on it
    Unbound: disk is healthy but is not part of a LUN
    Bound and Not Assigned: disk is healthy, part of a LUN, but not being used by this SP
    Rebuilding: disk is being rebuilt
    Enabled: disk is healthy, bound, and being used by this SP
    Binding: disk is in the process of being bound to a LUN
    Formatting: disk is being formatted
Hot Spare: YES or NO
Prct Rebuilt: Percentage of disk that has been rebuilt
Prct Bound: Percentage of disk that has been bound
Serial Number: Serial number from disk inquiry command
Capacity: Actual disk capacity in blocks
Private: Amount of physical disk reserved for private space
Bind Signature: Unique value assigned to each disk in a logical unit at bind time
Hard Read Errors: Number of hard errors encountered on reads for this disk
Hard Write Errors: Number of hard errors encountered on writes for this disk
Soft Read Errors: Number of soft errors encountered on reads for this disk
Soft Write Errors: Number of soft errors encountered on writes for this disk
Read Retries: Number of retries occurring during reads
Write Retries: Number of retries occurring during writes
Remapped Sectors: Number of sectors that have been remapped
Number of Reads: Number of reads this disk has seen
Number of Writes: Number of writes this disk has seen
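The getdisk output can be grouped by disk module in a script. The sketch below assumes each output line has the form "<disk ID> <field>: <value>", as in the sample above; the function name parse_getdisk is hypothetical. It also shows converting the hexadecimal Capacity value to a decimal block count.

```python
import re
from typing import Dict

SAMPLE = """\
A0 Lun: 0
A0 State: Bound and Not Assigned
A0 Hot Spare: NO
A0 Capacity: 0x000f42a8
A0 Hard Read Errors: 0
A0 Soft Read Errors: 0
"""

# Disk ID (bus letter A-E, device 0-3), field name, then value.
_LINE = re.compile(r"([A-E][0-3])\s+([^:]+):\s*(.*)")


def parse_getdisk(text: str) -> Dict[str, Dict[str, str]]:
    """Return {disk ID: {field name: value}} for every recognized line."""
    disks: Dict[str, Dict[str, str]] = {}
    for line in text.splitlines():
        match = _LINE.fullmatch(line.strip())
        if match:
            disk_id, field, value = match.groups()
            disks.setdefault(disk_id, {})[field.strip()] = value
    return disks


if __name__ == "__main__":
    disk = parse_getdisk(SAMPLE)["A0"]
    # Capacity is reported in hexadecimal; int(..., 16) gives blocks.
    print(disk["State"], int(disk["Capacity"], 16))
```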
A sample output of this command follows:
FANA State: Present
FANB State: Present
VSCA State: Present
VSCB State: Present
VSCC State: Present
SPA State: Present
SPB State: Present
BBU State: Present
|Note: In this output, information on the power supplies is shown under VSC (voltage semi-regulated converter), SP information is shown under SP, and battery backup unit information is shown under BBU. Table 3-3 interprets items in this output.|
Optional third power supply.
Battery backup unit, which has three states: Present (fully charged), Not Present (removed or charging), and Faulted. If the battery backup unit takes longer than an hour to charge, it shuts itself off and transitions to the “Faulted” state.
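A monitoring script might scan this component listing for anything that is not reported as Present. This sketch assumes one "<component> State: <state>" pair per line; the function names are hypothetical. Note that a battery backup unit that is still charging also reports Not Present, so a hit is a prompt to look further, not proof of a failure.

```python
import re
from typing import Dict, List

SAMPLE = """\
FANA State: Present
FANB State: Present
VSCA State: Present
SPA State: Present
BBU State: Present
"""

_LINE = re.compile(r"(\S+)\s+State:\s*(.+)")


def parse_component_states(text: str) -> Dict[str, str]:
    """Return {component name: state} for every recognized line."""
    states: Dict[str, str] = {}
    for line in text.splitlines():
        match = _LINE.fullmatch(line.strip())
        if match:
            states[match.group(1)] = match.group(2).strip()
    return states


def components_not_present(states: Dict[str, str]) -> List[str]:
    """List components whose state is anything other than Present."""
    return [name for name, state in states.items() if state != "Present"]


if __name__ == "__main__":
    print(components_not_present(parse_component_states(SAMPLE)))
```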
For storage systems with RAID agent 1.55 and higher, use raidcli -d <device> getsp to see an SP's firmware revision number and model number only. For a system containing many Challenge RAID chassis assemblies, this parameter is especially useful as an alternative to raidcli getagent. For more information, see “getsp” in Appendix B.
The storage-control processor maintains a log of event messages in processor memory. These events include hard errors, startups, and shutdowns involving disk modules, fans, SPs, power supplies, and the battery backup unit. Periodically, the SP writes this log to disk to maintain it when SP power is off. The log can hold over 2,000 event messages; it has a filter feature that lets you select events by device or error message.
The event messages are in chronological order, with the most recent messages at the end. To display the entire log, use
To display the newest N entries in the log, starting with the oldest entry, use
raidcli -d device getlog +N
To display the oldest N entries in the log, starting with the oldest entry, use
raidcli -d device getlog -N
Output of the command raidcli -d <device> getlog +5 might be
12/17/94 09:59;51 A3: (A07) Cru Removed [0x47]
12/17/94 09:59;51 A3: (608) Cru Ready [0x0]
12/17/94 09:59;51 A3: (603) Cru Rebuild Started [0x0]
12/17/94 09:59;51 A3: (604) Cru Rebuild Complete [0x0]
12/17/94 09:59;51 A3: (602) Cru Enabled [0x0]
These entries show that a field-replaceable unit (disk module, fan unit, battery backup unit, SP) has been removed, replaced, rebuilt, and enabled.
At the tail of each log entry is an error code in brackets (for example, [0x47]) that gives diagnostic information when it is available. See “getlog” in Appendix B for explanations of these codes.
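Log entries in this layout can be split into their parts for filtering or reporting. The sketch below is based only on the five sample entries above: it assumes each entry is a date, a time, a unit ID followed by a colon, an event code in parentheses, a message, and a bracketed hexadecimal code. The LogEntry type and the parse_getlog function are hypothetical names.

```python
import re
from typing import List, NamedTuple

SAMPLE = """\
12/17/94 09:59;51 A3: (A07) Cru Removed [0x47]
12/17/94 09:59;51 A3: (608) Cru Ready [0x0]
12/17/94 09:59;51 A3: (603) Cru Rebuild Started [0x0]
12/17/94 09:59;51 A3: (604) Cru Rebuild Complete [0x0]
12/17/94 09:59;51 A3: (602) Cru Enabled [0x0]
"""


class LogEntry(NamedTuple):
    date: str
    time: str
    unit: str
    code: str
    message: str
    extended: int  # bracketed code at the tail of the entry


_ENTRY = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}[:;]\d{2})\s+"  # sample shows ';' before seconds
    r"(?P<unit>\S+):\s+"
    r"\((?P<code>\w+)\)\s+"
    r"(?P<message>.*?)\s+"
    r"\[(?P<extended>0x[0-9a-fA-F]+)\]"
)


def parse_getlog(text: str) -> List[LogEntry]:
    """Parse every line that matches the sample layout; skip the rest."""
    entries: List[LogEntry] = []
    for line in text.splitlines():
        m = _ENTRY.fullmatch(line.strip())
        if m:
            entries.append(LogEntry(m["date"], m["time"], m["unit"], m["code"],
                                    m["message"], int(m["extended"], 16)))
    return entries


if __name__ == "__main__":
    for entry in parse_getlog(SAMPLE):
        print(entry.unit, entry.code, entry.message)
```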
To clear the event log, use
|Note: You must be root to use the clearlog parameter.|
Any user can run a RAID5GUI session from any system on which it is installed, and can use it to monitor the storage system. Only authorized users can use RAID5GUI to configure or reconfigure the storage system. A user is authorized if the server's agent configuration file contains an entry for the user.
|Caution: The agent allows more than one RAID5GUI session to access the same storage system at the same time. You must make sure that two authorized users do not use RAID5GUI to configure or reconfigure the same storage-system chassis at the same time.|
This section explains
getting general system information
getting information about disk modules
getting information about system components
viewing SP status information
viewing the SP event log
enabling and disabling the statistics log
using RAID5GUI automatic polling
using alarm settings
After introductory screens, the Select Hosts window appears, as shown in Figure 3-3.
Upon startup, RAID5GUI looks for a file named by the RAID_ARRAY_HOSTS environment variable. If it can access the named file, the Select Hosts window appears, listing the hosts in the named file. If RAID5GUI cannot find or access this file, or if the RAID_ARRAY_HOSTS environment variable is not set, RAID5GUI looks for the .hosts file in your home directory. If it can access this file, the Select Hosts window appears, listing the hosts in the .hosts file.
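The lookup order described above can be sketched as follows. This is an illustration of the documented order only, not RAID5GUI's actual code; the function name find_hosts_file and the injectable readable check are hypothetical.

```python
import os
from typing import Callable, Mapping, Optional


def _is_readable(path: str) -> bool:
    return os.access(path, os.R_OK)


def find_hosts_file(
    env: Mapping[str, str],
    home: str,
    readable: Callable[[str], bool] = _is_readable,
) -> Optional[str]:
    """Return the hosts file that would be used, or None if neither is usable."""
    # First choice: the file named by RAID_ARRAY_HOSTS, if set and accessible.
    named = env.get("RAID_ARRAY_HOSTS")
    if named and readable(named):
        return named
    # Fallback: the .hosts file in the user's home directory.
    fallback = os.path.join(home, ".hosts")
    if readable(fallback):
        return fallback
    # Neither file is usable: the host list starts out empty.
    return None


if __name__ == "__main__":
    print(find_hosts_file(os.environ, os.path.expanduser("~")))
```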
If RAID5GUI cannot find or access the .hosts file, a popup appears noting that the host file could not be loaded. In this popup, click OK. In the Select Hosts window that appears, follow these steps to add hostnames to the empty selection box:
In the Select Hosts window, click Add.
In the text box that appears, enter the name or IP address of the host you want to add.
The hostname you add appears in the selection box and is added to the file named by the RAID_ARRAY_HOSTS environment variable. If this variable is not set, the hostname is added to the .hosts file in your home directory.
You can also use the Select Hosts window to remove or rename hosts; click the Delete or Rename button, respectively. In either case, the change is recorded in the file named by the RAID_ARRAY_HOSTS environment variable or in the .hosts file in your home directory.
In the Select Hosts window, highlight the name of the host with the storage system you want to manage.
Click Select at the bottom of the window. If RAID5GUI was able to communicate with the agent on the server, the Select Chassis window appears, as shown in Figure 3-4.
Select the name of the chassis connection that you want to manage and click Select.
The Select Chassis and Select Hosts windows close, and the Equipment View for the storage system appears. Figure 3-5 shows an example. This window depicts the storage system (deskside) or, if you have a rackmount, the chassis assembly you selected.
Notice the service light button in the toolbar. If the circle is amber, a storage-control processor has detected a hard or logical fault in the chassis. You can click this button to display the Chassis Status window, which is explained later in this section.
The color of the service light button usually reflects the state of the physical service light on the Challenge RAID chassis (see Figure 3-1). When the chassis service light is lit, the on-screen service light button is usually amber. When the chassis service light is not lit, the service light button is usually gray.
The host select and chassis select buttons display the Select Hosts and Select Chassis windows, respectively.
|Note: If the agent or RAID5GUI was started (or restarted) after the chassis was powered on, the color of the service light button does not reflect that of the chassis service light.|
gray: operating normally
blue: disk module is in a transition state, such as powering up or binding (becoming part of a LUN)
amber: failed (disk module); failed or removed (other components)
white: component was not present when the agent was started
In the Equipment View, each slot for a disk module has a disk ID by which you identify the disk module, such as A0, A1, A2, A3, B0, B1, B2, and so on.
To display disk IDs in the Equipment View, select “Use Disk IDs” from the Views menu. The window for monitoring LUN status is explained in “Getting Disk Group (LUN) Information” in Chapter 4.
Iconify the Equipment View by clicking the button in the upper right corner.
To use the Summary View to determine disk module information, select “Summary View” in the Views menu. Figure 3-6 shows an example.
The arrangement of disk module buttons varies, depending on the storage system format: the buttons for a deskside storage system are arranged in two vertical rows.
Click an SP button to view information in the SP Summary window, as explained in “Viewing SP Status Information,” later in this chapter.
The LUNs belonging to each SP are shown below the SP in the Summary View. Disks not owned by an SP, such as hot spares, appear in the middle of the Summary View. “Using the LUN Information Window” in Chapter 4 contains more information about viewing LUN information.
Associated with each disk slot is an identifier based on its position in the chassis. To display disk IDs for the disk modules, select “Use Disk IDs” in the Views menu. Figure 3-7 shows the button for a disk module with disk ID enabled.
Table 3-4 summarizes the disk IDs.
Internal Challenge RAID SCSI Bus
0, 1, 2, 3
A0, A1, A2, A3
0, 1, 2, 3
B0, B1, B2, B3
0, 1, 2, 3
C0, C1, C2, C3
0, 1, 2, 3
D0, D1, D2, D3
0, 1, 2, 3
E0, E1, E2, E3
The operational state of the disk module appears next to the button. If the disk module is all or part of a LUN, that LUN's hexadecimal number also appears next to the button. Table 3-5 lists disk module states, the color of the disk module's button when the disk module is in that state, and a description of the state.
Currently being bound into a LUN.
No disk module was in this slot when the agent was started. This slot can contain an unbound disk module that was removed while the agent was running, or a bound disk module that was removed when the agent was restarted.
Either a hot spare on standby or part of a bound LUN that is assigned to (owned by) the SP you are using as the communication channel to the chassis. If the Challenge RAID storage system has another SP, this module's status is Ready when you use the other SP as the communication channel to the chassis.
Data from a hot spare is being copied onto a replacement disk module.
Powered off or inaccessible.
Being hardware-formatted. Generally, modules do not need hardware formatting.
Powered off by the SP, which can happen if a wrong size module is inserted.
Power is being applied to the disk module.
Module is part of a broken LUN or of a LUN that is bound and unassigned. The disk module might be part of a LUN that is not owned by the SP that you are using as the communication channel to the chassis. If the disk module is part of a LUN assigned to an SP other than the one you are using as the communication channel, the module's status is either Enabled or Ready. It is Enabled when you use the other SP as the communication channel to the chassis.
Click on a disk module button to display the Disk Module Information window. Figure 3-8 shows an example.
In the Disk Module Information window,
the Description field gives the disk capacity
the State field indicates the disk module state; see Table 3-5
the LUN ID field contains the hexadecimal number identifying the LUN to which the disk module is bound; this number is the same as that specified in the Bind window (or with the raidcli bind command) when the LUN was bound
the Owner Type field shows the RAID level of the LUN
the Sectors field gives the number of user-accessible sectors on the disk module
Click Statistics in the Disk Module Information window to view read/write statistics compiled since the last time the statistics log for the SP that owns the LUN was turned on (see “Enabling and Disabling the Statistics Log” later in this chapter for more information). Figure 3-9 shows an example Disk Module Statistics window.
Table 3-6 explains the entries in this window.
Average Disk Request Service Time
Average number of milliseconds that the disk module required to execute an I/O request after the request reached the top of the queue.
Number of Reads
Number of Writes
Total read and write requests made to the disk module. (For read and write information for the entire LUN, see “Getting LUN Information Using RAID5GUI” in Chapter 4.)
Number of Blocks Read
Number of Blocks Written
Total data blocks read from and written to the disk module.
Number of Read Retries
Number of Write Retries
Total times read and write requests to the disk module were retried.
To view number of remapped sectors and hard and soft read/write errors, click Errors in the Disk Module Information window. Figure 3-10 shows an example Disk Module Errors window.
Number of Hard Read Errors
Number of Hard Write Errors
Total read or write errors that persisted through all the retries. An increasing number of hard errors might mean that one or more of the LUN's disk modules is nearing the end of its useful life.
Number of Soft Read Errors
Number of Soft Write Errors
Total number of read or write errors that cleared before the retries were exhausted. An increasing number of soft errors might indicate that one of the LUN's disk modules is nearing the end of its useful life.
Total disk sectors that were faulty when written to, and thus were remapped to a different part of the disk module.
To get details on a specific system component, follow these steps:
To display more details on system components, select “Summary View” from the Views menu of the Equipment View. The Summary view for the selected chassis appears, as shown in Figure 3-6. (“Getting LUN Information Using RAID5GUI” in Chapter 4 explains features of this window in detail.)
Iconify a Summary View by clicking on the second button from the upper right corner. Each icon bears a diagram of a chassis, as shown in Figure 3-11.
Use the chassis icons for checking component status. Figure 3-11 shows three chassis with normal operation. If the chassis icon is blinking and shows a split chassis, one or more components have failed or have been removed; Figure 3-12 shows such an icon.
|Note: To update the information in the icons, restore the view that the icon represents and click the Poll button in the toolbar. For more information on polling, see “Using RAID5GUI Automatic Polling,” later in this chapter.|
If the chassis icon is blinking and shows a split chassis, restore the icon and check the service light button color, as explained in “Getting General System Information,” earlier in this chapter.
Click the service light button in the toolbar to display the Chassis Status window. Figure 3-13 shows an example.
Possibilities for each entry are Up, Down (component has failed or was removed after the agent started running), or Not Present (component failed or was removed before the agent started running).
|Note: For information on how to identify a specific defective component, see Chapter 6, “Identifying Failed System Components.”|
To view status information for an SP, use the SP Summary window. To display this window, click the SP's button in the Equipment View or Summary View. Figure 3-14 shows an example.
|Note: To display cache information, click Cache; the window that appears is explained in “Viewing Cache Statistics” in Chapter 7. To display SP event messages, click Log. See “Viewing the SP Event Log” later in this chapter for a description of the log.|
Table 3-8 lists the possibilities for the Status field.
The SP is the communication channel you are using to communicate with the chassis.
SP Not Connected
Agent cannot talk to the SP because a communication channel specifying the SP is not in the agent's configuration file for the selected host. For example, the SP is connected to a host different from that of the SP in the communications channel for the chassis.
SP Not Present
SP that is in the communication channel to the selected chassis has failed or has been removed.
SP was not present when the agent was started.
The SP Comm Channel field contains a SCSI device address that is the same as the device entry in the agent configuration file for the host to which the Challenge RAID storage system is attached. If this communication channel is not through this SP, and the SPs in the Challenge RAID system are connected to different hosts, the word Unknown appears in this field, instead of a SCSI device address, and the values in the rest of the fields in this window are either 0 or Unknown. For more information on the SP communication channel, see “Shutting Down the Challenge RAID Storage System” later in this chapter.
The Firmware Revision field gives the revision number of the Licensed Internal Code (LIC) that the SP is running. All SPs in the system run the same LIC revision. The PROM Revision field indicates the revision number of the SP's PROM code; all SPs in the system run the same PROM revision.
The Total Memory field lists the number of megabytes (8, 16, 32, or 64) in the SP's memory. To make full use of cache memory, each SP in the system should have the same amount of memory.
The SP SCSI ID field gives the SP's SCSI ID, which is determined by switch settings on the SP. The Silicon Graphics System Service Engineer sets these when the Challenge RAID storage system is installed.
To view settings for SPs and LUNs in the system, select “View Settings” in the Options menu of either the Summary View or Equipment View. Figure 3-15 shows an example.
The View Settings window summarizes information in the SP Summary and LUN Summary windows, and includes cache and other information.
At the top left of the window, you can view settings for either SP in the system by clicking the appropriate diamond-shaped radio button. At the top right, use the option menu to select a LUN owned by the SP that is currently selected.
This window is available for firmware revision 9.0 and higher (RAID agent 1.55, SP model number 7305).
Each SP maintains a log of event messages. These events include hard errors, startups, and shutdowns involving disk modules, fans, SPs, VSCs, and the BBU. Periodically, the SP writes this log to disk to maintain it when SP power is off. The log can hold over 2,000 event messages; when that amount is reached, the oldest messages are deleted in chronological order as new messages come in.
To display the SP log, click Log in the SP Summary window. Figure 3-16 shows an example.
Event messages are in chronological order, with the most recent ones at the end of the log. To display earlier messages, use the scroll bar to move backwards through the log.
Event codes and their corresponding messages are in Appendix C, “Storage-Control Processor Event-Log Error Codes.”
Click Save... to save the log to a file.
Click Clear to clear the contents of the log and reset it.
|Caution: Clearing the log can cause problems for other users who are viewing the same log.|
To exit the log, click Close.
Because the log uses a 32-bit counter to maintain the statistics numbers, the statistics numbers start over at zero when the counter is full. Thus, you see a sudden decrease in a statistics number if you view it shortly before the counter is full and shortly after the counter restarts at zero. To keep the log turned on for more than two weeks, reset the log about every two weeks, so that you know when the numbers start at zero.
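If you record statistics values in a script, the wraparound can be accounted for when computing the change between two readings. The sketch below assumes an unsigned 32-bit counter that wraps from its maximum value back to zero, and it is correct only if the counter wrapped at most once between the two readings; the function name counter_delta is hypothetical.

```python
COUNTER_MODULUS = 2 ** 32  # a 32-bit counter holds values 0 .. 2**32 - 1


def counter_delta(earlier: int, later: int) -> int:
    """Return how far the counter advanced between two readings."""
    # Modular subtraction gives the right answer whether or not the
    # counter passed through zero between the readings (at most once).
    return (later - earlier) % COUNTER_MODULUS


if __name__ == "__main__":
    # Reading taken shortly before the wrap, then shortly after it.
    print(counter_delta(4294967290, 5))
```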
When the log is on, it affects storage system performance; you may want to disable it unless you have a reason to monitor performance. To turn the SP's statistics log on and off, select “Toggle Statistics Logging” in the Options menu of the Equipment View or Summary View. In the submenu, click on the item for the SP whose log you want to turn on or off. If the SP's log is on, a box appears in front of the item for the SP.
To reset (enable) the SP's statistics log, select “Reset Statistics Log” in the Options menu of the Equipment View or Summary View. In the submenu, click on the item for the SP whose log you want to reset.
RAID5GUI automatically polls the agent on the selected host to get the current status, statistics, and event log information for each selected chassis. In the Poll Setting window, you can change the default polling interval or turn automatic polling off or on for the current RAID5GUI session.
To poll an agent manually, click the Poll button on the Equipment View or Summary View toolbar.
To change the automatic polling intervals, follow these steps:
In the Equipment or Summary View window, select “Poll Setting” from the Options menu. The Poll Settings window appears.
To change the poll interval for status and statistics, click the list button beside the Poll Interval field and select the desired number of seconds from the list box that appears.
Available poll intervals are 5, 10, 15, 30, 60, 120, 300, 600, 1800, and 3600 seconds; the default is 60 seconds.
Click Set. The new poll interval(s) takes effect immediately.
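When poll settings are planned or documented in a script, a requested value can be mapped to the nearest interval that the list box offers. This is a sketch only; RAID5GUI is set through its windows, and the function name nearest_poll_interval is hypothetical.

```python
# Intervals (in seconds) offered in the Poll Interval list, per this chapter.
ALLOWED_POLL_INTERVALS = (5, 10, 15, 30, 60, 120, 300, 600, 1800, 3600)
DEFAULT_POLL_INTERVAL = 60


def nearest_poll_interval(requested_seconds: float) -> int:
    """Return the allowed interval closest to the requested value.

    When two intervals are equally close, the shorter one is returned.
    """
    return min(ALLOWED_POLL_INTERVALS,
               key=lambda allowed: abs(allowed - requested_seconds))


if __name__ == "__main__":
    print(nearest_poll_interval(50))
```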
To turn automatic polling off, follow these steps:
Select “Poll Setting” in the Options menu of the Equipment View or Summary View.
In the Poll Settings window that appears, click Manual Polling; click Set. Automatic polling is turned off immediately. To update status, statistics, and event log information, you must manually poll the agent.
To turn manual polling on, click Manual Polling and click Set in the Poll Settings window.
The graphical user interface includes alarm settings that are useful for monitoring the status of system components. Alarm messages can be sent to an e-mail address, displayed in a window, or both. Figure 3-18 shows the alarm message window.
To change alarm settings, select “Alarms” in the Options menu; the Alarm Settings window appears, as shown in Figure 3-19.
Use the Alarm Settings window as follows:
To toggle display of alarm messages on or off, click Screen Alarms.
To start or stop sending alarm messages to an e-mail address, click Enable Email Alarms. Click the Email Address field, and enter the address to which the alarms should be sent.
To enable any of these settings, click Set.
Alarm settings are not saved from one session to the next. If you exit RAID5GUI, you must reset alarm settings when you restart it.
To shut down the Challenge RAID storage system, follow these steps:
If you are using storage system caching, make sure that it is disabled; use one of these methods:
Use raidcli getcache to check status.
In RAID5GUI, click an SP icon in the Equipment View; in the SP Summary window that appears, click Cache.
If necessary, disable caching as explained in Chapter 7.
Turn off the power switch on the back of the Challenge RAID storage system, as shown in Figure 3-20.
|Note: You do not need to disable the power for the SP(s).|
To start the Challenge RAID storage system, follow these steps:
Turn on the storage system's power; see Figure 3-20.
The green power light on the front of the storage system lights up (see Figure 3-21) and the fans rotate.
If none of the busy lights on the drive modules light up, make sure that the power for each SP is enabled. Move the fan module's latch to the UNLOCK position, as indicated in Figure 3-22.
Swing open the fan module.
|Caution: To prevent thermal shutdown of the system, never leave the fan module open more than two minutes.|
For the AMD-based SP, move the SP's power switch to the enable position, as shown in Figure 3-23.
|Note: The PowerPC-based SP does not have a power enable/disable switch on the bezel.|