RAID (redundant array of inexpensive disks) technology provides redundant disk resources in disk-array configurations that make the storage system more highly available and improve reliability and performance. RAID was first defined by D. Patterson, G. Gibson, and R. Katz of the University of California, Berkeley, in their 1987 paper, “A Case for Redundant Arrays of Inexpensive Disks (RAID).” That paper defines various levels of RAID.
This chapter introduces the Challenge RAID disk-array storage system. It explains
Challenge RAID storage systems
Challenge RAID storage system components
data availability and performance
the RAID hot spare
using the Challenge RAID command-line interface
using the RAID graphical user interface RAID5GUI
Figure 1-1 is an external view of the deskside version of the Challenge RAID storage system.
|Note: In Figure 1-1, the front cover is removed for clarity.|
Figure 1-2 is an external view of the Challenge RAID rack, with the maximum of four chassis assemblies installed. Each chassis assembly in a Challenge RAID rack corresponds to one deskside Challenge RAID chassis.
The Challenge RAID storage system provides a compact, high-capacity, high-availability source of disk storage for your Challenge S, DM, L, or XL server system. The storage is spread across multiple disk modules that you can replace while the storage system is turned on.
Only Silicon Graphics service personnel authorized to open the computer cabinet and replace parts should install or replace the SCSI–2 interface. You can replace the disk modules by following instructions in Chapter 5 in this guide.
A Challenge server can support multiple Challenge RAID storage systems. The various storage system configurations, along with their availability and performance features, are explained in Chapter 2 in this guide.
The number of Challenge RAID deskside storage systems or rack chassis assemblies that can be connected on one SCSI bus is limited by the recommended SCSI bus length limit of 60 feet, as diagrammed in Figure 1-3.
|Caution: Although the SCSI bus absolute length limit is 80 feet, exceeding 60 feet on the SCSI bus is not recommended. When cable lengths exceed 60 feet, problems can occur on the SCSI mezzanine card, the IO4B board, or both.|
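The two length limits above can be checked with a few lines of Python. This is a minimal sketch: the function name and the idea of summing a list of cable lengths are illustrative assumptions, and Figure 1-3 remains the reference for what counts toward the bus length.

```python
RECOMMENDED_LIMIT_FT = 60  # recommended SCSI bus length limit from this chapter
ABSOLUTE_LIMIT_FT = 80     # absolute SCSI bus length limit from this chapter


def check_bus_length(segment_lengths_ft):
    """Classify the total length of one SCSI bus (hypothetical helper)."""
    total = sum(segment_lengths_ft)
    if total > ABSOLUTE_LIMIT_FT:
        return total, "exceeds absolute limit"
    if total > RECOMMENDED_LIMIT_FT:
        return total, "exceeds recommended limit"
    return total, "ok"


print(check_bus_length([25, 25]))
print(check_bus_length([25, 25, 25]))
```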
The Challenge RAID deskside storage system, or each chassis assembly in the Challenge RAID rack, consists of these components:
one or more host SCSI–2 interfaces that are either
native to the POWER Channel™ 2 I/O controller (IO4 board) in the Challenge server
HIO add-on cards (mezzanine cards) on the POWER Channel 2 I/O controller
one or two storage-control processors (SPs)
5 to 20 disk modules in groups of five
one fan module containing six fans, wired in two groups of three fans each
two or three power supplies (VSCs, or voltage semi-regulated converters)
one battery backup unit (BBU) for storage system caching (optional)
The Challenge server holds the SCSI–2 interface(s); the Challenge RAID storage system chassis holds the other components.
The SCSI–2 interface transfers data between host memory and the SCSI–2 differential bus. A cable connects the SCSI–2 interface to the SP(s) in the storage-system cabinet, as diagrammed in Figure 1-4. The SCSI–2 interface has an operating system-specific device name and ID number.
The storage-control processor (SP) is a printed circuit board with memory modules that resides in the storage-system cabinet. It controls the disk modules in the storage system through a synchronous SCSI–2 bus. An SP has five internal fast/narrow SCSI buses, each supporting four disk modules, for a total of 20 disk modules.
Figure 1-4 diagrams a Challenge RAID storage system with one SP.
For higher performance, a Challenge RAID storage system can support an additional SP. The second SP provides a second path to the storage system, so both SPs can connect to the same host or two different hosts, as diagrammed in Figure 1-5 and Figure 1-6. With two SPs, the storage system can support storage system caching, whereby each SP temporarily stores modified data in its memory and writes the data to disk at the most expedient time.
Note that the disk modules are owned by the SP, and not by the host, the SCSI-2 bus, or the storage system.
The Challenge RAID storage system chassis contains compartments for disk modules, SPs, fan module, power supplies, and battery backup unit. The disk modules face front, and the SP(s), power supplies, battery backup unit, and fan module are accessible from the back.
modules A0, B0, C0, D0, and E0 (array 0)
modules A1, B1, C1, D1, and E1 (array 1)
modules A2, B2, C2, D2, and E2 (array 2)
modules A3, B3, C3, D3, and E3 (array 3)
Figure 1-7 diagrams this placement. Individual disk modules have disk position labels attached.
Through the SP, the SCSI–2 bus is split into five internal fast/narrow SCSI buses—A, B, C, D, and E—that connect the slots for the disk modules. For example, internal bus A connects the modules in slots A0, A1, A2, and A3, in that order. Figure 1-8 diagrams this configuration.
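The slot naming scheme can be sketched in Python: the letter identifies the internal bus and the digit identifies the array position. The helper names here are illustrative and are not part of any Challenge RAID software.

```python
BUSES = "ABCDE"    # internal SCSI buses named in this chapter
ARRAYS = range(4)  # array positions 0 through 3


def slots_on_bus(bus):
    """Slots connected by one internal bus, in order."""
    return [f"{bus}{n}" for n in ARRAYS]


def slots_in_array(n):
    """Slots that make up one array position, one per internal bus."""
    return [f"{bus}{n}" for bus in BUSES]


def bus_of(slot):
    """Internal bus letter for a slot label such as 'C2'."""
    return slot[0]


print(slots_on_bus("A"))
print(slots_in_array(0))
```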
A disk module, also called a disk drive module, consists of a disk drive, a power regulator board, internal cabling, and a plastic carrier. The carrier has a handle for inserting and removing the module. Figure 1-9 indicates disk modules in the Challenge RAID chassis and their status lights.
Three status lights on the module indicate the following:
A label attached to the carrier's side shows the disk module's model number and capacity. You can also determine the capacity of a disk module and other features using the command-line interface; see Chapter 3 in this guide.
The Challenge RAID storage system hardware implements data availability and performance enhancements in these ways:
enhanced performance: disk striping
enhanced performance: storage system caching
data reconstruction and rebuilding after disk module failure
This section discusses these features.
RAID technology provides redundant disk resources in disk-array and disk-mirror configurations that make the storage system more highly available. Data redundancy varies for the different RAID levels supported by Challenge RAID: RAID-0, RAID-1, RAID-1_0, RAID-3, and RAID-5.
A RAID–3 or RAID–5 group maintains parity data that lets the disk group survive a disk module failure without losing data. In addition, the group can survive a single SCSI–2 internal bus failure if each disk module in the group was bound on an independent SCSI–2 internal bus.
A RAID-1 mirrored pair, or a RAID-1_0 group, which uses RAID-1 technology, duplicates data on two groups of disk modules. If one disk module fails, the other module provides continuing access to stored information. Similarly, a RAID–1 mirrored pair or RAID–1_0 group can survive a single SCSI internal bus failure if you bind each disk module on an independent SCSI internal bus.
In disk striping, the SP lays out data records, usually large data records or a number of small records for the same application, across multiple disks. For most applications, these disks can be written to or read from simultaneously and independently. Because multiple sets of read/write heads work on the same task at once, disk striping can enhance performance.
The amount of information read from or written to each module makes up the stripe element size (for example, 128 sectors). The stripe size is the number of data disks in a group multiplied by the stripe element size. For example, assume a stripe element size of 128 sectors (the default). If the RAID-5 group has five disks (four data disks and one parity disk), the stripe size is 4 × 128 = 512 sectors.
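The stripe size arithmetic can be written out as a short Python sketch. The 512-byte sector size is inferred from this chapter's statement that 128 sectors equal 65,536 bytes, and the function name is illustrative.

```python
SECTOR_BYTES = 512            # inferred: 128 sectors = 65,536 bytes
DEFAULT_ELEMENT_SECTORS = 128  # default stripe element size


def stripe_size_sectors(data_disks, element_sectors=DEFAULT_ELEMENT_SECTORS):
    """Stripe size = number of data disks x stripe element size."""
    return data_disks * element_sectors


# Five-disk RAID-5 group: four data disks and one parity disk.
sectors = stripe_size_sectors(data_disks=4)
print(sectors, "sectors =", sectors * SECTOR_BYTES, "bytes")
```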
Caching is available for Challenge RAID storage systems that have two SPs, each with at least 8 MB of memory, a battery backup unit, and disk modules in slots A0 through E0. With storage system caching enabled, each SP temporarily stores requested information in its memory.
Caching can save time in two ways:
For a read request, if the data sought by the request is already in the read cache, the storage system avoids accessing the disk group to retrieve the data.
For a write request, if the information in the write cache is modified by the request and thus must be written to disk, the SP can keep the modified data in the cache and write it back to disk at the most expedient time instead of immediately. Write caching, in particular, can enhance storage system performance by reducing write response time.
To ensure data integrity, each SP maintains a mirror image of the other SP's caches. If one SP fails, the data in its caches is available from the other SP.
As explained in Chapter 7, “Caching,” you can enable storage system caching and specify basic cache parameters, and enable or disable read and write caches for individual disk units.
|Note: The optional battery backup unit must be present in the Challenge RAID chassis for systems using cache to ensure that data is committed to disk in the event of a power failure.|
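The caching prerequisites given earlier in this section (two SPs, at least 8 MB of memory in each, a battery backup unit, and disk modules in slots A0 through E0) can be expressed as a simple checklist. The following Python sketch uses a hypothetical StorageSystem record; it does not query real hardware.

```python
from dataclasses import dataclass

CACHE_VAULT_SLOTS = frozenset({"A0", "B0", "C0", "D0", "E0"})
MIN_SP_MEMORY_MB = 8


@dataclass
class StorageSystem:
    """Hypothetical description of one chassis, filled in by hand."""
    sp_memory_mb: list         # memory of each installed SP, in MB
    has_bbu: bool              # battery backup unit present
    populated_slots: frozenset  # slot labels holding disk modules


def caching_problems(system):
    """Return the unmet caching prerequisites (empty list if none)."""
    problems = []
    if len(system.sp_memory_mb) != 2:
        problems.append("two SPs are required")
    if any(mb < MIN_SP_MEMORY_MB for mb in system.sp_memory_mb):
        problems.append("each SP needs at least 8 MB of memory")
    if not system.has_bbu:
        problems.append("a battery backup unit is required")
    missing = CACHE_VAULT_SLOTS - set(system.populated_slots)
    if missing:
        problems.append("missing disk modules in " + ", ".join(sorted(missing)))
    return problems


ready = StorageSystem([16, 16], True, frozenset({"A0", "B0", "C0", "D0", "E0"}))
print(caching_problems(ready))
```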
All RAID levels except RAID-0 provide data redundancy: the storage system reads and writes data from and to more than one disk at a time. Also, the system software writes parity information that lets the array continue operating if a disk module fails. When a disk module in one of these RAID levels fails, the data is still available because the SP can reconstruct it from the surviving disk(s) in the array.
a hot spare (dedicated replacement disk module) is available
the failed disk module is replaced with a new disk module
If a disk module has been configured (bound) as a hot spare, it is available as a replacement for a failed disk module. (See “RAID Hot Spare” later in this chapter.) When a disk module in any RAID level except RAID-0 fails, the SP automatically writes to the hot spare and rebuilds the group using the information stored on the surviving disks. Performance is degraded while the SP rebuilds the data and parity on the new module. However, the storage system continues to function, giving users access to all data, including data stored on the failed module.
Similarly, when a new disk module is inserted to replace a failed one, the SP automatically writes to it and rebuilds the group using the information stored on the surviving disks. As with the hot spare, performance is degraded during rebuilding, but data remains accessible.
The length of the rebuild period, during which the SP re-creates the second image after a failure, can be specified when RAID levels are set and disks are bound into RAID units. These processes are explained in “Binding Disks Into RAID Units” in Chapter 4.
The Challenge RAID system supports these levels of RAID:
RAID-0 group: nonredundant array
RAID-1 group: mirrored pair
RAID-1_0 group: mirrored RAID-0 group
RAID-3 group: parallel access array
RAID-5 group: individual access array
individual disk unit
|Caution: Use only Challenge RAID disk modules to replace failed disk modules. Challenge RAID disk modules contain proprietary firmware that the storage system requires for correct functioning. Using any other disks, including those from other Silicon Graphics systems, can cause failure of the storage system. Swapping disk modules within a Challenge RAID storage system is also not recommended, particularly disk modules in slots A0, B0, C0, and A3, which contain the licensed internal code, and those in slots D0 and E0, which serve with A0, B0, and C0 as the storage system cache vault.|
Chapter 4 provides detailed instructions on configuring all RAID levels.
Three to sixteen disk modules can be bound as a RAID–0 group. A RAID–0 group uses striping; see “Enhanced Performance: Disk Striping,” earlier in this chapter. You might choose a RAID–0 group configuration when fast access is more important than high availability. On IRIX 5.3 with XFS you can software-mirror the RAID–0 group to provide high availability.
|Caution: The hardware does not maintain parity information on any disk module for RAID–0 the way it does for other RAID levels. Failure of a disk module in this RAID level results in loss of data.|
In the RAID-1 configuration, two disk modules can be bound as a mirrored pair. In this disk configuration, the SP duplicates (mirrors) the data records and stores them separately on each disk module in the pair. The disks in a RAID–1 pair cannot be split into individual units (as can a software mirror composed of two individual disk units).
Features of this RAID level include
automatic mirroring: no commands are required to initiate it
physical separation of images
faster write operation than RAID-5
With a RAID–1 mirrored pair, the storage system writes the same data to both disk modules in the mirror, as shown in Figure 1-10.
To achieve the maximum fault tolerance, configure the mirror with each disk module on a different internal SCSI bus; for example, the primary image on A0, the secondary image on B0, and so on.
A RAID-1_0 configuration mirrors a RAID-0 group, creating a primary RAID-0 image and a secondary RAID-0 image for user data. This arrangement consists of four, six, eight, ten, twelve, fourteen, or sixteen disk modules. These disk modules make up two mirror images, with each image including two to eight disk modules. A RAID–1_0 group uses striping and combines the speed advantage of RAID–0 with the redundancy advantage of mirroring.
Figure 1-11 illustrates the distribution of user data with the default stripe element size of 128 sectors (65,536 bytes) in a six-module RAID–1_0 group. Notice that the disk block addresses in the stripe proceed sequentially from the first mirrored disk modules to the second mirrored disk modules, to the third mirrored disk modules, then from the first mirrored disk modules, and so on.
A RAID–1_0 group can survive the failure of multiple disk modules, providing that one disk module in each image pair survives. Thus, for highest availability and performance, the disk modules in an image pair must be on a different SCSI bus from the disk modules in the other image pair. For example, the RAID–1_0 group shown in Figure 1-11 has three disk modules in each image of the pair.
When you bind disk modules into a RAID–1_0 group, you must select them in this order: p1, s1, p2, s2, p3, s3, and so on, with primary (p1, p2, p3) and secondary (s1, s2, s3) disk modules on separate internal SCSI buses.
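The selection order and the address progression for a RAID-1_0 group can be sketched in Python. This is an illustration only: the function names are hypothetical, and the bus check reflects one reading of the rule above, namely that each primary module and its secondary sit on different internal buses.

```python
def bind_order(primaries, secondaries):
    """Interleave slots as p1, s1, p2, s2, ... for a RAID-1_0 bind."""
    if len(primaries) != len(secondaries):
        raise ValueError("each image needs the same number of modules")
    if not 2 <= len(primaries) <= 8:
        raise ValueError("each image has two to eight disk modules")
    order = []
    for p, s in zip(primaries, secondaries):
        if p[0] == s[0]:  # first character of a slot label is its bus
            raise ValueError(f"{p} and {s} share internal bus {p[0]}")
        order += [p, s]
    return order


def pair_for_block(block, pairs, element_sectors=128):
    """Index of the mirrored pair holding a disk block address."""
    return (block // element_sectors) % pairs


print(bind_order(["A1", "C1", "E1"], ["B1", "D1", "A2"]))
print([pair_for_block(b, pairs=3) for b in (0, 128, 256, 384)])
```

In a six-module group with the default element size, block addresses 0 through 127 fall on the first mirrored pair, 128 through 255 on the second, 256 through 383 on the third, and 384 starts again on the first.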
A RAID-3 configuration always consists of five disk modules (four data, one parity) bound as a RAID–3 group. In a RAID–3 group, the hardware always reads from or writes to all its disk modules. A RAID–3 group uses disk striping; see “Enhanced Performance: Disk Striping,” earlier in this chapter for an explanation of this feature. RAID-3 striping has a fixed stripe size of one sector.
If the SPs in your storage system have firmware revision 9.0 or higher and RAID agent 1.55 or higher, they are capable of “fast” RAID-3. Firmware revision 9.0 or higher divides SP memory into RAID-3 space in addition to storage-system buffer space, write cache space, and read cache space. Fast RAID-3 has specific SP memory requirements: you must allocate memory specifically for it, and then divide this memory among the RAID-3 LUNs when you bind them.
|Note: In this guide, descriptions and instructions for RAID-3 are always for this revision (fast RAID-3) unless noted.|
For information on determining revision levels, see “Getting Device Names With getagent” or “Viewing SP Status Information” in Chapter 3. Chapter 4 provides detailed information on memory requirements for fast RAID-3. For information on the differences between RAID-3 and fast RAID-3, see Appendix D, “RAID-3 and Fast RAID-3.”
The Challenge RAID storage system writes parity information that lets the group continue operating if one of the disk modules fails. When you replace the failed module, the SP can rebuild the group using the information stored on the working disks. Performance is degraded while the SP rebuilds the data or parity on the new module. However, the storage system continues to function and gives users access to all data, including data that had been stored on the failed module.
In a RAID–3 group, the hardware processes disk requests serially, whereas in a RAID–5 group the hardware can interleave disk requests.
In a RAID–3 group, the parity information is stored on one disk module; in a RAID–5 group, it is stored on all disks.
A RAID–3 group works well for single-task applications that use I/Os of one or more 2 KB blocks, aligned to start at disk addresses that are multiples of 2 KB from the beginning of the logical disk.
When you plan a RAID–3 group for highest availability, configure the disk modules on different internal SCSI buses (A, B, C, D, and E). Figure 1-12 illustrates user and parity data with a data block size of 2 KB within a RAID–3 group. Notice that the byte addresses proceed from the first module to the second, third, and fourth, then to the first, and so on.
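The byte-address progression can be sketched in Python. This sketch assumes that each 2 KB block is split into four 512-byte pieces, one per data module; the 512-byte sector size is inferred from this chapter's statement that 128 sectors equal 65,536 bytes. The function name is illustrative, and Figure 1-12 remains the reference for the layout.

```python
SECTOR_BYTES = 512                         # assumed sector size
DATA_MODULES = 4                           # four data modules, one parity
BLOCK_BYTES = SECTOR_BYTES * DATA_MODULES  # 2 KB data block


def locate_byte(byte_address):
    """Map a logical byte address to (data module index, byte offset on it)."""
    block, within_block = divmod(byte_address, BLOCK_BYTES)
    module, offset = divmod(within_block, SECTOR_BYTES)
    return module, block * SECTOR_BYTES + offset


for address in (0, 512, 1024, 1536, 2048):
    print(address, locate_byte(address))
```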
The storage system performs more steps writing data to a RAID–3 group than to all the disks in a RAID–1 mirrored pair or a RAID–0 group, or to an individual disk unit. For each correctly aligned 2 KB write operation to a RAID–3 group, the storage system performs the following steps:
Calculates the parity data.
Writes the new user and parity data.
If the write is not a multiple of 2 KB or the starting disk address of the I/O does not begin at an even 2 KB boundary from the beginning of the logical disk, the storage system performs the following steps:
Reads data from the sectors being written and parity data for those sectors.
Recalculates the parity data.
Writes the new user and parity data.
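The choice between the two write sequences depends only on the alignment of the request. A minimal Python sketch of that decision follows; the function name and the step strings are illustrative.

```python
BLOCK_BYTES = 2048  # 2 KB alignment unit for RAID-3 writes


def raid3_write_steps(offset, length):
    """Return the steps for a write, given its byte offset and length."""
    aligned = (
        length > 0
        and offset % BLOCK_BYTES == 0
        and length % BLOCK_BYTES == 0
    )
    if aligned:
        return ["calculate parity", "write new user and parity data"]
    return [
        "read old data and parity",
        "recalculate parity",
        "write new user and parity data",
    ]


print(raid3_write_steps(offset=4096, length=2048))
print(raid3_write_steps(offset=4096, length=1024))
```

An application that issues only whole, aligned 2 KB multiples stays on the shorter sequence.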
This configuration usually consists of five disk modules (but can have three to sixteen) bound as a RAID–5 group. Because there are five internal SCSI-2 buses in the Challenge RAID system, an array of five disk modules (or fewer) provides the greatest level of data redundancy.
A RAID–5 group maintains parity data that lets the disk group survive a disk module failure without losing data. In addition, in Challenge RAID storage systems, the group can survive a single SCSI–2 internal bus failure if each disk module in the group was bound on an independent SCSI–2 internal bus. For highest data availability for a RAID–5 group, the disk modules making up the group should be on different SCSI internal buses (A, B, C, and so on).
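Whether a planned group meets the independent-bus condition can be checked from the slot labels alone, because the first letter of each label names the internal bus. The following Python sketch uses an illustrative function name.

```python
from collections import Counter


def shared_buses(slots):
    """Return the internal bus letters used by more than one slot."""
    counts = Counter(slot[0] for slot in slots)
    return sorted(bus for bus, n in counts.items() if n > 1)


print(shared_buses(["A0", "B0", "C0", "D0", "E0"]))  # no shared buses
print(shared_buses(["A1", "A2", "B1"]))              # bus A is shared
```

With only five internal buses, any group of more than five modules necessarily shares at least one bus.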
With RAID-5 technology, the hardware writes parity information to each module in the array. If a module fails, the SP can reconstruct all user data from the user data and parity information on the other disk modules. After you replace a failed disk module, the SP automatically rebuilds the disk array using the information stored on the remaining modules. The rebuilt disk array contains a replica of the information it would have contained had the disk module never failed.
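Reconstruction from parity can be illustrated with a short Python sketch. This guide does not state which parity function the hardware uses; the sketch assumes bytewise XOR, the conventional choice for parity-based RAID levels, and the sample data is made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings (assumed parity function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four hypothetical data elements from one stripe, plus their parity.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_blocks(data)

# Simulate losing one element, then rebuild it from the survivors.
lost = 2
survivors = [blk for i, blk in enumerate(data) if i != lost] + [parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[lost]
```

The same operation serves both purposes: XOR of all data elements yields the parity, and XOR of the surviving elements with the parity yields the missing element.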
A RAID–5 group uses disk striping; see “Enhanced Performance: Disk Striping,” earlier in this chapter for an explanation of this feature. Figure 1-13 illustrates user and parity data with the default stripe element size of 128 sectors (65,536 bytes) in a five–module RAID–5 group. The stripe size comprises all stripe elements. Notice that the disk block addresses in the stripe proceed sequentially from the first module to the second, third, fourth, and fifth, then back to the first, and so on.
For each write operation to a RAID–5 group, the Challenge RAID storage system must perform the following steps:
An individual disk unit is a disk module bound to be independent of any other disk module in the cabinet. An individual unit has no inherent high availability, but you can make it highly available by software-mirroring it with another individual unit, preferably one on a different internal SCSI bus.
A hot spare is a dedicated replacement disk unit on which users cannot store information. The capacity of a disk module that you bind as a hot spare must be at least as great as the capacity of the largest disk module it might replace.
|Note: The hot spare is not available for RAID-0, because this RAID level does not provide data redundancy.|
If any disk in a RAID–5, RAID-3, or RAID–1_0 group or in a RAID–1 mirrored pair fails, the SP automatically begins rebuilding the failed disk module's structure on the hot spare. When the SP finishes rebuilding, the disk group functions as usual, using the hot spare instead of the failed disk. When you replace the failed disk, the SP starts copying the data from the former hot spare onto the replacement disk. When the copy is done, the disk group consists of disk modules in the original slots, and the SP automatically frees the hot spare to serve as a hot spare again.
|Note: The SP finishes rebuilding the disk module onto the hot spare before it begins copying data to the newly installed disk, even if you replace the failed disk during the rebuild process.|
A hot spare is most useful when you need the highest data availability. It eliminates the time and effort needed for someone to notice that a module has failed, find a suitable replacement module, and insert it.
You can have one or more hot spares per storage system. Any module in the storage system can be configured as a hot spare except for modules A0, B0, C0, D0, E0, and A3, which serve other purposes (A0, B0, C0, and A3 can store the licensed internal code and A0, B0, C0, D0, and E0 can serve as the storage system's cache vault).
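The slot and capacity rules for hot spares can be combined into one check. In this Python sketch the function name is illustrative and capacities are in any single consistent unit.

```python
# Slots that cannot hold a hot spare, per this chapter.
RESERVED_SLOTS = frozenset({"A0", "B0", "C0", "D0", "E0", "A3"})


def can_be_hot_spare(slot, spare_capacity, largest_protected_capacity):
    """True if the slot is allowed and the spare is large enough."""
    return (
        slot not in RESERVED_SLOTS
        and spare_capacity >= largest_protected_capacity
    )


print(can_be_hot_spare("A2", 4, 4))
print(can_be_hot_spare("A3", 4, 4))
print(can_be_hot_spare("B2", 2, 4))
```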
For example, assume that the modules in slots A0-E0 are a RAID–5 group, those in slots A1 and B1 are a RAID–1 mirrored pair, and the module in A2 is a hot spare, as shown in Figure 1-14. If module D0 fails, the SP immediately begins rebuilding the RAID–5 group using the hot spare. When it finishes, the RAID–5 group consists of disk modules A0, B0, C0, A2, and E0.
When you replace the failed module in D0, the SP starts copying the structure on A2 to D0. When it finishes, the RAID–5 group once again consists of modules A0-E0 and the hot spare becomes available for use if any other module fails. A similar sequence would occur if, for example, module A1 in the mirrored pair failed.
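The sequence in this example can be traced with a small Python model. It tracks only group membership; the class and method names are hypothetical, and the rebuild and copy operations that the SP performs are represented by simple list updates.

```python
class DiskGroup:
    """Toy model of a redundant group with one hot spare."""

    def __init__(self, slots, hot_spare):
        self.members = list(slots)
        self.hot_spare = hot_spare
        self.failed_slot = None  # slot currently covered by the hot spare

    def fail(self, slot):
        """SP rebuilds the failed module's contents onto the hot spare."""
        index = self.members.index(slot)
        self.members[index] = self.hot_spare
        self.failed_slot = slot

    def replace_failed(self):
        """SP copies from the hot spare to the replacement, then frees the spare."""
        index = self.members.index(self.hot_spare)
        self.members[index] = self.failed_slot
        self.failed_slot = None


group = DiskGroup(["A0", "B0", "C0", "D0", "E0"], hot_spare="A2")
group.fail("D0")
print(group.members)
group.replace_failed()
print(group.members)
```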
Run the command-line interface, /usr/raid5/raidcli, in an IRIX window on your Challenge server to
bind (group) or unbind physical disks into a RAID-0, RAID-1, RAID-1_0, RAID-3, or RAID-5 unit, individual disk unit, or hot spare
change parameters on a currently bound group (logical unit number, or LUN)
get names of devices controlled by the SP
change or get information about the caching environment
get information about the SP's configuration
get information on all CRUs (customer-replaceable units, that is, components other than disk modules)
display status information on disks
display the SP log
display information about a group of disks
perform housekeeping operations, such as clearing the error log or updating firmware
The relevant parameters of the command-line interface are explained for each task in the rest of this guide. Appendix B is a complete guide to the command-line interface.
The RAID graphical user interface, RAID5GUI, is provided for managing multiple Challenge RAID disk-array storage systems connected to multiple servers on a network. You manage the storage systems using a graphics console on a host, called a client, anywhere on the network. You can manage storage systems on different servers from the same client or the same storage system from different clients. A single host can be both a client and a server, just a client, or just a server.
RAID5GUI's features include
remote configuration of the storage systems
notification of the status of the storage systems
remote management of the storage systems
RAID5GUI simplifies two management tasks, chassis configuration and fault and problem monitoring:
Chassis configuration management
Use RAID5GUI to configure the storage-system chassis on local and remote servers. You can bind disk modules into physical disk units (also called LUNs) and change the user-accessible parameters of LUNs, such as their rebuild time and storage-control processor (SP) owner. You can also set up storage-system caching.
Fault and problem monitoring
To determine the status of the storage-system chassis under its management, the RAID graphical user interface periodically polls the chassis and gathers any alarms or events that are generated. When the chassis is operating normally, the chassis icon contains a drawing of an intact chassis. When a failure occurs in a chassis, the chassis icon changes or an error popup message appears. You can convert the chassis icon to a window to determine which LUN or component failed.
A client system must run RAID5GUI; a server system must run the storage-system agent, hereafter called the agent. The agent and RAID5GUI communicate with each other over a network using remote procedure calls (RPCs). The agent and the firmware in the storage system communicate with each other over a SCSI bus. Figure 1-15 diagrams this architecture.
 University of California, Berkeley, Report No. UCB/CSD/87/391.