This chapter describes
identifying and verifying a failed disk module
setting up the workplace
replacing a disk module
installing an add-on disk module array
If you have determined that a module has failed by examining the cabinet fault light or by using the raidcli getdisk or raidcli getcrus command, as explained in Chapter 3 in this guide, you can replace the defective module and rebuild your data without powering off the Challenge RAID storage system or interrupting user applications.
Caution: Removing the wrong drive module can introduce an additional fault that shuts down the physical disk containing the failed module. Before removing a disk module, verify that the suspected module has actually failed.
The fault indicator on a disk module does not necessarily mean that the drive itself has failed. Failure of a SCSI bus, for example, lights the fault indicator on each disk module on that bus.
To verify a suspected disk module failure, follow these steps:
Look for the module with its amber fault light on. Figure 5-1 shows the fault indicator light and other lights on a disk module.
Determine the failed module's ID; use Figure 5-2.
If you have not already checked the module status with raidcli getdisk, do so now; see “Getting Information About Disk Modules” in Chapter 3.
If you have not already checked the unsolicited event log for a message about the disk module, as explained in “Displaying the Challenge RAID Unsolicited Event Log” in Chapter 3, do so now.
A message about the disk module contains its module ID (such as A0 or B3). Check for any other messages that indicate a related failure, such as failure of a SCSI bus or a general shutdown of a chassis, that might mean the disk module itself has not failed.
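The log check in the last step can be sketched as a small shell filter. This is only an illustration under stated assumptions: the unsolicited event log has been saved to a plain-text file, module IDs run from A0 through E3, and related failures mention the words "bus" or "shutdown". None of these details is confirmed by this guide, and `check_module_log` is a hypothetical helper name, not a raidcli command.

```shell
#!/bin/sh
# Hypothetical helper, not part of raidcli.
# Reads saved log text on standard input and prints every line that
# mentions the given module ID or a possibly related failure.
check_module_log() {
    module="$1"
    # Assumed ID format: bus letter A-E followed by position 0-3.
    case "$module" in
        [A-E][0-3]) ;;
        *) echo "invalid module ID: $module" >&2; return 2 ;;
    esac
    # -w matches whole words only; -i ignores case.
    # The keywords "bus" and "shutdown" are assumptions about log wording.
    grep -i -w -e "$module" -e "bus" -e "shutdown"
}
```

For example, `check_module_log A0 < saved-log.txt` (where `saved-log.txt` is a placeholder file name) lists the lines to review. If lines about a bus or chassis shutdown appear next to the module message, the module itself might not have failed.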
Note: If you are using storage system caching, the system uses modules A0, B0, C0, D0, and E0 for its cache vault. If one of these modules fails, the storage system dumps its cache image to the remaining modules in the vault; then it writes all dirty (modified) pages to disk and disables caching. The cache status changes, as indicated in the output of the raidcli getcache command. Caching remains disabled until you insert a replacement module and the storage system rebuilds the module into the physical disk unit. For information on caching, see Chapter 7.
Caution: Although you can remove a disk module without damaging the disk data, do this only when the disk module has actually failed. Never remove a disk module unless absolutely necessary, and only when you have its replacement available. Never replace more than one disk module at a time; use only correct disk modules available from Silicon Graphics, Inc.
Electrostatic charge can build up on your body and accidentally discharge through the circuits of disk modules that you handle. To avoid this situation, follow these guidelines for setting up the work area before you replace disk modules or install additional arrays:
If the air in the work area is very dry, run a humidifier there to decrease the risk of damage from electrostatic discharge (ESD).
Provide enough room to work on the equipment. Clear the work site of any unnecessary materials; remove materials that naturally build up electrostatic charge, such as foam packaging, foam cups, cellophane wrappers, and similar materials.
The disk module is extremely sensitive to shock and vibration. Even a slight jar can severely damage it.
Do not remove disk modules from their antistatic packaging until the exact moment that you are ready to install them.
Before removing a disk module from its antistatic bag, place one hand firmly on an unpainted metal surface of the chassis, and at the same time, pick up the disk module while it is still sealed in its antistatic bag. Once you have done this, do not move around the room or contact other furnishings, personnel, or surfaces until you have installed and secured the disk module in the equipment.
After you remove a disk module or filler module, avoid moving away from the work site; otherwise, you may build up an electrostatic charge.
If you must move around the room or touch other surfaces before securing the disk module in the storage system, first place the disk module back in the antistatic bag. When you are ready again to install the disk module, repeat these procedures.
This section explains
ordering replacement disk modules
unbinding the disk
removing the failed disk module
installing the replacement disk module
Order replacement disk modules only from the Silicon Graphics, Inc. hotline:
1-800-800-4SGI (1-800-800-4744)
Use Table 5-1 as a guide to ordering replacement disk modules.
Table 5-1. Ordering Replacement Disk Modules
Unit | Silicon Graphics Marketing Code |
---|---|
Replacement 2.1-GB drive | P-S-RAID-1X2 |
Replacement 4.2-GB drive | P-S-RAID-1X4 |
Caution: Use only Challenge RAID disk modules as replacements; only they contain the correct device firmware. Other disk modules, even those from other Silicon Graphics equipment, will not work. Do not mix disk modules of different capacities within one array.
Changing the physical disk configuration changes the bound configuration of a physical disk unit. The configuration changes whenever you add or remove a disk module, or physically move one or more disk modules to different slots in the chassis.
Caution: Unbinding destroys all the data on a physical disk unit. Before unbinding any physical disk unit, make a backup copy of any data on the unit that you want to retain.
You cannot use any unbound disk module until you bind it into a LUN again.
See “Unbinding LUNs in a New Challenge RAID Storage System” in Chapter 4 for detailed instructions. Use raidcli bind or RAID5GUI to configure disks, as explained in Chapter 4.
You can replace a failed disk module while the storage system is powered on. If necessary, you can also replace a disk module that has not failed, such as a module that has reported many “soft” errors. When replacing a module that has not failed, you must do so while the storage system is powered on so that the SP knows the module is being replaced.
Caution: To maintain proper cooling in the storage system, never remove a disk module until you are ready to install a replacement. Never remove more than one disk module at a time.
To remove a disk module, follow these steps:
Verify that the suspected module has actually failed.
Caution: If you remove the wrong disk module, you introduce an additional fault that shuts down the physical disk containing the failed module. In this situation, the operating system software cannot access the physical disk until you initialize it again.
Read “Setting Up the Workplace for Replacing or Installing Disk Modules,” earlier in this chapter.
Locate the disk module that you want to remove; see Figure 5-2 if necessary.
Position the new disk module in its antistatic packaging within reach of the storage system.
If you are using an ESD wrist band, attach its clip to the ESD bracket at the bottom of the storage system. Figure 5-3 shows where to attach the clip on a deskside storage system.
Figure 5-4 shows where to attach the clip on a rack storage system.
Put the wrist band around your wrist with the metal button against your skin.
Make sure the disk has stopped spinning and the heads have unloaded.
Grasp the disk module by its handle and pull it part of the way out of the cabinet, as shown in Figure 5-5.
Caution: Never remove more than one disk module at a time.
Warning: When removing a disk module from an upper chassis assembly in a Challenge RAID rack system, make sure that you adequately balance the weight of the disk module.
Supporting the disk module with your free hand, pull it all the way out of the cabinet, as shown in Figure 5-6.
Caution: When removed from the chassis, the disk modules are extremely sensitive to shock and vibration. Even a slight jar can severely damage them.
If the label on the side of the disk module does not show the ID number for the compartment from which you removed the drive, write it on the label; for example, A1.
For the compartment ID numbers, refer to Figure 5-2 or the slot matrix attached to the storage system when it was installed.
Put the failed disk module in an antistatic bag and store it in a place where it will not be damaged.
Caution: Before installing a replacement module, wait at least 15 seconds after removing the failed module to allow the SP time to recognize that the module has been removed. If you insert the replacement module too soon, the SP may report the replacement module as defective.
To install the replacement disk module, follow these steps:
Touch the new disk module's antistatic packaging to discharge it and the drive module. Remove the new disk module from its packaging.
Caution: The disk module is extremely sensitive to shock and vibration. Even a slight jar can severely damage it.
On the label on the side of the disk module, write the ID number for the compartment into which the drive is going; for example, A3.
Engage the disk module's rail in the chassis rail slot, as shown in Figure 5-7.
Engage the disk module's guide in the chassis guide slot, as shown in Figure 5-8.
Insert the disk module, as shown in Figure 5-9. Make sure it is completely seated in the slot.
Remove and store the ESD wrist band, if you are using one.
The SP formats and checks the new module, and then begins to reconstruct the data. While rebuilding occurs, you have uninterrupted access to information on the physical disk unit.
Note: If you have replaced an unbound (database) disk module, that is, A0, B0, C0, or A3, unmount any filesystems or partitions on the storage system and reboot the system so that the remaining unbound modules update the firmware on the replacement unbound module.
The default rebuild period is four hours.
Note: During the rebuild period, performance might degrade slightly, depending on the rebuild time specified and on I/O bus activity.
For more information on changing the default rebuild period, see “Binding Disks Into RAID Units” in Chapter 4.
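Because some slots carry extra roles, it can help to check a slot ID against the two lists given in this chapter before pulling a module. The following sketch simply hard-codes those lists: the cache vault modules (A0, B0, C0, D0, E0) and the database modules (A0, B0, C0, A3). The function name `module_roles` and the role labels it prints are hypothetical. They are not part of the storage system software.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name). The slot lists are copied
# from the notes in this chapter; the role labels are made up here.
module_roles() {
    roles=""
    # Cache vault modules, used when storage system caching is enabled.
    case "$1" in
        A0|B0|C0|D0|E0) roles="cache-vault" ;;
    esac
    # Database modules; replacing one calls for an unmount and reboot.
    case "$1" in
        A0|B0|C0|A3) roles="${roles:+$roles }database" ;;
    esac
    echo "${roles:-none}"
}
```

A result that includes `cache-vault` is a reminder that caching stays disabled until the rebuild completes. A result that includes `database` is a reminder to plan for unmounting filesystems and rebooting after the replacement.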
As your organization's needs change, you may need to add to or change the configurations of your storage system's physical disk units. For example, you might want to add disk modules to unused slots or change the ownership of a physical disk unit from one SP to another.
If the storage system has unused disk module slots containing only filler modules, you can increase the available storage capacity by installing additional disk modules. Normal processing can continue while you install disk modules.
This section explains
ordering add-on disk modules
inserting the new disk module
creating device nodes and binding the disks
Call the Silicon Graphics, Inc., hotline to order add-on disk modules:
1-800-800-4SGI (1-800-800-4744)
Use Table 5-2 as a guide to ordering add-on disk module arrays.
Table 5-2. Ordering Add-On Disk Module Sets
Unit | Marketing Code |
---|---|
Add-on five 2.1-GB drives | P-S-RAID-5X2 |
Add-on five 4.2-GB drives | P-S-RAID-5X4 |
Base array with five 2.1-GB drives | P-S-RAID-B5X2 |
Base array with five 4.2-GB drives | P-S-RAID-B5X4 |
Replacement 2.1-GB drive | P-S-RAID-1X2 |
Replacement 4.2-GB drive | P-S-RAID-1X4 |
To install new disk modules, follow these steps:
Caution: Leave the filler modules installed until the add-on modules are available. Never remove more than one disk module or disk filler module at a time.
1. Read “Setting Up the Workplace for Replacing or Installing Disk Modules,” earlier in this chapter.
2. Position the new disk modules in their antistatic packaging within reach of the storage system.
3. If you are using a wrist band, attach its clip to the ESD bracket on the bottom of the storage system, as shown in Figure 5-3. Put the wrist band around your wrist with the metal button against your skin.
4. Locate the slots where you will install the disk modules; see Figure 5-10.
Warning: For a Challenge RAID rack, although you need not complete each chassis assembly that is partially filled before installing more disk modules in the next chassis assembly, avoid making the rack top-heavy.
Fill the slots in this order:
5. Grasp the filler module for the first slot and pull it out of the cabinet; set it aside for possible future use. If you cannot grasp the module, use a medium-size flat-blade screwdriver to pry it out gently.
6. Touch the new disk module's antistatic packaging to discharge it and the drive module. Remove the new disk module from its packaging.
7. On the label on the side of the disk module, write the ID number for the compartment into which the drive is going. You can either write the slot position on the label in the corresponding place on the matrix or make a check mark in the position to indicate the slot that the disk module occupies. Figure 5-11 shows these two ways of labeling disk module B0.
8. Engage the disk module's rail in the chassis rail slot, as shown in Figure 5-12.
Caution: Disk modules are extremely sensitive to shock and vibration. Even a slight jar can severely damage them.
9. Engage the disk module's guide in the chassis guide slot, as shown in Figure 5-13.
10. Insert the disk module, as shown in Figure 5-14. Make sure it is completely seated in the slot.
11. Repeat steps 4 through 10 until all add-on modules are installed.
12. When you are finished installing add-on modules, remove and store the ESD wrist band, if you are using one.
If you are adding disk modules to a storage system that already has at least one LUN configured, the SPs must be made aware of the new disks. This section explains how to accomplish this task for IRIX 5.3 systems without rebooting. Also, in a system with two storage-control SPs that are used for primary and secondary paths, both SPs must be made aware of the new disks. Finally, the new disks must be bound into LUNs.
Follow these steps:
Change to the /dev directory:
cd /dev
Type:
./MAKE_VLUNS <controller-number> <target-number>
This command creates the device nodes for the new disks.
Bind the newly installed modules into one or more physical disk units, as described in “Binding Disks Into RAID Units” in Chapter 4 in this guide.
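When two SPs provide primary and secondary paths, the device-node step presumably has to be repeated once per controller. The sketch below only prints the commands for review and does not run them. The controller and target numbers used in the example (4, 5, and 3) are placeholders, `print_make_vluns` is a hypothetical helper rather than a Silicon Graphics tool, and whether both paths use the same target number is an assumption to check against your own configuration.

```shell
#!/bin/sh
# Hypothetical helper: prints, but does not execute, one MAKE_VLUNS
# command per controller:target pair given on the command line.
print_make_vluns() {
    for pair in "$@"; do
        ctlr=${pair%%:*}      # text before the first colon
        target=${pair#*:}     # text after the first colon
        echo "cd /dev && ./MAKE_VLUNS $ctlr $target"
    done
}
```

For example, `print_make_vluns 4:3 5:3` prints two command lines, one for each placeholder controller. After confirming the real controller and target numbers for each SP, type the corresponding commands as described in the steps above.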