Chapter 5. Maintaining Disk Modules

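This chapter describes

  • identifying and verifying a failed disk module

  • setting up the workplace for replacing or installing disk modules

  • replacing a disk module

  • installing an add-on disk module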

Identifying and Verifying a Failed Disk Module

If you have determined that a module has failed by examining the cabinet fault light or by using the raidcli getdisk or raidcli getcrus command, as explained in Chapter 3 in this guide, you can replace the defective module and rebuild your data without powering off the Challenge RAID storage system or interrupting user applications.


Caution: Removing the wrong drive module can introduce an additional fault that shuts down the physical disk containing the failed module. Before removing a disk module, verify that the suspected module has actually failed.

The fault indicator on a disk module does not necessarily mean that the drive itself has failed. Failure of a SCSI bus, for example, lights the fault indicator on each disk module on that bus.

To verify a suspected disk module failure, follow these steps:

  1. Look for the module with its amber fault light on. Figure 5-1 shows the fault indicator light and other lights on a disk module.

    Figure 5-1. Disk Module Status Lights


  2. Determine the failed module's ID; use Figure 5-2.


Caution: Use only Challenge RAID disk modules to replace failed disk modules. Order them from the Silicon Graphics hotline: 1-800-800-4SGI (1-800-800-4744). Challenge RAID disk modules contain proprietary firmware that the storage system requires for correct functioning. Using any other disks, including those from other Silicon Graphics systems, can cause failure of the storage system.

Figure 5-2. Disk Module Locations


  3. If you have not already checked the module status with raidcli getdisk, do so now; see “Getting Information About Disk Modules” in Chapter 3.

  4. If you have not already checked the unsolicited error log for a message about the disk module, as explained in “Displaying the Challenge RAID Unsolicited Event Log” in Chapter 3, do so now.

    A message about the disk module contains its module ID (such as A0 or B3). Check for any other messages that indicate a related failure, such as failure of a SCSI bus or a general shutdown of a chassis, that might mean the disk module itself has not failed.


    Note: If you are using storage system caching, the system uses modules A0, B0, C0, D0, and E0 for its cache vault. If one of these modules fails, the storage system dumps its cache image to the remaining modules in the vault; then it writes all dirty (modified) pages to disk and disables caching. The cache status changes, as indicated in the output of the raidcli getcache command. Caching remains disabled until you insert a replacement module and the storage system rebuilds the module into the physical disk unit. For information on caching, see Chapter 7.
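The module ID checks described in the steps above can be sketched as a small shell helper. This is only an illustration: the function names are hypothetical, the slot grid (rows A through E, columns 0 through 3) is assumed from the slot order given later in this chapter, and the sketch does not call raidcli or read the event log.

```shell
#!/bin/sh
# Hypothetical helpers for classifying a module ID copied from a log message.
# Assumes a single chassis with slots A0 through E3.

# Succeeds if the argument looks like a module ID such as A0 or B3.
is_module_id() {
    case "$1" in
        [ABCDE][0123]) return 0 ;;
        *)             return 1 ;;
    esac
}

# Succeeds if the module is one of the cache vault modules (A0, B0, C0, D0, E0).
is_vault_module() {
    case "$1" in
        [ABCDE]0) return 0 ;;
        *)        return 1 ;;
    esac
}

# Print a one-line description of the module ID.
describe_module() {
    if ! is_module_id "$1"; then
        echo "$1: not a recognized module ID"
    elif is_vault_module "$1"; then
        echo "$1: cache vault module; if caching is in use, it stays disabled until the rebuild"
    else
        echo "$1: not part of the cache vault"
    fi
}
```

For example, `describe_module B3` reports that B3 is not part of the cache vault, while `describe_module A0` flags a vault module.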



Caution: Although you can remove a disk module without damaging the disk data, do this only when the disk module has actually failed. Never remove a disk module unless absolutely necessary, and only when you have its replacement available. Never replace more than one disk module at a time; use only correct disk modules available from Silicon Graphics, Inc.


Setting Up the Workplace for Replacing or Installing Disk Modules

Electrostatic charge can build up on your body and can accidentally discharge through the circuits of disk modules that you handle. To avoid this situation, follow these guidelines for setting up the work area before you replace disk modules or install additional arrays:

  • If the air in the work area is very dry, run a humidifier in the work area to reduce the risk of damage from electrostatic discharge (ESD).

  • Provide enough room to work on the equipment. Clear the work site of any unnecessary materials; remove materials that naturally build up electrostatic charge, such as foam packaging, foam cups, cellophane wrappers, and similar materials.

  • The disk module is extremely sensitive to shock and vibration. Even a slight jar can severely damage it.

  • Do not remove disk modules from their antistatic packaging until the exact moment that you are ready to install them.

  • Before removing a disk module from its antistatic bag, place one hand firmly on an unpainted metal surface of the chassis, and at the same time, pick up the disk module while it is still sealed in its antistatic bag. Once you have done this, do not move around the room or contact other furnishings, personnel, or surfaces until you have installed and secured the disk module in the equipment.

  • After you remove a disk module or filler module, avoid moving away from the work site; otherwise, you may build up an electrostatic charge.

  • If you must move around the room or touch other surfaces before securing the disk module in the storage system, first place the disk module back in the antistatic bag. When you are ready again to install the disk module, repeat these procedures.

Replacing a Disk Module

This section explains

  • ordering replacement disk modules

  • unbinding the disk

  • removing the failed disk module

  • installing the replacement disk module

Ordering Replacement Disk Modules

Order replacement disk modules only from the Silicon Graphics, Inc. hotline:

1-800-800-4SGI (1-800-800-4744)

Use Table 5-1 as a guide to ordering replacement disk modules.

Table 5-1. Ordering Replacement Disk Modules

Unit                        Silicon Graphics Marketing Code
--------------------------  -------------------------------
Replacement 2.1-GB drive    P-S-RAID-1X2
Replacement 4.2-GB drive    P-S-RAID-1X4



Caution: Use only Challenge RAID disk modules as replacements; only they contain the correct device firmware. Other disk modules, even those from other Silicon Graphics equipment, will not work. Do not mix disk modules of different capacities within one array.
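As a rough aid when preparing an order, the Table 5-1 lookup and the rule against mixing capacities can be sketched in shell. The function names are hypothetical, and the capacities are typed in by hand rather than read from the storage system.

```shell
#!/bin/sh
# Print the replacement marketing code from Table 5-1 for a capacity in GB.
replacement_code() {
    case "$1" in
        2.1) echo "P-S-RAID-1X2" ;;
        4.2) echo "P-S-RAID-1X4" ;;
        *)   echo "unknown capacity: $1" >&2; return 1 ;;
    esac
}

# Succeeds only if every capacity passed as an argument is identical,
# that is, the array would not mix disk modules of different capacities.
same_capacity() {
    first=$1
    for cap in "$@"; do
        [ "$cap" = "$first" ] || return 1
    done
    return 0
}
```

For example, `same_capacity 2.1 2.1 4.2` fails, which signals that a 4.2-GB replacement does not belong in an array of 2.1-GB modules.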


Unbinding the Disk

Changing the physical disk configuration changes the bound configuration of a physical disk unit. The configuration changes when you add or remove a disk module, or when you physically move one or more disk modules to different slots in the chassis.


Caution: Unbinding destroys all the data on a physical disk unit. Before unbinding any physical disk unit, make a backup copy of any data on the unit that you want to retain.

You cannot use any unbound disk module until you bind it into a LUN again.

See “Unbinding LUNs in a New Challenge RAID Storage System” in Chapter 4 for detailed instructions. Use raidcli bind or RAID5GUI to configure disks, as explained in Chapter 4.

Removing a Failed Disk Module

You can replace a failed disk module while the storage system is powered on. If necessary, you can also replace a disk module that has not failed, such as a module that has reported many “soft” errors. When replacing a module that has not failed, you must do so while the storage system is powered on so that the SP detects that the module is being replaced.


Caution: To maintain proper cooling in the storage system, never remove a disk module until you are ready to install a replacement. Never remove more than one disk module at a time.

To remove a disk module, follow these steps:

  1. Verify that the suspected module has actually failed.


    Caution: If you remove the wrong disk module, you introduce an additional fault that shuts down the physical disk containing the failed module. In this situation, the operating system software cannot access the physical disk until you initialize it again.


  2. Read “Setting Up the Workplace for Replacing or Installing Disk Modules,” earlier in this chapter.

  3. Locate the disk module that you want to remove; see Figure 5-2 if necessary.

  4. Position the new disk module in its antistatic packaging within reach of the storage system.

  5. If you are using an ESD wrist strap, attach its clip to the ESD bracket at the bottom of the storage system. Figure 5-3 shows where to attach the clip on a deskside storage system.

    Figure 5-3. Attaching the ESD Clip to the ESD Bracket on a Deskside Storage System


    Figure 5-4 shows where to attach the clip on a rack storage system.

    Figure 5-4. Attaching the ESD Clip to the ESD Bracket on a Rack Storage System


  6. Put the wrist band around your wrist with the metal button against your skin.

  7. Make sure the disk has stopped spinning and the heads have unloaded.

  8. Grasp the disk module by its handle and pull it part of the way out of the cabinet, as shown in Figure 5-5.

    Figure 5-5. Pulling Out a Disk Module



    Caution: Never remove more than one disk module at a time.



    Warning: When removing a disk module from an upper chassis assembly in a Challenge RAID rack system, make sure that you adequately balance the weight of the disk module.


  9. Supporting the disk module with your free hand, pull it all the way out of the cabinet, as shown in Figure 5-6.

    Figure 5-6. Removing a Disk Module



    Caution: When removed from the chassis, the disk modules are extremely sensitive to shock and vibration. Even a slight jar can severely damage them.


  10. If the label on the side of the disk module does not show the ID number for the compartment from which you removed the drive, write it on the label; for example, A1.

    For the compartment ID numbers, refer to Figure 5-2 or the slot matrix attached to the storage system when it was installed.

  11. Put the failed disk module in an antistatic bag and store it in a place where it will not be damaged.


    Caution: Before installing a replacement module, wait at least 15 seconds after removing the failed module to allow the SP time to recognize that the module has been removed. If you insert the replacement module too soon, the SP may report the replacement module as defective.


Installing a Replacement Disk Module

To install the replacement disk module, follow these steps:

  1. Touch the new disk module's antistatic packaging to discharge it and the drive module. Remove the new disk module from its packaging.


    Caution: The disk module is extremely sensitive to shock and vibration. Even a slight jar can severely damage it.


  2. On the label on the side of the disk module, write the ID number for the compartment into which the drive is going; for example, A3.

  3. Engage the disk module's rail in the chassis rail slot, as shown in Figure 5-7.

    Figure 5-7. Engaging the Disk Module Rail


  4. Engage the disk module's guide in the chassis guide slot, as shown in Figure 5-8.

    Figure 5-8. Engaging the Disk Module Guide


  5. Insert the disk module, as shown in Figure 5-9. Make sure it is completely seated in the slot.

    Figure 5-9. Inserting the Replacement Disk Module


  6. Remove and store the ESD wrist band, if you are using one.

The SP formats and checks the new module, and then begins to reconstruct the data. While rebuilding occurs, you have uninterrupted access to information on the physical disk unit.


Note: If you have replaced an unbound (database) disk module—A0, B0, C0, or A3—unmount any filesystems or partitions on the storage system and reboot the system so that the remaining unbound modules update the firmware on the replacement unbound module.

The default rebuild period is four hours.


Note: During the rebuild period, performance might degrade slightly, depending on the rebuild time specified and on I/O bus activity.

For more information on changing the default rebuild period, see “Binding Disks Into RAID Units” in Chapter 4.
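The note about database modules can be expressed as a simple check on the slot ID. This is a minimal sketch with hypothetical function names; it only encodes the slot list from the note (A0, B0, C0, and A3) and does not unmount anything or reboot the system.

```shell
#!/bin/sh
# Succeeds if the slot holds one of the database modules named in the note
# (A0, B0, C0, or A3).
is_database_module() {
    case "$1" in
        A0|B0|C0|A3) return 0 ;;
        *)           return 1 ;;
    esac
}

# Print a reminder of the follow-up action for a replaced module.
post_replace_reminder() {
    if is_database_module "$1"; then
        echo "$1: unmount filesystems on the storage system, then reboot"
    else
        echo "$1: no reboot required by the database-module note"
    fi
}
```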

Installing an Add-On Disk Module

As your organization's needs change, you may need to add to or change the configurations of your storage system's physical disk units. For example, you might want to add disk modules to unused slots or change the ownership of a physical disk unit from one SP to another.

If the storage system has unused disk module slots containing only filler modules, you can increase the available storage capacity by installing additional disk modules. Normal processing can continue while you install disk modules.

This section explains

  • ordering add-on disk modules

  • inserting the new disk module

  • creating device nodes and binding the disks

Ordering Add-On Disk Module Arrays

Call the Silicon Graphics, Inc., hotline to order add-on disk modules:

1-800-800-4SGI (1-800-800-4744)

Use Table 5-2 as a guide to ordering add-on disk module arrays.

Table 5-2. Ordering Add-On Disk Module Sets

Unit                                  Marketing Code
------------------------------------  --------------
Add-on five 2.1-GB drives             P-S-RAID-5X2
Add-on five 4.2-GB drives             P-S-RAID-5X4
Base array with five 2.1-GB drives    P-S-RAID-B5X2
Base array with five 4.2-GB drives    P-S-RAID-B5X4
Replacement 2.1-GB drive              P-S-RAID-1X2
Replacement 4.2-GB drive              P-S-RAID-1X4



Caution: Use only Challenge RAID disk modules as replacements; only they contain the correct device firmware. Other disk modules, even those from other Silicon Graphics equipment, will not work. Do not mix disk modules of different capacities within one array. Do not remove disk modules from bus 0 (slots A0, B0, C0, D0, and E0) for use in other disk module positions.


Installing New Disk Modules

To install new disk modules, follow these steps:


Caution: Leave the filler modules installed until the add-on modules are available. Never remove more than one disk module or disk filler module at a time.


  1. Read “Setting Up the Workplace for Replacing or Installing Disk Modules,” earlier in this chapter.

  2. Position the new disk modules in their antistatic packaging within reach of the storage system.

  3. If you are using a wrist band, attach its clip to the ESD bracket on the bottom of the storage system, as shown in Figure 5-3. Put the wrist band around your wrist with the metal button against your skin.

  4. Locate the slots where you will install the disk modules; see Figure 5-10.

    Figure 5-10. Disk Drive Locations



    Warning: For a Challenge RAID rack, although you need not complete each chassis assembly that is partially filled before installing more disk modules in the next chassis assembly, avoid making the rack top-heavy.

    Fill the slots in this order:

    • first, in this order: A0, B0, C0, D0, E0

    • next, in this order: A1, B1, C1, D1, E1

    • next, in this order: A2, B2, C2, D2, E2

    • finally, in this order: A3, B3, C3, D3, E3

  5. Grasp the filler module for the first slot and pull it out of the cabinet; set it aside for possible future use. If you cannot grasp the module, use a medium-size flat-blade screwdriver to pry it out gently.

  6. Touch the new disk module's antistatic packaging to discharge it and the drive module. Remove the new disk module from its packaging.

  7. On the label on the side of the disk module, write the ID number for the compartment into which the drive is going. You can either write the slot position on the label in the corresponding place on the matrix or make a check mark in the position to indicate the slot that the disk module occupies. Figure 5-11 shows these two ways of labeling disk module B0.

    Figure 5-11. Marking the Label for Disk Module B0


  8. Engage the disk module's rail in the chassis rail slot, as shown in Figure 5-12.


    Caution: Disk modules are extremely sensitive to shock and vibration. Even a slight jar can severely damage them.

    Figure 5-12. Engaging the Disk Module Rail


  9. Engage the disk module's guide in the chassis guide slot, as shown in Figure 5-13.

    Figure 5-13. Engaging the Disk Module Guide


  10. Insert the disk module, as shown in Figure 5-14. Make sure it is completely seated in the slot.

    Figure 5-14. Inserting a Disk Module


  11. Repeat steps 4 through 10 until all add-on modules are installed.

  12. When you are finished installing add-on modules, remove and store the ESD wrist band, if you are using one.
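The slot fill order given in step 4 can be sketched as a short shell helper for planning an installation. The function names `fill_order` and `next_free_slot` are hypothetical, and the sketch assumes a single chassis with slots A0 through E3; the occupied slots are supplied by hand, not queried from the storage system.

```shell
#!/bin/sh
# Print all slots in fill order: A0..E0, then A1..E1, A2..E2, A3..E3.
fill_order() {
    for col in 0 1 2 3; do
        for row in A B C D E; do
            echo "${row}${col}"
        done
    done
}

# Print the first slot in fill order that is not among the occupied
# slots passed as arguments; fail if every slot is occupied.
next_free_slot() {
    for slot in $(fill_order); do
        case " $* " in
            *" $slot "*) ;;                  # slot already occupied
            *) echo "$slot"; return 0 ;;
        esac
    done
    return 1
}
```

For example, with the base array in place, `next_free_slot A0 B0 C0 D0 E0` prints A1, the first slot of the next group.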

Creating Device Nodes and Binding the Disks

If you are adding disk modules to a storage system that already has at least one LUN configured, the SPs must be made aware of the new disks. This section explains how to accomplish this task for IRIX 5.3 systems without rebooting. Also, in a system with two storage-control SPs that are used for primary and secondary paths, both SPs must be made aware of the new disks. Finally, the new disks must be bound into LUNs.

Follow these steps:

  1. Change to the /dev directory:

    cd /dev 
    

  2. Type

    ./MAKE_VLUNS <controller-number> <target-number> 
    

    This command creates the device nodes for the new disks.

  3. Bind the newly installed modules into one or more physical disk units, as described in “Binding Disks Into RAID Units” in Chapter 4 in this guide.
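When both SPs must be made aware of the new disks, steps 1 and 2 are repeated for each path. The following is a hedged sketch of that repetition: it assumes each SP is reached through its own controller and target number, the function name `make_vluns_for` is hypothetical, and the numbers in the example call are placeholders rather than values from this guide. By default the sketch only prints the commands it would run.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) prints the commands; DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}

# Arguments come in pairs: controller-number target-number ...
make_vluns_for() {
    while [ "$#" -ge 2 ]; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "cd /dev && ./MAKE_VLUNS $1 $2"
        else
            # Run in a subshell so the caller's working directory is unchanged.
            (cd /dev && ./MAKE_VLUNS "$1" "$2") || return 1
        fi
        shift 2
    done
}

# Placeholder example: primary path on controller 4, secondary path on
# controller 5, both at target 1. Substitute the values for your system.
make_vluns_for 4 1 5 1
```

Printing the commands first makes it easy to confirm the controller and target numbers before creating any device nodes.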