Appendix A. Error Recovery

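This appendix contains these sections:

  • “Restarting ssmagent”

  • “Re-Establishing Communication With an Array”

  • “Recovering From an Incorrect Bind”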

Restarting ssmagent

You might have to restart the agent ssmagent after you take any of these actions:

  • changing an array so that communication between the front end and back end is affected; for example, attaching an array after the agent has been started

  • restarting licensed internal code (LIC)

  • partitioning RAID 3 memory

  • modifying the /etc/config/ssmagent.config file

  • rebooting, unless you have configured the system to start the agent automatically

To determine whether restarting the agent is necessary, enter

scsicontrol -i sc<controllerid>d<driveid>l<lunid>

For example:

scsicontrol -i sc6d6l0 

You do not need to restart the agent if you see output such as

sc2d0l0:  Disk          SGI     RAID 5          0130
ANSI vers 2, ISO ver: 0, ECMA ver: 0; supports:  synch linkedcmds cmdqueing

You need to restart the agent if you see a message such as “No such device” or “Cannot open”.
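
The check above can be scripted. The following is a minimal sketch, assuming that the error text appears in the combined standard output and standard error of scsicontrol. The name needs_restart is a hypothetical helper, not part of any shipped tool.

```shell
#!/bin/sh
# needs_restart (hypothetical helper): succeeds (exit status 0) when the
# captured scsicontrol output contains one of the two messages named above.
needs_restart() {
    # $1: text captured from "scsicontrol -i <device>"
    case "$1" in
        *"No such device"*|*"Cannot open"*)
            return 0 ;;   # agent restart is needed
        *)
            return 1 ;;   # device answered; no restart needed
    esac
}
```

A possible use, with the device name from the example above:

```shell
out=$(scsicontrol -i sc6d6l0 2>&1)
if needs_restart "$out"; then
    echo "restart ssmagent"
fi
```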

To restart ssmagent, enter

/etc/init.d/ssm stop 
/etc/init.d/ssm start 

Depending on the number of RAID arrays you administer, it takes the agent from 30 seconds to several minutes to finish launching and be ready to administer them.
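
The stop and start sequence can be wrapped in a small function. This is a sketch only: SSM_INIT is a hypothetical variable introduced here so that the function can be tried against a stand-in script, and the exit status behavior of /etc/init.d/ssm is assumed rather than documented in this appendix.

```shell
#!/bin/sh
# SSM_INIT (hypothetical override): path of the agent control script.
SSM_INIT="${SSM_INIT:-/etc/init.d/ssm}"

# restart_ssmagent (hypothetical helper): stop the agent, then start it.
# A failed stop is reported but does not prevent the start, because the
# agent might simply not have been running.
restart_ssmagent() {
    "$SSM_INIT" stop || echo "warning: stop returned non-zero" >&2
    "$SSM_INIT" start
}
```

The function returns the status of the start command. It does not wait for the agent to finish launching, so allow for the delay described above before administering arrays.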

Re-Establishing Communication With an Array

Communication with the array is lost if you have rebooted a server for which there are no LUNs bound, or if all three database disk modules are inadvertently removed at the same time. To re-establish communication with the array, follow these steps:

  1. Obtain a serial cable; see Appendix B, “Server Serial Connector Pinouts,” for the pinout for the server you are using.

  2. Connect the serial cable between a tty port on the server, such as tty2 (see the owner's guide for the server if necessary), and the serial port on the array.

  3. In /etc/config/ssmagent.config, comment out the line that specifies the device name and specify the tty port instead. For example:

    # # device specifications 
    # *close* at the end is needed for low level (devscsi) support.
    
    #device sc2d2l0  Unit-spA "Unit 8"
    ttydevice tty2
    

  4. Stop the agent:

    /etc/init.d/ssm stop 
    

  5. Restart the agent:

    /etc/init.d/ssm start 
    

  6. After you establish communication with the array, set up at least one LUN for the default SP.

  7. In /etc/config/ssmagent.config, remove the comment symbol from the line that specifies the device name and remove the line that specifies the tty port.
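
The configuration changes in steps 3 and 7 can also be made with a script. The following is a minimal sketch, assuming that each device line begins with the keyword device, as in the example in step 3. The function names to_tty and to_device are hypothetical. Each function saves a copy of the previous file as a .bak file before rewriting it.

```shell
#!/bin/sh
# to_tty CONFIG TTY (hypothetical helper, step 3):
# comment out every active "device" line and add one "ttydevice" line
# directly after the first of them.
to_tty() {
    cfg=$1
    tty=$2
    cp "$cfg" "$cfg.bak" || return 1
    awk -v tty="$tty" '
        $1 == "device" {
            print "#" $0
            if (!added) { print "ttydevice " tty; added = 1 }
            next
        }
        { print }
    ' "$cfg.bak" > "$cfg"
}

# to_device CONFIG (hypothetical helper, step 7):
# delete "ttydevice" lines and remove the comment symbol from "#device" lines.
# Note: this restores every "#device" line, including any that were
# commented out for other reasons.
to_device() {
    cfg=$1
    cp "$cfg" "$cfg.bak" || return 1
    awk '
        $1 == "ttydevice" { next }
        $1 == "#device"   { sub(/#/, ""); print; next }
        { print }
    ' "$cfg.bak" > "$cfg"
}
```

For example, `to_tty /etc/config/ssmagent.config tty2` would be run before step 4, and `to_device /etc/config/ssmagent.config` after step 6. Review the resulting file before restarting the agent.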

If the driver loses communication with the agent (for example, because the cable connection between the host and the storage system is broken, or because a host bus adapter is turned off), the driver on the host times out and logs off. To reinitialize the loop, enter

scsiha -lp controllernumber 

For example:

scsiha -lp 2

Recovering From an Incorrect Bind

To stop a bind that is in progress, use the RAID CLI rebootsp subcommand; see “rebootSP” in Chapter 6. Alternatively, you can use the GUI to remove all drives in the bind.

If a DPE has two LCCs (A and B), each DAE cabled to it must also have two LCCs. If a DAE has no LCC B, binding LUNs on SP B is restricted in some circumstances. If SP A uses disk modules 00 through 09 (that is, all the disk modules in the DPE) in LUNs, you cannot bind any LUNs on SP B, because SP B has no connection to the disk modules in the DAE: there is no path through LCC B (which is absent) and no path through the disk modules in the DPE (all are bound on SP A).