Chapter 1. Fibre Channel RAID Graphical User Interface Features

The Silicon Graphics Fibre Channel administrative software provides a graphical user interface, ssmgui, that lets you configure and manage Silicon Graphics Fibre Channel RAID disk-array storage systems from a physical component viewpoint. Using ssmgui, you can group a Fibre Channel RAID storage system's physical disks into logical units (LUNs[1]), monitor the status of the physical disks and other components that make up a storage system, and perform other administrative tasks.


Note: You can also communicate with RAID storage with a command-line interface, explained in Chapter 6, “The Fibre Channel RAID Command-Line Interface.” For communication with non-RAID (JBOD) storage, use a different command-line interface, which is explained in Chapter 7, “The Fibre Channel Non-RAID Command-Line Interface.” No GUI is provided for communicating with non-RAID storage.

This chapter contains these sections:

ssmgui Architecture

The ssmgui graphical user interface program and the ssmcli command-line interface communicate with a dedicated agent, ssmagent, on the same server or on other servers on the network. The management station and the remote agents communicate with each other over a TCP/IP network. The ssmagent communicates with the licensed internal code running in an array's storage processors (SPs). Figure 1-1 diagrams Fibre Channel administrative software architecture with a management station that is also an array server.

Figure 1-1. ssmgui Architecture: Management Station That Is Also an Array Server



Note: In the chapters describing the RAID graphical user interface, ssmgui and RAID GUI are synonymous, although ssmgui generally denotes the program as a whole and RAID GUI denotes the graphical user interface itself.

The licensed internal code includes background verify and sniff processes and reporting. This array-resident software process runs continuously in the background, scans all data, and repairs soft errors and data inconsistencies before these anomalies can become compounded and unrecoverable. This software is transparent to user applications and affects performance by less than 1%. All corrective actions are logged in the SP event log (described in “Displaying an SP Event Log” in Chapter 4).

Starting a RAID GUI Session

This section contains the following topics:

Checking ssmagent Installation

To confirm that the Fibre Channel RAID ssmagent is installed on the host system, enter the following at the command line:

chkconfig -f ssm on

If the software is not available on the host, install it using the information in the release notes and the CD included with the Fibre Channel RAID option.
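If you want to view the current setting of the ssm flag rather than set it, you can list the chkconfig flags and confirm that the ssm executables are present. This is a sketch using standard IRIX commands; /usr/ssm/bin is the installation directory referenced later in this chapter:

chkconfig | grep ssm

ls /usr/ssm/bin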

ssmagent Users and Sessions

Any user can run a RAID GUI session from any management station on which it is installed, and can use it to monitor the storage system. Only authorized users can use this program to configure or reconfigure a storage system. A user is authorized if the file /etc/config/ssmagent.config on the server contains an entry for the user. For information, consult the man page (reference page) ssmagent(7M).
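For example, to check whether your login name already appears in the agent configuration file, you might enter the following (a sketch; the exact entry format is described in the ssmagent(7M) reference page):

grep `logname` /etc/config/ssmagent.config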

ssmagent allows more than one RAID GUI session to access the same storage system at the same time. Make sure that two authorized users do not use the RAID GUI to configure or reconfigure the same array enclosure at the same time. To determine whether another user is using ssmgui, enter

ps -ef | grep ssmgui 

The output does not indicate whether that user is managing or configuring a specific array.

Configuring the Path to ssm Executables in .cshrc

The following steps configure the host system's .cshrc file so that you can run ssmgui from any directory on the host. If this path is not set, you must change to the /usr/ssm/lib directory and enter the complete path /usr/ssm/bin/ssmgui each time you want to start ssmgui.

Use the following information to set the proper path on the host system:

  1. As superuser, use your favorite line editor to open the .cshrc file.

  2. Enter the following in the .cshrc file:

    set path = ($path /usr/ssm/bin /usr/ssm/lib) 
    

  3. Write and quit the .cshrc file.

  4. Enter the following command to reinitialize the .cshrc file:

    source .cshrc 
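
To confirm that the new path is in effect, you can ask the shell where it finds the executable; for example:

which ssmgui

If the path is set correctly, the output should be /usr/ssm/bin/ssmgui.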
    

Starting the Fibre Channel RAID Agent ssmagent

Follow these steps:

  1. Check for the presence of the agent, as explained in “Checking ssmagent Installation”.

  2. Make sure your username is entered in /etc/config/ssmagent.config, as explained in “ssmagent Users and Sessions”.

  3. If desired, configure the host system's .cshrc file for convenience in using ssmgui, as explained in “Configuring the Path to ssm Executables in .cshrc”.

  4. Make sure that the host system on which the agent is to run has the correct date and time; enter

    date 
    

    If the date and time are not correct, reset them; see this command's man page, date(1M).

  5. Start ssmagent by entering

    /etc/init.d/ssm start 
    

Depending on the number of RAID arrays you administer, it takes the agent from 30 seconds to several minutes to finish starting up and become ready to administer the arrays. You can check whether the agent has started up by looking in /var/adm/SYSLOG for this string:

Agent has started up

This message indicates that the agent has made connection to all necessary drivers.
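For example, to search the system log for this message without reading the whole file, you can enter:

grep "Agent has started up" /var/adm/SYSLOG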


Note: To stop the Fibre Channel RAID agent, enter


/etc/init.d/ssm stop 

Starting the RAID GUI

The agent ssmagent must be operating before you start the RAID GUI; see “Starting the Fibre Channel RAID Agent ssmagent”.

To start the RAID GUI, enter

ssmgui & 

If you have not configured a path to ssm executables as explained in “Configuring the Path to ssm Executables in .cshrc”, use

/usr/ssm/bin/ssmgui & 

After displaying a preliminary window, the RAID GUI looks first in your home directory for an ssmhosts file that contains the hostnames of servers with arrays to manage. If such a file exists, ssmgui tries to extract hostnames from it. If it succeeds, it opens the Storage System Manager window, as shown in Figure 1-2.

If ssmgui cannot extract the hostnames, it opens the Host Administration dialog, in which you can enter the hostnames of servers whose arrays you want to manage using the RAID GUI. This process is explained in “Using the Host Administration Window to Add Servers” in Chapter 2.
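As a minimal sketch, assuming ssmhosts simply lists one server hostname per line, a file for two hypothetical servers named server1 and server2 would look like this:

server1
server2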

Storage System Manager Window

The main RAID GUI window is the Storage System Manager window. Figure 1-2 points out the main features of this window.

Figure 1-2. Storage System Manager Window


From this window, you can specify arrays to manage, bind standard LUNs, display other information windows on arrays, and display the SP event log. These functions are explained in subsequent chapters of this guide.

In the array selection area, the software displays an icon for each array that it finds connected to each server whose hostname appears in the ssmhosts file. It creates only one icon for each array, regardless of how many servers are connected to the array.

Main features of this window are as follows:

The Storage System Manager window has many standard GUI features, such as

  • status bar, which displays GUI and system messages

  • button descriptions that you display by positioning the cursor on a button for a few seconds without clicking the button (the description is also displayed in the status bar)

  • Help menu for displaying online help and information about ssmgui


    Note: To print information from a Help browser window, create a document in an ASCII text editor (such as vi), copy the information, and print the document.


  • View menu for hiding or displaying toolbars, and for other viewing options

  • horizontal and vertical scroll bars for various windows and menus

Exiting a RAID GUI Session

In the File menu of the Storage System Manager window (and certain other windows), click Exit to exit the ssmgui session. In the confirmation window that opens, click OK. All ssmgui windows close.


Note: When you exit, all changes you set in all windows are saved to $HOME/.vgalaxy.1.vr.


Array Configuration and Management

Note the following terms:

  • A storage system managed by the administrative software is a managed array.

  • A host with a managed array is an array server or a server.

  • The host running the administrative software is a management station (this host can also be a server with managed arrays).

To configure an array using the RAID GUI, you follow these basic steps:

  1. Determine the arrays you want to manage.

  2. Make sure the RAID GUI agent ssmagent is running on all servers connected to arrays you want to manage, as explained in “Checking ssmagent Installation” in this chapter.

  3. Start the RAID GUI (ssmgui), as explained in “Starting a RAID GUI Session” in this chapter.

  4. In the RAID GUI, set up array memory and create LUNs, as explained in Chapter 2, “Using the RAID GUI to Configure Arrays.”

  5. Make the LUNs available to the server's operating system, as explained in “Making LUNs Available to the Server Operating System” in Chapter 2.

To manage the array, you follow these basic steps:

  1. Monitor the health of the array, as explained in Chapter 4, “Monitoring Arrays and Displaying System Statistics.”

  2. Display system status and statistics, as explained in Chapter 4, “Monitoring Arrays and Displaying System Statistics.”

  3. Reconfigure the array, if necessary, as explained in Chapter 3, “Reconfiguring and Fine-Tuning.”

  4. Deal with failed components if necessary, as explained in Chapter 5, “Identifying and Correcting Failures.”

Data Availability and Performance

The RAID storage system hardware implements data availability and performance enhancements. This section discusses the following topics:

For information on RAID levels supported for IRIS FailSafe, ask your Silicon Graphics Sales Engineer.

Data Redundancy

RAID technology provides redundant disk resources in disk-array and disk-mirror configurations that make the storage system more highly available. Data redundancy varies for the different RAID types (levels) supported by Silicon Graphics Fibre Channel RAID storage systems: RAID 0, RAID 1, RAID 1/0, RAID 3, and RAID 5.

RAID 3 and RAID 5 groups maintain parity data that lets the disk group survive a disk module failure without losing data. In addition, the group can survive a single fibre loop failure.

A RAID 1 mirrored pair, or a RAID 1/0 group, which uses RAID 1 technology, duplicates data on two groups of disk modules. If one disk module fails, the other module provides continuing access to stored information. Similarly, a RAID 1 mirrored pair or RAID 1/0 group can survive a single fibre loop failure.

Enhanced Performance: Disk Striping

In disk striping, the SP lays out data records, usually large data records or a number of small records for the same application, across multiple disks. For most applications, these disks can be written to or read from simultaneously and independently. Because multiple sets of read/write heads work on the same task at once, disk striping can enhance performance.

The amount of information read from or written to each module makes up the stripe element size (for example, 128 sectors). The stripe size is the number of data disks in a group multiplied by the stripe element size. For example, assume a stripe element size of 128 sectors (the default). If the RAID 5 group has five disks (four data disks and one parity disk), multiply the stripe element size of 128 sectors by the four data disks to yield a stripe size of 512 sectors.
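As a quick check of this arithmetic from a shell prompt, you can use the standard expr command (a sketch; the 65,536-byte figure assumes the 512-byte sectors implied by the stripe element sizes quoted later in this chapter). The first command yields the 512-sector stripe size; the second yields the 65,536-byte size of one stripe element:

expr 4 \* 128

expr 128 \* 512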

Enhanced Performance: Storage System Caching

Caching is available for Fibre Channel RAID storage systems that have two SPs (each with at least 16 MB of read cache memory and 16 MB of write cache memory), two power supplies in the Fibre Channel RAID enclosure (DPE), a fully charged standby power supply (SPS), and disk modules in slots 00 through 08. With storage system caching enabled, each SP temporarily stores requested information in its memory.

Caching can save time in two ways:

  • For a read request, if the data sought by the request is already in the read cache, the storage system avoids accessing the disk group to retrieve the data.

  • For a write request, if the request modifies information held in the write cache (data that must eventually be written to disk), the SP can keep the modified data in the cache and write it back to disk at the most expedient time instead of immediately. Write caching, in particular, can enhance storage system performance by reducing write response time.

To ensure data integrity, each SP maintains a mirror image of the other SP's write cache. If one SP fails, the data in its write cache is available from the other SP.

As explained in Chapter 2, “Using the RAID GUI to Configure Arrays,” and Chapter 3, “Reconfiguring and Fine-Tuning,” you can enable storage system caching and specify basic cache parameters, and enable or disable read and write caches for individual disk units.


Note: The SPS must be fully functional in the Fibre Channel RAID chassis for systems using write cache to ensure that data is committed to disk in the event of an AC input power failure.


Data Reconstruction and Rebuilding After Disk Module Failure

All RAID types except RAID 0 provide data redundancy: the storage system reads and writes data from and to more than one disk at a time. Also, the system software writes parity or mirrored data that lets the array continue operating if a disk module fails. When a disk module in one of these RAID types fails, the data is still available because the SP can reconstruct it from the surviving disk(s) in the array.

Data rebuilding occurs under these conditions:

  • A hot spare (dedicated replacement disk module) is available.

  • The failed disk module is replaced with a new disk module.

If a disk module has been configured (bound) as a hot spare, it is available as a replacement for a failed disk module. (See “RAID Hot Spare”.) When a disk module in any RAID type except RAID 0 fails, the SP automatically writes to the hot spare and rebuilds the group using the information stored on the surviving disks. Performance is degraded while the SP rebuilds the data and parity on the new module. However, the storage system continues to function, giving users access to all data, including data stored on the failed module.

Similarly, when a new disk module is inserted to replace a failed one, the SP automatically writes to it and rebuilds the group using the information stored on the surviving disks. As for the hot spare, performance is degraded during rebuilding, but data is accessible.

The length of the rebuild period, during which the SP recreates the second image after a failure, can be specified when RAID types are set and disks are bound into RAID units. These processes are explained in Chapter 2, “Using the RAID GUI to Configure Arrays.”

RAID Types

The Fibre Channel RAID system supports these levels of RAID:

  • RAID 0 group: nonredundant array

  • RAID 1 group: mirrored pair

  • RAID 1/0 group: mirrored RAID 0 group

  • RAID 3 group: parallel access array

  • RAID 5 group: individual access array

  • individual disk unit


Caution: Use only Fibre Channel RAID disk modules to replace failed disk modules. These disk modules contain proprietary firmware that the storage system requires for correct functioning. Using any other disks, including those from other Silicon Graphics systems, can cause failure of the storage system. Swapping disk modules within a Fibre Channel RAID storage system is also not recommended, particularly disk modules in slots 00, 01, and 02, which contain the licensed internal code, or slots 00 through 08, which serve as the storage system cache vault.

Chapter 2 provides detailed instructions on configuring all RAID types.

RAID 0 Group: Nonredundant Array

Three to sixteen disk modules can be bound as a RAID 0 group. A RAID 0 group uses striping; see “Enhanced Performance: Disk Striping,” earlier in this chapter. You might choose a RAID 0 group configuration when fast access is more important than high availability. You can software-mirror the RAID 0 group to provide high availability; see the section on using xlv_make to create volume objects in Getting Started With XFS Filesystems.


Caution: The hardware does not maintain parity information on any disk module for RAID 0 the way it does for other RAID types. Failure of a disk module in this RAID type results in loss of data.


RAID 1/0 Group: Mirrored RAID 0 Group

A RAID 1/0 configuration mirrors a RAID 0 group, creating a primary RAID 0 image and a secondary RAID 0 image for user data. This arrangement consists of four (minimum), six, eight, ten, twelve, fourteen, or sixteen disk modules. These disk modules make up two mirror images, with each image including two to eight disk modules. A RAID 1/0 group uses striping and combines the speed advantage of RAID 0 with the redundancy advantage of mirroring.

Figure 1-3 illustrates the distribution of user data with the default stripe element size of 128 sectors (65,536 bytes) in a six-module RAID 1/0 group. Notice that the disk block addresses in the stripe proceed sequentially from the first mirrored disk modules to the second mirrored disk modules, to the third mirrored disk modules, then back to the first mirrored disk modules, and so on.

A RAID 1/0 group can survive the failure of multiple disk modules, providing that one disk module in each image pair survives. For example, the RAID 1/0 group shown in Figure 1-3 has three disk modules in each image of the pair.

Figure 1-3. Distribution of User Data in a RAID 1/0 Group


RAID 1: Mirrored Pair

In the RAID 1 configuration, two disk modules can be bound as a mirrored pair. In this disk configuration, the SP duplicates (mirrors) the data records and stores them separately on each disk module in the pair. The disks in a RAID 1 pair cannot be split into individual units (as can a software mirror composed of two individual disk units).

Features of this RAID type include

  • fault tolerance

  • automatic mirroring: no commands are required to initiate it

  • physical separation of images

  • faster write operation than RAID 5

With a RAID 1 mirrored pair, the storage system writes the same data to both disk modules in the mirror, as shown in Figure 1-4.

Figure 1-4. RAID 1 Mirrored Pair (Hardware-Mirrored Pair)


RAID 3: Parallel Access Array

A RAID 3 configuration always consists of five (four data, one parity: 4 + 1) or nine (eight data, one parity: 8 + 1) disk modules bound as a RAID 3 group. In a RAID 3 group, the hardware always reads from or writes to all its disk modules. A RAID 3 group uses disk striping; see “Enhanced Performance: Disk Striping,” earlier in this chapter for an explanation of this feature. RAID 3 striping has a fixed stripe size of one sector.

The Fibre Channel RAID storage system writes parity information that lets the group continue operating if one of the disk modules fails. When you replace the failed module, the SP can rebuild the group using the information stored on the working disks. Performance is degraded while the SP rebuilds the data or parity on the new module. However, the storage system continues to function and gives users access to all data, including data that had been stored on the failed module.

RAID 3 differs from RAID 5 in several important ways:

  • In a RAID 3 group, the hardware processes disk requests serially, whereas in a RAID 5 group the hardware can interleave disk requests.

  • In a RAID 3 group, the parity information is stored on one disk module; in a RAID 5 group, it is stored on all disks.

A RAID 3 group works well for single-task applications that use I/Os of one or more 2-KB blocks, aligned to start at disk addresses that are multiples of 2 KB from the beginning of the logical disk.
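As a sketch of this alignment rule (the byte offsets here are hypothetical), you can test whether a starting disk address is an even multiple of 2 KB (2048 bytes) with the standard expr command; a remainder of 0 means the I/O is aligned:

expr 6144 % 2048

expr 5120 % 2048

The first offset is aligned (remainder 0); the second is not (remainder 1024).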

Figure 1-5 illustrates user and parity data with a data block size of 2 KB within a RAID 3 group. Notice that the byte addresses proceed from the first module to the second, third, and fourth, then to the first, and so on.

Figure 1-5. Distribution of User and Parity Data in a RAID 3 Group


The storage system performs more steps writing data to a RAID 3 group than to all the disks in a RAID 1 mirrored pair or a RAID 0 group, or to an individual disk unit. For each correctly aligned 2 KB write operation to a RAID 3 group, the storage system performs the following steps:

  1. Calculates the parity data.

  2. Writes the new user and parity data.

If the write is not a multiple of 2 KB or the starting disk address of the I/O does not begin at an even 2 KB boundary from the beginning of the logical disk, the storage system performs the following steps:

  1. Reads data from the sectors being written and parity data for those sectors.

  2. Recalculates the parity data.

  3. Writes the new user and parity data.

RAID 5: Individual Access Array

This configuration usually consists of five disk modules (but can have three to sixteen) bound as a RAID 5 group. An array of five disk modules (or fewer) provides the greatest level of data redundancy. A RAID 5 group maintains parity data that lets the disk group survive a disk module failure without losing data.

With RAID 5 technology, the hardware writes parity information to each module in the array. If a module fails, the SP can reconstruct all user data from the user data and parity information on the other disk modules. After you replace a failed disk module, the SP automatically rebuilds the disk array using the information stored on the remaining modules. The rebuilt disk array contains a replica of the information it would have contained had the disk module never failed.

A RAID 5 group uses disk striping; see “Enhanced Performance: Disk Striping,” earlier in this chapter for an explanation of this feature. Figure 1-6 illustrates user and parity data with the default stripe element size of 128 sectors (65,536 bytes) in a five-module RAID 5 group. The stripe size comprises all stripe elements. Notice that the disk block addresses in the stripe proceed sequentially from the first module to the second, third, fourth, and fifth, then back to the first, and so on.

For each write operation to a RAID 5 group, the RAID storage system must perform the following steps:

  1. Read data from the sectors being written and parity data for those sectors.

  2. Recalculate the parity data.

  3. Write the new user and parity data.

Figure 1-6. Distribution of User and Parity Data in a RAID 5 Group


Individual Disk Unit

An individual disk unit is a disk module bound to be independent of any other disk module in the cabinet. An individual unit has no inherent high availability, but you can make it highly available by software-mirroring it with another individual unit.

RAID Hot Spare

A hot spare is a dedicated replacement disk unit on which users cannot store information. The capacity of a disk module that you bind as a hot spare must be at least as great as the capacity of the largest disk module it might replace.


Note: The hot spare is not available for RAID 0, because this RAID type does not provide data redundancy. RAID 3 LUNs use only one hot spare, even though additional ones are available.

If any disk module in a RAID 5, RAID 3, or RAID 1/0 group or in a RAID 1 mirrored pair fails, the SP automatically begins rebuilding the failed disk module's structure on the hot spare. When the SP finishes rebuilding, the disk group functions as usual, using the hot spare instead of the failed disk. When you replace the failed disk, the SP starts copying the data from the former hot spare onto the replacement disk. When the copy is done, the disk group consists of disk modules in the original slots, and the SP automatically frees the hot spare to serve as a hot spare again.


Note: The SP finishes rebuilding the disk module onto the hot spare before it begins copying data to the newly installed disk, even if you replace the failed disk during the rebuild process.

A hot spare is most useful when you need the highest data availability. It eliminates the time and effort needed for someone to notice that a module has failed, find a suitable replacement module, and insert it.

You can have one or more hot spares per storage system. Any module in the storage system can be configured as a hot spare except for disk modules in slots 00 through 08, which serve other purposes.

For example, assume that the disk modules in slots 10-14 are a RAID 5 group, those in slots 15 and 16 are a RAID 1 mirrored pair, and the module in 17 is a hot spare. If module 13 fails, the SP immediately begins rebuilding the RAID 5 group using the hot spare. When it finishes, the RAID 5 group consists of disk modules 10, 11, 12, 17, and 14.

When you replace the failed module in 13, the SP starts copying the structure on 17 to 13. When it finishes, the RAID 5 group once again consists of modules 10-14 and the hot spare becomes available for use if any other module fails. A similar sequence would occur if, for example, module 15 in the mirrored pair failed.



[1] The physical disk unit number is also known as the logical unit number, or LUN. The LUN is a logical concept, but is recognized as a physical disk unit by the operating system; hence the seemingly contradictory names.