This chapter explains the concepts of XLV logical volumes. The use of logical volumes allows one filesystem to spread across multiple disk partitions. IRIX supports XLV logical volumes, a logical volume design developed at Silicon Graphics. Older releases of IRIX supported an older logical volume design, lv logical volumes. The procedure for converting from lv logical volumes to XLV logical volumes is described in the section “Converting lv Logical Volumes to XLV Logical Volumes” in Chapter 4.
Administration procedures for XLV logical volumes are described in Chapter 4, “Creating and Administering XLV Logical Volumes”.
|Note: For information on XVM logical volume concepts, see the XVM Volume Manager Administrator's Guide.|
The use of logical volumes enables the creation of filesystems, raw devices, or block devices that span more than one disk partition. Logical volumes behave like regular disk partitions; they appear as block and character devices in the /dev directory and can be used as arguments anywhere a disk device can be specified.
Filesystems can be created, mounted, and used in the normal way on logical volumes, or logical volumes can be used as block or raw devices. XLV logical volumes provide services such as disk plexing (also known as mirroring) and striping transparently to the applications that access the volumes. Key reasons to create a logical volume are:
To allow a filesystem or disk device to be larger than the size of a physical disk
To increase disk I/O performance
The drawback to logical volumes is that all disks used in a logical volume must function correctly at all times. If you have a logical volume set up over three disks and one disk goes bad, the information on the other two disks is unavailable and must be restored from backups. However, with the optional Disk Plexing Option software, you can create multiple copies of the contents of an XLV logical volume, called plexes, so that all of the information in the volume remains available even when a disk goes bad.
XLV logical volumes provide these advantages:
Support for very large logical volumes: up to one terabyte on 32-bit systems and unlimited on 64-bit systems
Support for disk striping for higher I/O performance
Plexing (mirroring) for higher system and data reliability
Online volume reconfigurations, such as increasing the size of a volume, for less system downtime
With XFS filesystems, XLV provides these additional advantages:
Filesystem journal records on a separate partition, which can be on a separate disk, for maximum performance
Access to real-time data
An XLV logical volume can include partitions from several physical disk drives. By default, data is written to the first disk partition, then to the second disk partition, and so on. Figure 3-1 shows the order in which data is written to partitions in a non-striped logical volume.
A striped logical volume uses equal-sized partitions on several disks. When a logical volume is striped, an amount of data called the stripe unit is written to the first disk, the next stripe unit of data is written to the second disk, and so on. One pass that writes a stripe unit to each of the disks is called a stripe. After each disk has been written to, the next stripe unit is written to the first disk again, beginning the next stripe. Figure 3-2 shows the order in which data is written to a striped logical volume.
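To make the round-robin order concrete, here is a small Python model (not XLV code) that maps a byte offset in a striped volume to a disk and an offset on that disk. It assumes equal-sized partitions and a fixed stripe unit given in bytes; the function name striped_location is hypothetical.

```python
def striped_location(offset, stripe_unit, num_disks):
    """Map a byte offset in a striped volume to (disk index, offset on that disk).

    Model: stripe units are assigned round-robin to disks 0..num_disks-1;
    one full pass across all disks is one stripe.
    """
    if stripe_unit <= 0 or num_disks <= 0:
        raise ValueError("stripe_unit and num_disks must be positive")
    unit_index = offset // stripe_unit   # which stripe unit holds this byte
    disk = unit_index % num_disks        # round-robin choice of disk
    stripe = unit_index // num_disks     # which stripe (full pass) we are in
    disk_offset = stripe * stripe_unit + offset % stripe_unit
    return disk, disk_offset


# Example: three disks with a 64 KB stripe unit.
KB = 1024
for unit in range(7):
    print(unit, striped_location(unit * 64 * KB, 64 * KB, 3))
```

Stripe unit 3 lands back on disk 0, one stripe unit further into that disk, which is the start of the second stripe.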
Because each stripe unit in a stripe can be read and written simultaneously, I/O performance is improved. To obtain the best performance benefits of striping, try to connect the disks in the stripe to different controllers. In this arrangement, there are independent data paths between each disk and the system. Even so, striping SCSI disks on the same controller can yield a small performance improvement.
When XFS filesystems are used on XLV volumes, each logical volume can contain up to three subvolumes: data (which is required), log, and real-time. The data subvolume normally contains user files and filesystem metadata (inodes, indirect blocks, directories, and free space blocks). The log subvolume holds filesystem journal records; a log kept in its own subvolume is called an external log. If there is no log subvolume, journal records are placed in the data subvolume (an internal log). Data with special I/O bandwidth requirements, such as video, can be placed on the optional real-time subvolume. The section “Using Real-Time Subvolumes” in Chapter 8 explains how to use real-time subvolumes.
XLV increases system reliability and availability by enabling you to add or remove a copy of the data in the volume (a plex), increase the size of (grow) a volume, and replace failed elements of a plexed volume without taking the volume out of service.
You use one of two procedures to create an XLV logical volume, depending on whether you are starting with empty disks or with a filesystem on a disk partition. When starting with empty disks, you perform the following steps:
Create the XLV logical volume (see “Creating Volume Objects With xlv_make” in Chapter 4 and “Example 3: Creating A Plexed XLV Logical Volume for an XFS Filesystem With an External Log” in Chapter 4).
In the second procedure for creating XLV logical volumes, you start with a filesystem on a disk partition. You increase the size of the filesystem (“grow” the filesystem) by creating a logical volume that includes the existing disk partition and a new disk partition. This procedure is explained in “Growing an XFS Filesystem Onto Another Disk” in Chapter 6.
Converting from lv logical volumes to XLV logical volumes is easy. Using the commands lv_to_xlv and xlv_make, you can convert lv logical volumes to XLV without having to dump and restore your data.
Using XLV logical volumes is not recommended on systems with a single disk.
XLV logical volumes are composed of a hierarchy of logical storage objects: volumes are composed of subvolumes, subvolumes are composed of plexes, and plexes are composed of volume elements. Volume elements are composed of disk partitions. This hierarchy of storage units is shown in Figure 3-3, an example of a relatively complex logical volume.
Figure 3-3 illustrates the relationships between volumes, subvolumes, plexes, and volume elements in an XLV logical volume. In this example, six physical disk drives contain eight disk partitions. The logical volume has a log subvolume, a data subvolume, and a real-time subvolume. The log subvolume has two plexes (copies of the data) for higher reliability, and the data and real-time subvolumes are not plexed (meaning that they each consist of a single plex). The log plexes each consist of a volume element, which is a disk partition on disk 1. The plex of the data subvolume consists of two volume elements, a partition that is the remainder of disk 1 and a partition that is all of disk 2. The plex used for the real-time subvolume is striped for increased performance. The striped volume element is constructed from four disk partitions, each of which is an entire disk.
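The hierarchy can be sketched as nested Python data classes. This is an illustrative model only, not the format of XLV's logical volume labels: the class names and partition labels are hypothetical, and validate() checks just the limits stated in this chapter (a required data subvolume, at most one subvolume of each type, one to four plexes per subvolume, up to 128 volume elements per plex, and at least two partitions for striping). The example rebuilds the Figure 3-3 configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    disk: int   # physical disk number
    name: str   # hypothetical partition label


@dataclass
class VolumeElement:
    partitions: List[Partition]
    striped: bool = False


@dataclass
class Plex:
    elements: List[VolumeElement]


@dataclass
class Subvolume:
    kind: str            # "data", "log", or "rt"
    plexes: List[Plex]


@dataclass
class Volume:
    name: str
    subvolumes: List[Subvolume]

    def validate(self):
        kinds = [sv.kind for sv in self.subvolumes]
        if "data" not in kinds:
            raise ValueError("every volume needs a data subvolume")
        if len(kinds) != len(set(kinds)):
            raise ValueError("each subvolume type may appear only once")
        for sv in self.subvolumes:
            if not 1 <= len(sv.plexes) <= 4:
                raise ValueError(f"{sv.kind}: 1 to 4 plexes allowed")
            for plex in sv.plexes:
                if not 1 <= len(plex.elements) <= 128:
                    raise ValueError(f"{sv.kind}: 1 to 128 volume elements per plex")
                for ve in plex.elements:
                    if ve.striped and len(ve.partitions) < 2:
                        raise ValueError("striping needs at least two partitions")

    def partitions(self):
        # Flatten the hierarchy down to its disk partitions.
        return [p for sv in self.subvolumes for plex in sv.plexes
                for ve in plex.elements for p in ve.partitions]


def single(disk, name):
    """A single-partition volume element."""
    return VolumeElement([Partition(disk, name)])


# Figure 3-3: plexed log on disk 1, data on disks 1 and 2, striped real-time on disks 3-6.
vol = Volume("xlv_example", [
    Subvolume("log", [Plex([single(1, "log_a")]), Plex([single(1, "log_b")])]),
    Subvolume("data", [Plex([single(1, "rest_of_disk1"), single(2, "all_of_disk2")])]),
    Subvolume("rt", [Plex([VolumeElement(
        [Partition(d, f"disk{d}") for d in range(3, 7)], striped=True)])]),
])
vol.validate()
print(len(vol.partitions()), "partitions on",
      len({p.disk for p in vol.partitions()}), "disks")
```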
The following subsections describe these logical storage objects in more detail.
Volumes are composed of subvolumes. All volumes must have a data subvolume. Two other subvolumes, the log subvolume and the real-time subvolume, are optional. For XFS filesystems, a volume consists of a data subvolume, an optional log subvolume, and an optional real-time subvolume. For EFS filesystems, a volume consists of just one subvolume, the data subvolume. (EFS is a filesystem type supported in previous IRIX releases; it is described in Appendix A, “EFS Filesystems”.) The breakdown of a volume into subvolumes is shown in Figure 3-4.
Each volume can be used as a single filesystem or as a raw partition. Volume information used by the system during system startup is stored in logical volume labels in the volume header of each disk used by the volume (see “Volume Headers” in Chapter 1). At system startup, volumes will not come online if any of their subvolumes cannot be brought online. You can create volumes, delete them, and move them to another system.
|Note: The plexing feature of XLV, which enables the use of the optional plexes, is available only when you purchase the Disk Plexing Option software.|
The data subvolume is required in all XLV logical volumes. It is the only subvolume present in EFS filesystems. (EFS is a filesystem type supported in previous IRIX releases; it is described in Appendix A, “EFS Filesystems”.)
The log subvolume contains XFS journaling information. It is a log of filesystem transactions and is used to expedite system recovery after a crash. Log information is sometimes put in the data subvolume rather than in a log subvolume (see “Choosing the Log Type and Size” in Chapter 6 and the mkfs_xfs(1M) reference page and its discussion of the -l option for more information).
Real-time subvolumes are generally used for data applications such as video, where guaranteed response time is more important than data integrity. Chapter 8, “System Administration for Guaranteed-Rate I/O”, explains how applications access data on real-time subvolumes.
Subvolumes enforce separation among data types. For example, user data cannot overwrite filesystem log data. Subvolumes also enable filesystem data and user data to be configured to meet goals for performance and reliability. For example, performance can be improved by putting subvolumes on different disk drives.
Each subvolume can be organized independently. For example, the log subvolume can be plexed for fault tolerance and the real-time subvolume can be striped across a large number of disks to give maximum throughput for video playback.
Volume elements that are part of a real-time subvolume should not be on the same disk as volume elements used for data or log subvolumes. This separation is recommended for all files on real-time subvolumes and is required for files used for guaranteed-rate I/O with hard guarantees. (See “Hardware Configuration Requirements for GRIO” in Chapter 8 for more information.)
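A layout can be checked against this rule before the volume is built. The sketch below assumes you have listed, by hand, which disks each subvolume's volume elements use; the dictionary shape, the disk names, and the function name rt_disk_conflicts are all hypothetical.

```python
def rt_disk_conflicts(disks_by_subvolume):
    """Return the disks shared by the real-time subvolume and the data or log subvolumes.

    disks_by_subvolume maps "data", "log", and "rt" to iterables of disk names.
    """
    rt = set(disks_by_subvolume.get("rt", ()))
    other = set(disks_by_subvolume.get("data", ())) | set(disks_by_subvolume.get("log", ()))
    return rt & other


layout = {
    "data": ["dks0d1", "dks0d2"],
    "log": ["dks0d1"],
    "rt": ["dks1d1", "dks1d2", "dks0d2"],
}
print(sorted(rt_disk_conflicts(layout)))  # ['dks0d2']
```

An empty result means no disk carries both real-time and data or log volume elements.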
Once a subvolume is created, it cannot be detached from its volume or deleted without deleting its volume. Subvolumes are automatically deleted when their volumes are deleted.
A subvolume can contain from one to four plexes (also known as mirrors). Each plex is an exact replica of all or a portion of the subvolume's data. By creating a subvolume with multiple plexes, system reliability is increased because there are redundant copies of the data.
If there is just one plex in a subvolume, that plex spans the entire address space of the subvolume. However, in the case of multiple plexes, individual plexes can have holes in their address spaces as long as the union of all plexes spans the entire address space. Figure 3-6 shows an example of this configuration. The subvolume contains three plexes. If complete, each plex would contain three volume elements. However, two of the plexes are missing a volume element. This is allowed because there is at least one volume element with each address range. In fact, if Plex 1 in the figure were detached (removed from the subvolume), the subvolume would still be functional because there is still at least one volume element with each address range.
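The "union of all plexes spans the entire address space" condition is easy to state as code. In this Python sketch each plex is a list of half-open (start, end) block ranges, one per volume element it actually has; a hole is simply a missing range. The sizes and the placement of the holes are illustrative, since Figure 3-6 is not reproduced here, and covers is a hypothetical name.

```python
def covers(plexes, size):
    """True if the union of the plexes' volume elements covers [0, size).

    Each plex is a list of (start, end) half-open address ranges, one per
    volume element; a missing volume element is simply absent (a hole).
    """
    ranges = sorted(r for plex in plexes for r in plex)
    covered = 0                  # every address below this one is covered
    for start, end in ranges:
        if start > covered:
            return False         # a gap that no plex fills
        covered = max(covered, end)
    return covered >= size


# Three plexes over three 100-block address ranges.
plex1 = [(0, 100), (100, 200), (200, 300)]   # complete
plex2 = [(0, 100), (200, 300)]               # missing the middle volume element
plex3 = [(100, 200), (200, 300)]             # missing the first volume element
print(covers([plex1, plex2, plex3], 300))    # True
print(covers([plex2, plex3], 300))           # True: Plex 1 could be detached
print(covers([plex2], 300))                  # False
```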
Data is written to all plexes. When an additional plex is added to a subvolume, the system automatically copies the subvolume's data onto the new plex; this is called a plex revive. See the xlv_assemble(1M) and xlv_plexd(1M) reference pages for more information.
A plex is composed of one or more volume elements, as shown in Figure 3-7, up to a maximum of 128 volume elements. Each volume element represents a range of addresses within the subvolume.
When a plex is composed of two or more volume elements, it is said to have concatenated volume elements. With concatenation, data written sequentially to the plex is also written sequentially to the volume elements; the first volume element is filled, then the second, and so on. Concatenation is useful for creating a filesystem that is larger than the size of a single disk.
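Sequential filling means an address in a concatenated plex can be located by walking the cumulative sizes of its volume elements. The Python sketch below does this with a binary search over the running totals; element sizes are in blocks and are made up for the example, and concat_location is a hypothetical name.

```python
from bisect import bisect_right
from itertools import accumulate


def concat_location(offset, element_sizes):
    """Map an offset in a plex of concatenated volume elements to
    (element index, offset within that element)."""
    ends = list(accumulate(element_sizes))   # cumulative end address of each element
    if not ends or not 0 <= offset < ends[-1]:
        raise ValueError("offset outside the plex")
    i = bisect_right(ends, offset)           # first element whose end exceeds offset
    start = ends[i - 1] if i else 0
    return i, offset - start


sizes = [1000, 500, 2000]                    # illustrative element sizes in blocks
print(concat_location(999, sizes))   # (0, 999)
print(concat_location(1000, sizes))  # (1, 0)
print(concat_location(1600, sizes))  # (2, 100)
```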
You can add plexes to subvolumes, detach them from subvolumes that have multiple plexes (and possibly attach them elsewhere), and delete them from subvolumes that have multiple plexes.
|Note: To have multiple plexes, you must purchase the Disk Plexing Option software and obtain and install a FLEXlm license.|
Volume elements are the lowest level in the hierarchy of logical storage objects: volumes are composed of subvolumes; subvolumes are composed of plexes; and plexes are composed of volume elements. Volume elements, in turn, are built from physical storage elements: one or more disk partitions, with or without striping (striping requires at least two disk partitions). Any mixture of the three types of volume elements (single partition, striped, and multipartition) can be included in a plex.
The simplest type of volume element is a single disk partition. The two other types of volume elements, striped volume elements and multipartition volume elements, are composed of several disk partitions. Figure 3-8 shows a single partition volume element.
Figure 3-9 shows a striped volume element. Striped volume elements consist of two or more disk partitions, organized so that an amount of data called the stripe unit is written to each disk partition before writing the next stripe unit-worth of data to the next partition.
Striping can be used to alternate sections of data among multiple disks. This provides a performance advantage by allowing parallel I/O activity. You can use these rules of thumb as a starting point for choosing a stripe unit size:
The stripe unit size should be a function of the I/O size of the application that uses the striped volume and the number of partitions in the stripe: the stripe unit size should be the application I/O size divided by the number of partitions. This keeps all disks busy all of the time, which is ideal.
The default stripe unit is the device track size, which is a good value to use, particularly when there are more reads than writes to the disk.
Stripe unit sizes of less than 64 KB are not recommended.
For best write performance, the stripe unit size should be several tracks. However, large stripe unit sizes require larger I/O buffer sizes, which can be a problem.
In choosing the optimal stripe unit size, balance the benefits of parallel I/O activity, the efficiency of I/O to a single disk drive (larger reads and writes have less overhead), and the limits on I/O buffer size.
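The first rule of thumb above reduces to a division, and the 64 KB floor to a comparison. This Python sketch applies only those two rules; it deliberately ignores track size and I/O buffer limits, which still need to be weighed by hand. The function name and the example I/O sizes are hypothetical.

```python
KB = 1024


def suggest_stripe_unit(app_io_size, num_partitions, min_unit=64 * KB):
    """Rule-of-thumb stripe unit: application I/O size divided by the number
    of partitions, so that one application I/O keeps every disk busy.

    Returns (suggested size in bytes, warning string or None).
    """
    unit = app_io_size // num_partitions
    warning = None
    if unit < min_unit:
        warning = f"{unit // KB} KB is below the recommended {min_unit // KB} KB minimum"
    return unit, warning


print(suggest_stripe_unit(1024 * KB, 4))  # (262144, None)
print(suggest_stripe_unit(128 * KB, 4))   # (32768, '32 KB is below the recommended 64 KB minimum')
```

When the result falls below the minimum, the usual remedies are larger application I/Os or fewer partitions in the stripe.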
Figure 3-10 shows a multipartition volume element in which the volume element is composed of more than one disk partition. In this configuration, the disk partitions are addressed sequentially.
Volumes appear as block and character devices in the /dev directory. The device names for XLV logical volumes are /dev/xlv/volume_name and /dev/rxlv/volume_name, where volume_name is a volume name specified when the volume is created using the xlv_make command. The volume name and plex, subvolume, and volume element names specified while using the xlv_make command cannot contain periods (.).
|Note: In IRIX 6.2 and IRIX 5.3 with XFS, XLV logical volume device files had the names /dev/dsk/xlv/volname and /dev/rdsk/xlv/volname.|
When a volume is created on one system and moved (by moving the disks) to another system, the new volume name is the same as the original volume name with the hostname of the original system prepended. For example, if a volume called xlv0 is moved from a system called engrlab1 to a system called engrlab2, the device name of the volume on the new system is /dev/xlv/engrlab1.xlv0 (the old system name engrlab1 has been prepended to the volume name xlv0).
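The naming convention can be expressed as a small Python function. It only formats strings to illustrate the rules above (the /dev/xlv and /dev/rxlv directories, no periods in names given to xlv_make, and the originating hostname prepended after a move); it is not how IRIX builds these device files, and xlv_device_paths is a hypothetical name.

```python
def xlv_device_paths(volume_name, created_on, local_host):
    """Return (block, character) device paths for an XLV volume.

    A volume created on another host appears with that host's name
    prepended, for example engrlab1.xlv0.
    """
    if "." in volume_name:
        raise ValueError("XLV object names cannot contain periods")
    name = volume_name if created_on == local_host else f"{created_on}.{volume_name}"
    return f"/dev/xlv/{name}", f"/dev/rxlv/{name}"


print(xlv_device_paths("xlv0", "engrlab1", "engrlab1"))
# ('/dev/xlv/xlv0', '/dev/rxlv/xlv0')
print(xlv_device_paths("xlv0", "engrlab1", "engrlab2"))
# ('/dev/xlv/engrlab1.xlv0', '/dev/rxlv/engrlab1.xlv0')
```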
XLV does not require an explicit configuration file, nor is it turned on and off with the chkconfig command. XLV is able to assemble logical volumes based solely upon information written in the logical volume labels. During initialization, the system performs a hardware inventory, reads all the logical volume labels, and automatically assembles the available disks into previously defined volumes.
If some disks are missing, XLV checks whether there are enough volume elements among the available plexes to map the entire address space. If the whole address space is available, XLV brings the volume online even if some of the plexes are incomplete.
For read failures on log and data subvolumes, XLV rereads from a different plex (when available) and attempts to fix the failed plex by rewriting the results. XLV does not retry on failures for real-time data.
For write errors on log and data subvolumes, XLV assumes that these write errors are hard errors (the disk driver and controllers handle soft errors). If the volume element with a hard error is plexed, XLV marks the volume element offline and ignores it from then on. If the volume element is not plexed, it remains associated with the volume and an error is returned.
XLV does not handle write errors on real-time subvolumes. Incorrect data is returned without error messages on subsequent reads.
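The error-handling rules above can be summarized as a lookup. This Python function is a reading aid that returns a text description of the behavior stated in this section; it is not an XLV interface. For a read error with no other plex, the section says only that XLV rereads from a different plex when one is available, so the function says no more than that.

```python
def xlv_error_action(op, subvolume, plexed):
    """Describe, as text, how this section says XLV reacts to an I/O error.

    op: "read" or "write"; subvolume: "data", "log", or "rt".
    """
    if op not in ("read", "write") or subvolume not in ("data", "log", "rt"):
        raise ValueError("unknown operation or subvolume type")
    if subvolume == "rt":
        if op == "read":
            return "no retry"
        return "not handled; subsequent reads return incorrect data without error messages"
    if op == "read":
        if plexed:
            return "reread from another plex, then rewrite the failed plex"
        return "no other plex to reread from"
    # Write errors on data and log subvolumes are treated as hard errors.
    if plexed:
        return "mark volume element offline and ignore it from then on"
    return "keep volume element in the volume and return an error"


print(xlv_error_action("write", "log", plexed=True))
```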
Raw swap devices cannot be XLV logical volumes. (However, swap space can be added as a regular file in a filesystem and that filesystem could be on an XLV logical volume. See the chapter “Configuring Disk and Swap Space” in the IRIX Admin: System Configuration and Operation guide for more information.)
XLV logical volumes are not recommended on systems with a single disk.
Follow these basic guidelines for choosing which subvolumes to use with XFS filesystems:
Data subvolumes are required.
Log subvolumes are optional. If they are not used, log information is put into an internal log in the data subvolume. In most cases, there is no advantage to using an external log.
Real-time subvolumes are optional.
When you want a large raw partition with no filesystem on it, only the data subvolume is used.
When you create a logical volume with a real-time subvolume, it must also include a data subvolume.
Follow these basic guidelines for choosing which subvolumes to use with EFS filesystems. (EFS is a filesystem type supported in previous IRIX releases; it is described in Appendix A, “EFS Filesystems”.)
Only data subvolumes can be used.
The maximum size of an EFS filesystem is 8 GB; do not make the data subvolume bigger than that, because any additional space is wasted.
The maximum size of a subvolume is one terabyte on 32-bit systems (IP17, IP20, IP22, and IP32). It is unlimited on 64-bit systems (IP19, IP21, IP25, IP26, and IP27).
Choosing the size of the log (and therefore the size of the log subvolume) is discussed in “Choosing the Log Type and Size” in Chapter 6. Note that if you do not intend to repartition a disk to create an optimal-size log partition, your choice of an available disk partition may determine the size of the log.
Use plexing when high reliability and high availability of data are required.
The root filesystem can be plexed; each plex must be a single partition volume element.
Dual-hosted XLV logical volumes (logical volume on disks that are connected to two systems) cannot be plexed.
RAID disks should not be plexed.
Plexes can have “holes” in them, portions of the address range not contained by a volume element, as long as at least one of the plexes in the subvolume has a volume element with the address range of the hole.
The volume elements in each plex of a subvolume must be identical in size with their counterparts in other plexes (volume elements with the same address range). The structure within a volume element (single partition, striped, or multipartition) does not have to match the structure within its counterparts.
To make volume elements identical in size, use the fx command in expert mode (fx -x). At the first fx menu, give the command repartition/expert -b. This enables you to repartition in units of blocks, which ensures that the volume element is exactly the size you want.
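After repartitioning, the sizes of counterpart volume elements can be compared mechanically. In this Python sketch each plex is a dictionary mapping a volume element's starting block to its size in blocks (a missing key is a hole), which is a simplifying assumption: it treats elements that start at the same address as counterparts. The sizes would come from your own partition listing; the function name and numbers are hypothetical.

```python
def mismatched_counterparts(plexes):
    """Find address ranges whose counterpart volume elements differ in size.

    Each plex maps a volume element's starting block to its size in blocks;
    a missing key is a hole. Returns {start: sorted list of distinct sizes}.
    """
    sizes_by_start = {}
    for plex in plexes:
        for start, size in plex.items():
            sizes_by_start.setdefault(start, set()).add(size)
    return {start: sorted(s) for start, s in sizes_by_start.items() if len(s) > 1}


plex_a = {0: 4096, 4096: 8192}
plex_b = {0: 4096, 4096: 8190}   # second element is two blocks short
print(mismatched_counterparts([plex_a, plex_b]))  # {4096: [8190, 8192]}
```

An empty dictionary means every counterpart matches; the internal structure of each volume element (single partition, striped, or multipartition) is not checked because it does not need to match.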
Striped I/O can be used with both direct and buffered I/O. Whether to stripe or not to stripe depends on the access patterns of the data. In general, striped performance is better than non-striped performance.
Striped disks improve performance only when the applications that use them make large data transfers that access all disks in the stripe.
Striped volume elements should be made of disk partitions that are exactly the same size. When the disk partitions are different sizes, the smallest size is used. Additional space in the larger partitions is wasted.
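The cost of mismatched partitions is simple arithmetic: every partition contributes only as much as the smallest one. This Python sketch computes the usable size and the wasted space for a list of partition sizes in blocks; the sizes and the function name are illustrative.

```python
def striped_capacity(partition_sizes):
    """Usable size of a striped volume element and the space wasted.

    Every partition contributes only as much as the smallest one.
    """
    if len(partition_sizes) < 2:
        raise ValueError("striping needs at least two partitions")
    smallest = min(partition_sizes)
    usable = smallest * len(partition_sizes)
    return usable, sum(partition_sizes) - usable


print(striped_capacity([2000, 2000, 1800, 2000]))  # (7200, 600)
```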
For best performance, each disk involved in a striped volume element should be on a separate controller. For some disk types, performance improvement is seen with up to four disks per controller. For other disk types, no additional performance improvement is seen with three or more disks.
A log subvolume can be striped only if it is an external log. Striping a log does not result in a performance improvement.
The root filesystem cannot have concatenated disk partitions.
It is better to concatenate single-partition volume elements into a plex rather than to create a single multipartition volume element. This is not for performance reasons, but for reliability. When one disk partition goes bad in a multipartition volume element, the whole volume element is taken offline.