This section discusses the following:
Do not attempt to make simultaneous configuration changes using the xvm(8) command-line interface (CLI) and the XVM Manager graphical user interface (GUI). Use one tool at a time.
The GUI provides a convenient display of XVM components. If you are using XVM in a cluster environment, you should use the XVM Manager or the CXFS Manager GUI to see your progress and to avoid adding or removing CXFS nodes too quickly. After defining a CXFS node, you should wait for it to appear in the view area before adding another node. After defining a cluster, you should wait for it to appear before you add nodes to it. If you make changes too quickly, errors can occur. For more information, see Chapter 10, “XVM Manager GUI”.
Use LUNs of equal size and one slice for each LUN. You should assemble stripes for any desired performance characteristics out of the slices or mirrors. If more size is needed, use a concat at the top level to allow an arbitrary number of volume elements to be connected.
For most configurations, the default data subvolume is the only subvolume required.
In order to create a name that will persist across reboots, SGI recommends that you explicitly name a volume when you create a volume element or an empty volume. This will reduce the risk of data loss.
| Note: If you do not name an empty volume when you create it, you must specify that the system generate a temporary name; this practice is not recommended for general configuration. |
If you have already created volumes that you did not name explicitly, you can use the change command to assign these volumes permanent names. See “Modifying Volume Elements with the change Command” in Chapter 5.
If the LUNs you are striping are RAID devices, it is also advantageous to have your XVM stripes line up on the RAID stripe boundaries and be a multiple of the RAID stripe size. This allows not only all of your XVM LUNs to transfer in parallel, but also all of the disks in the RAID to be accessed in parallel, in units of the same size. By default, a stripe unit must be a multiple of 32 512-byte blocks (16 KiB).
To get the best performance, make the XVM stripe unit be the same as the RAID stripe width, so that the RAID gets an optimally sized chunk of data to store.
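The unit arithmetic above can be sketched as a small shell helper. This is a minimal illustration, not part of XVM: the function name stripe_unit_blocks is hypothetical, and it assumes that the stripe unit is expressed in 512-byte blocks, as the multiple-of-32 rule suggests. It converts a size in KiB to blocks and rejects values that are not a multiple of 32 blocks.

```shell
# Hypothetical helper (not part of XVM): convert a size in KiB to 512-byte
# blocks and confirm that the result is a multiple of 32 blocks.
# ASSUMPTION: the stripe unit is expressed in 512-byte blocks.
stripe_unit_blocks() (
    kib=$1
    blocks=$(( kib * 2 ))          # 1 KiB = two 512-byte blocks
    if [ $(( blocks % 32 )) -ne 0 ]; then
        echo "error: $blocks blocks is not a multiple of 32" >&2
        exit 1                     # exits only this function's subshell
    fi
    echo "$blocks"
)
```

For example, stripe_unit_blocks 64 prints 128, whereas stripe_unit_blocks 24 fails because 48 blocks is not a multiple of 32.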
Use the single-partition method for GPT label creation. For example (line breaks added for readability):
# parted /dev/disk/by-path/pci-0000:03:00.0-fc-0x20360080e5232098:0x0000000000000000 "unit s mklabel gpt"
# parted /dev/disk/by-path/pci-0000:03:00.0-fc-0x20360080e5232098:0x0000000000000000 \
"unit s mkpart primary xfs 34 -8193" |
For LUNs with a power-of-2 number of data drives, XVM will align by default. For non-power-of-2, use the following formula to calculate the alignment:
RAID_segment_size_in_KiB * number_of_data_drives * 2 |
This value will be in 512-byte basic blocks. For example, using a 6+1 RAID with a 64KiB-segment LUN:
64KiB * 6 * 2 = 768 blocks |
The slice command would be:
# xvm slice -all -align 768 phys/is5500_lun0 |
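The alignment formula can be wrapped in a pair of shell functions for use in scripts. This is a sketch only: the names xvm_align_blocks and is_power_of_2 are hypothetical and are not XVM commands. The first applies the formula; the second tests whether the number of data drives is a power of 2, in which case XVM aligns by default and no calculation is needed.

```shell
# Hypothetical helper (not part of XVM): alignment in 512-byte blocks,
# computed as RAID_segment_size_in_KiB * number_of_data_drives * 2.
xvm_align_blocks() (
    segment_kib=$1
    data_drives=$2
    echo $(( segment_kib * data_drives * 2 ))
)

# Hypothetical helper: succeeds (exit status 0) if the argument is a
# power of 2, fails otherwise.
is_power_of_2() (
    n=$1
    [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
)
```

For the 6+1 RAID with 64KiB segments, is_power_of_2 6 fails, and xvm_align_blocks 64 6 prints 768, which is the value to pass to the -align option.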
To make the best use of space, create mirrors with components of identical size. If the components are not identical, there will be unused space in the larger components.
Place mirrors at the lowest possible level (below any stripes) to maximize independence and minimize synchronization times during revive operations. This provides the redundancy of a mirror with improved performance.
For large mirror components, the revive process may take a long time. Consider the following:
For a new mirror that does not need to be revived at creation, use the -clean option to specify that the mirror will revive at reboot but not at creation. An example is a mirror for a new filesystem: because all data will be written before it is read, there is no need for a revive beforehand.
For a new mirror that you will use for scratch filesystems (such as /tmp) that will never need to be synchronized, use the -norevive option to specify that the mirror will never revive.
You should set the xvm_max_revive_rsc and xvm_max_revive_threads XVM system-tunable kernel parameters appropriately for your site's mirror revive performance requirements. Increasing xvm_max_revive_rsc will increase the data throughput per thread, and increasing xvm_max_revive_threads will increase the number of parallel I/O processes used in reviving. Decreasing the resources causes less interference with an open filesystem at the cost of increasing the total time to revive the data.
As a general guideline:
Increase the revive resource tunable values if you want to revive as quickly as possible and do not mind the performance impact on normal I/O processes
Decrease the revive resource tunable values if you want to have a smaller impact on a particular filesystem
Sometimes it may be useful to categorize volume elements by name. For example, you may want to name a portion of a volume fast so that you can search for volumes that have fast stripe objects. For example:
xvm:cluster> stripe -volname myvol -vename fast0 -unit 128 slice/lucys2 slice/rickys0 slice/ethyls0 slice/freds0
</dev/cxvm/myvol> stripe/fast0
xvm:cluster> stripe -volname myvol -vename fast1 -unit 128 slice/lucys3 slice/rickys1 slice/ethyls1 slice/freds1
</dev/cxvm/myvol> stripe/fast1 |
When you name the stripes as in the preceding example, you can use a wildcard to show both fast0 and fast1 stripes:
xvm:cluster> show -topology stripe/fast*
stripe/fast0                    23705088 online
    slice/lucys2                 5926340 online
    slice/rickys0                5926340 online
    slice/ethyls0                5926340 online
    slice/freds0                 5926340 online
stripe/fast1                    23705088 online
    slice/lucys3                 5926340 online
    slice/rickys1                5926340 online
    slice/ethyls1                5926340 online
    slice/freds1                 5926340 online |
After you label a device, XVM will automatically probe it and any unlabeled disks in order to locate alternate paths. Disks are also probed when the system is booted and when you explicitly execute an XVM probe command. In most cases, this default behavior is appropriate.
However, a probe can be slow, and it is necessary to probe a newly-labeled device only once. For example, if you are executing a series of individual label commands, you might wish to disable automatic probing using one of the methods described in “Controlling Automatic Probing with the label and set Commands” in Chapter 5.
Due to some performance issues in XVM, SGI strongly recommends that the XVM slice be completely outside the RAID exclusion zone. By default, the xvm CLI label command places the user data area entirely outside of the exclusion zones, so you do not need to consider the exclusion zones in allocating slices. See “Making an XVM Volume Using a GPT Label” in Chapter 7.
Note: If it is necessary to allow the user space to overlap the
RAID exclusion zones, you can use the following command to override the
default layout behavior:
|
Different types of media are appropriate for different uses:
Solid-state drive (SSD) media is appropriate for small latency-sensitive operations
Rotating hard-disk drive (HDD) media is appropriate for larger bandwidth- and capacity-intensive operations
The ibound mount option specifies where the filesystem places the inodes, which lets you use SSD media for a filesystem's inodes and HDD media for the file data. In this case, you should create an XVM volume that concatenates a slice of SSD media with HDD media and then use the ibound mount option to restrict filesystem inode allocation to the fast SSD media at the beginning of the XVM volume.
| Note: The ibound mount option implies inode32 behavior and is therefore incompatible with the inode64 mount option. Behavior of the inode32 mount option is not affected. |
For more information, see the chapter about enhanced XFS extensions in XFS Administrator Guide.
It is possible for XVM labels to become corrupted. Therefore, you should use the XVM dump command to make a copy of the configuration before making a change so that you can recover from potential problems introduced by the change. You should save the dump output into a filesystem other than the one being dumped.
Do not use a given disk device as an XVM volume if you are already using it to mount a filesystem outside of XVM. XVM cannot always detect that a LUN is already in use by some other subsystem, so verify that a LUN is available before creating XVM physvols on it.
This section discusses the following for RAIDs that do not use the asymmetrical logical unit access (ALUA) feature:
If you use non-ALUA RAID and if you spread I/O across controllers, you should define the /etc/failover2.conf file to ensure that I/O is done efficiently and is directed to the path that you prefer; unnecessary switching between RAID controllers can degrade performance considerably. See Chapter 6, “XVM Path Failover”.
In a cluster configuration, be sure that the /etc/failover2.conf file is correct and consistent on every node in the cluster.
For RAID that supports the ALUA feature, you should not use the affinity or preferred keywords for normal operation. Those keywords can be used in a failover2.conf file to override the settings read from the RAID in order to work around a problem.
You may find it useful to specify affinity values starting with affinity=1 and specify a nonzero value for all paths. This makes it easy to detect those paths that have not yet been configured because they are assigned the default of affinity=0. See “Set Appropriate affinity Values for Non-ALUA LUNs” in Chapter 6.
If you used the method recommended in “Set Nonzero affinity Values”, you should periodically examine the show -v output for new LUNs to find any that are unassigned. You may wish to write a script to perform this function and execute it via a cron(8) job.
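A script of the kind suggested above might be built around a simple filter. The following is a sketch under a loud assumption: the exact layout of the show -v output is not reproduced in this section, so the function assumes only that each path appears on a single line together with its affinity=N value. The function name find_unassigned_paths is hypothetical; adjust the pattern to match the real output on your system.

```shell
# Hypothetical filter (not part of XVM): read text on standard input and
# print the lines that contain the token affinity=0.
# ASSUMPTION: each path is reported on one line with its affinity=N value;
# the real "show -v" layout may differ and the pattern may need adjusting.
find_unassigned_paths() {
    grep -E '(^|[^[:alnum:]_])affinity=0([^0-9]|$)'
}
```

In a cron(8) job, you would pipe the show -v output for the LUNs of interest into find_unassigned_paths and mail or log any lines that it prints. The function returns a nonzero exit status when nothing matches, which a calling script can treat as "all paths are assigned".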
If you change the affinity setting for a path in the cluster domain, you should include the -cluster option so that the setting is consistent across all nodes in the cluster. For example:
xvm:cluster> foswitch -cluster -setaffinity 2 -movepath phys/lun33 |
If you change the preferred path, you should include the -cluster option if the switch will move to a different affinity group. For example, suppose the following:
pathA affinity=1 preferred
pathB affinity=1
pathC affinity=2
pathD affinity=2 |
You could switch the preferred path to pathB for a single node in the cluster because it has the same affinity setting as pathA:
xvm:cluster> foswitch -preferred pathB |
However, if you want to use pathC as the preferred path, you should include -cluster because it is part of a different affinity group:
xvm:cluster> foswitch -cluster -preferred pathC |
Normally, you should not use the /etc/failover2.conf file to override the path settings provided automatically by a RAID that has the ALUA feature. If changes are required, you should make them via the ALUA RAID software.
It is possible that XVM might not discover all devices associated with XVM volumes by the time that the filesystems listed in /etc/fstab are mounted, meaning that some volumes may not yet be complete at that point. If an fsck command is run on an XFS filesystem when XVM devices are undiscovered, the system may suspend the system boot sequence and require input from the administrator.
Therefore, for XFS filesystems listed in /etc/fstab that use XVM devices, you should set the fsck flag to 0. XVM includes a helper service that mounts all filesystems listed in /etc/fstab that use XVM devices at the time that XVM is started during the boot sequence.
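The fsck flag is the sixth field of each /etc/fstab entry. As a rough aid, the following sketch lists XFS entries that refer to XVM volumes but still have a nonzero flag. The function name check_xvm_fsck_flag is hypothetical, and the check assumes that the entries name XVM volumes by their /dev/lxvm/ or /dev/cxvm/ symbolic links; entries that identify the volume some other way are not detected.

```shell
# Hypothetical check (not part of XVM): list XFS entries in an fstab-format
# file whose device is an XVM symbolic link (/dev/lxvm/... or /dev/cxvm/...)
# and whose sixth field (the fsck flag) is not 0.
# ASSUMPTION: XVM volumes are named in fstab by their symbolic links.
check_xvm_fsck_flag() {
    awk '
        /^[[:space:]]*#/ { next }    # skip comment lines
        NF >= 6 && $3 == "xfs" && $1 ~ /^\/dev\/(lxvm|cxvm)\// && $6 != 0 {
            print $1 " " $2 " fsck=" $6
        }
    ' "${1:-/etc/fstab}"
}
```

Run with no argument, the function reads /etc/fstab; any line that it prints is an entry whose sixth field should be changed to 0. Entries with fewer than six fields are skipped because a missing fsck flag already defaults to 0.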
You should only use the steal command when ownership cannot be changed by using the give command.
You should unmount a filesystem before making XVM topology changes.
A child of an open volume element can only be detached if this will not cause the volume element to go offline. The only child that can be detached without putting the volume element offline is a mirror leg that is not the last leg of that mirror.
In particular, normally you should not execute the following xvm commands on an open volume element:
change disable |
You should unmount a filesystem from the CXFS cluster and mount it locally before growing it. For more information, see the section about growing filesystems in the CXFS 7 Administrator Guide for SGI InfiniteStorage.
If you write scripts that use xvm(8) configuration commands, be aware that running multiple commands in quick sequence can cause the commands to fail. An XVM device newly created by one command is held open for an interval by Linux utility programs; subsequent xvm commands in the script cannot use the device and therefore fail. The following is a common error in this situation:
error creating item: operation will cause the ve's subvolume to go offline |
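This section does not prescribe a remedy for the failure. One possible mitigation, shown here only as a sketch, is to retry a failed command after a short pause so that the device has time to be released. The function name retry and the attempt count and delay values are illustrative assumptions, not XVM features, and you should confirm that a given command is safe to repeat before wrapping it.

```shell
# Hypothetical wrapper (not part of XVM): run a command and, if it fails,
# pause and try again, up to a maximum number of attempts.
# Usage: retry ATTEMPTS DELAY_SECONDS command [arguments...]
retry() (
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        "$@" && exit 0                          # success: stop retrying
        [ "$i" -ge "$attempts" ] && exit 1      # out of attempts: give up
        sleep "$delay"
        i=$(( i + 1 ))
    done
)
```

In a script, you would prefix each xvm invocation with, for example, retry 5 2 so that it is attempted up to five times with a two-second pause between attempts. Because the wrapper runs in a subshell, it suits external commands such as xvm rather than shell builtins that change the state of the calling script.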
For best performance in a production system, do not display I/O activity in the output of the xvm show command (that is, you should use the default setting of 0 for the pm_display_io_activity system tunable parameter, which disables this feature).
You should enable pm_display_io_activity only while you are performing configuration. See “pm_display_io_activity” in Appendix A.
XVM uses symbolic links (in the format /dev/lxvm/localXFSname or /dev/cxvm/CXFSname) to the actual device names (in the format /dev/xvm-n). For example, the default ls(1) output will show the symbolic link:
# ls /dev/lxvm/align-tests0
/dev/lxvm/align-tests0 |
To view the link plus the actual device name, use the -l option:
# ls -l /dev/lxvm/align-tests0
lrwxrwxrwx 1 root root 8 Sep 30 13:41 /dev/lxvm/align-tests0 -> ../xvm-5 |
Commands such as mount(8) and df(1) show the actual device names. For example, to mount (using the symbolic link) and then show the mounted xfs filesystems:
# mount /dev/lxvm/align-tests0 mnt
# mount -t xfs
/dev/sda2 on / type xfs (rw)
/dev/xvm-5 on /root/mnt type xfs (rw) |
To report on filesystem usage:
# df -k
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda2       20970644 6269112  14701532  30% /
udev             4027496    1252   4026244   1% /dev
tmpfs            4027496       0   4027496   0% /dev/shm
/dev/xvm-5      35802032   32928  35769104   1% /root/mnt |
To get information starting from a /dev/xvm-n device name, you can display information from the sysfs filesystem as follows:
# ls /dev/xvm-*
/dev/xvm-10 /dev/xvm-5 /dev/xvm-6 /dev/xvm-8
# cat /sys/block/xvm-5/vol-info/xvm-domain
lxvm
# cat /sys/block/xvm-5/vol-info/xvm-volname
align-tests0
# cat /sys/block/xvm-5/vol-info/xvm-svol-type
data |
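The sysfs lookup can be combined into one function that maps an actual device name back to its symbolic link. This is a sketch, not an XVM utility: the function name xvm_link_for and the SYSFS_ROOT variable are hypothetical, and the function assumes that the xvm-domain value matches the directory name under /dev, as it does for the lxvm example above.

```shell
# Hypothetical helper (not part of XVM): print the symbolic link that
# corresponds to a /dev/xvm-n device, using the sysfs vol-info files.
# SYSFS_ROOT is an illustrative override (default /sys) that lets you try
# the function against a copy of the sysfs tree.
# ASSUMPTION: the xvm-domain value equals the directory name under /dev.
xvm_link_for() (
    dev=${1#/dev/}                               # accept xvm-5 or /dev/xvm-5
    info=${SYSFS_ROOT:-/sys}/block/$dev/vol-info
    domain=$(cat "$info/xvm-domain") || exit 1
    volname=$(cat "$info/xvm-volname") || exit 1
    echo "/dev/$domain/$volname"
)
```

With the values shown in the preceding example, xvm_link_for /dev/xvm-5 prints /dev/lxvm/align-tests0, which lets you relate a device name reported by mount(8) or df(1) to the volume name used in xvm commands.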