This chapter explains what BDSpro requires to achieve its full potential and how to modify your current filesystem setup if you determine that it needs to be changed. The sections in this chapter help you prepare for running BDSpro.
BDSpro release 2.3 requires IRIX version 6.5 (or higher), BDSpro software, and NFS (version 2.0 or 3.0) installed on client and server systems. Server and client systems must also be at the current NFS and mount command overlays; server systems also require the current IRIX and networking overlays. Server and client systems must be connected to a HIPPI or other high-performance network. (See the BDSpro Release Notes for detailed requirements and instructions on software installation).
When selecting disks for use with BDS, choose a brand and model that excels at large sequential access operations; disks with this characteristic deliver better BDS performance and are therefore a better choice.
BDS installations that use fast-and-wide SCSI controllers with a transfer rate of 20 MB per second (not UltraSCSI or Fiber Channel) must optimize the number of disks on a single SCSI bus. For example, the IBM drives provided by Silicon Graphics transfer at about 7 MB per second on the outer zone of the disk and about 5 MB per second on the inner zone. Assuming a sustained rate of 5 MB per second per drive, four such drives completely saturate a single 20 MB per second controller.
Silicon Graphics provides SCSI boxes with an eight-drive capacity that are configured for one SCSI channel. This single-channel configuration is inefficient for BDS: one channel reaches its maximum bandwidth after servicing only four of the disks (4 disks × 5 MB per second per disk = 20 MB per second). To optimize a factory-shipped SCSI box for BDSpro, reconfigure it from a single-channel to a two-channel device. A SCSI box with two channels offers twice the bandwidth, and the additional channel can service the remaining four disks.
For maximum sequential performance with the minimum number of disks, purchase more controllers and use one controller for every three or four disks.
The limit that IRIX imposes on the maximum size of direct memory access (DMA) operations affects XFS, since direct I/O is a DMA operation. In IRIX 6.5, the default maximum DMA size is 4 MB. Frequently, this limit must be increased on BDSpro servers to achieve optimum performance.
To change the maximum DMA size, reset the maxdmasz variable using systune (see the systune(1M) reference page).
The value of maxdmasz is expressed in pages, which are 16 KB on 64-bit systems. Change the value to the size that you need, and then reconfigure the kernel and reboot the server.
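For example, a session like the following raises the limit to just over 10 MB on a 64-bit server (641 pages × 16 KB per page). The value 641, the prompts, and the exact sequence are illustrative only; consult the systune(1M) and autoconfig(1M) reference pages for the precise procedure on your release.

server# systune -i
systune-> maxdmasz 641
systune-> quit
server# autoconfig
server# reboot

Depending on the variable, systune may build a new kernel for you when you quit; otherwise run autoconfig before rebooting so that the change takes effect.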
BDSpro performance is highly dependent on the local performance of XFS and XLV on the server system; BDS is superfluous when local filesystem speed (or network speed) creates performance bottlenecks. You can often correct filesystem performance by properly configuring disks and by setting the correct size for direct memory access operations.
To measure local XFS performance, use lmdd commands similar to those shown below. If you determine that the results are inadequate for a BDS implementation, follow the tuning recommendations in “Tuning XLV Performance,” which follows.
This command creates a 500 MB test file (/export/bds/local_xfs_file) using direct writes with a transfer size of 4 MB; the output confirms the new file and reports the XFS write rate:
server# lmdd of=/export/bds/local_xfs_file bs=4m move=500m direct=1
524.29 MB in 5.39 secs, 97.21 MB/sec
This command performs a direct read of the test file with a transfer size of 4 MB; the output shows the XFS read transfer rate:
server# lmdd if=/export/bds/local_xfs_file bs=4m move=500m direct=1
524.29 MB in 3.79 secs, 138.15 MB/sec
XFS uses a logical volume manager, xlv, to stripe data across multiple disk drives. On striped disks, large XFS requests are split and sent to each disk in parallel. The high sequential performance of XFS is attributable to this parallelism. (See Chapter 7 of IRIX Admin: Disks and Filesystems for more information.)
The size of data transfers is an important consideration in planning logical volumes. For example, assume that a logical volume contains ten disks and the stripe size is 64 KB. In this case, transfers of 640 KB or larger are required to get all drives running simultaneously. If the data transfer size is 320 KB, only five drives are active in an I/O operation. Because only half of the available disks are used, a transfer size of 320 KB is very inefficient, reducing the total performance by half. With proper striping of logical volumes, however, you can maximize disk performance.
The xlv_make utility stripes the disks in a logical volume. By default, xlv_make divides the disk into tracks and uses one track from each disk in rotation to create a stripe. The amount of data that xlv_make allocates on a single drive before going to the next is called the stripe unit. The stripe unit and the number of disks in the logical volume determine the stripe width, or
stripe width = stripe unit × number of disks
Figure 2-1 illustrates a logical volume containing four disks. Notice from this figure that the stripe unit is set to two tracks instead of one (the default stripe unit size). If we assume a track size of 100 KB (track size is set by disk manufacturers), the stripe width for this logical volume is 800 KB.
When you create a logical volume, you can specify a stripe size using the stripe_unit argument of xlv_make (see the xlv_make(1M) reference page). Specifying the proper size of the stripe unit is the key to optimizing I/O performance. In most cases, the objective in setting the stripe size is to achieve a particular bandwidth; but you might also need to adjust the stripe size to accommodate an application that uses a fixed transfer size.
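As an illustration only, the following xlv_make session sketches a four-disk striped volume matching Figure 2-1. The volume name and device paths are placeholders, and xlv_make expects the stripe unit in 512-byte blocks, so 400 blocks corresponds to the two-track (200 KB) stripe unit; see the xlv_make(1M) reference page for the exact syntax on your release.

server# xlv_make
xlv_make> vol bdsvol
xlv_make> data
xlv_make> plex
xlv_make> ve -stripe -stripe_unit 400 /dev/dsk/dks0d1s7 /dev/dsk/dks0d2s7 /dev/dsk/dks0d3s7 /dev/dsk/dks0d4s7
xlv_make> end
xlv_make> exit

After the volume is created, make an XFS filesystem on it and mount it as usual (see the mkfs_xfs(1M) reference page).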
The transfer size should be a multiple of both the system's page size and the stripe width (800 KB in Figure 2-1). For example, if an application needs the bandwidth of all four disks but is reading with a transfer size of 400 KB, you could set the stripe unit to one track instead of two, reducing the stripe width to 400 KB so that the smaller transfer still spans all four disks and achieves the required bandwidth with half the transfer size.
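To check that a striped volume actually delivers the expected bandwidth, you can repeat the earlier lmdd test with a transfer size equal to the stripe width. This sketch assumes the 800 KB stripe width of Figure 2-1, a striped filesystem mounted at /export/bds as in the earlier examples, and a hypothetical file name:

server# lmdd of=/export/bds/stripe_test bs=800k move=500m direct=1

The reported rate should approach the aggregate rate of the disks in the volume; if it falls well short, revisit the stripe unit and the maxdmasz setting described earlier.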
It is not always advisable to use the smallest possible stripe unit. While small requests can be effective with read transfers because of the read-ahead assistance that SCSI track buffers offer, small stripe units degrade write performance.
For example, consider what happens when data is written using the default stripe unit size of one track. The write is broken into tracks and each track is sent to a different disk. When the data arrives at the controller, the controller first waits for the disk head to move to the beginning of the track before it writes the data. This wait, commonly referred to as a rotational delay, occurs between each track that is written to the same disk; as a result, using a one-track stripe unit reduces the write performance to half of the read performance.
It is possible to achieve higher write performance by using larger stripe units. Table 2-1 shows the effects of increasing the size of stripe units on XFS write performance.
Table 2-1. Effects of Stripe Unit Size on XFS Write Performance
Stripe Unit | Request Size | Write Performance
---|---|---
1 track = 100 KB[a] | 1 track × 4 disks = 400 KB | 1/2 read performance
2 tracks = 200 KB | 2 tracks × 4 disks = 800 KB | 2/3 read performance
3 tracks = 300 KB | 3 tracks × 4 disks = 1.2 MB | 3/4 read performance
4 tracks = 400 KB | 4 tracks × 4 disks = 1.6 MB | 4/5 read performance

[a] Default size used by the xlv_make command.
Table 2-2 shows the performance for BDSpro (version 2.0) using IBM drives with a 2 GB capacity and a HIPPI network. Three disks were configured on each controller, and the transfer size was set to the stripe width. Notice from Table 2-2 that BDS writes are significantly slower than XFS writes when write buffering is not used.
Table 2-2. Performance Results With Sample Configurations [a]
Disks | Stripe Unit | Stripe Width | XFS Read | XFS Write | BDS Read | BDS Write | BDS Buffered Write
---|---|---|---|---|---|---|---
7 | 256 KB | 1792 KB | 52 | 40 | 52 | 15 | 31
14 | 128 KB | 1792 KB | 79 | 43 | 69 | 28 | 42
14 | 256 KB | 3584 KB | 83 | 50 | 73 | 31 | 48
14 | 512 KB | 7168 KB | 84 | 53 | 74 | 33 | 53
36 | 256 KB | 9216 KB | 196 | 120 | 89 | 51 | 94
68 | 60 KB | 4080 KB | 189 | 121 | 81 | 51 | 92
68 | 120 KB | 8160 KB | 221 | 163 | 79 | 58 | 92