Chapter 2. Preparing Filesystems for BDSpro

This chapter explains what BDSpro requires to achieve its full potential and how to modify your current filesystem setup if you determine that changes are needed. It contains these sections to help you prepare for running BDSpro: “Installation Prerequisites,” “Disk and Controller Configuration Requirements,” “Changing the Maximum DMA Size,” “Measuring XFS Rates,” “Tuning XLV Performance,” and “Sample Performance Results.”

Installation Prerequisites

BDSpro release 2.3 requires IRIX version 6.5 (or higher), BDSpro software, and NFS (version 2.0 or 3.0) installed on client and server systems. Server and client systems must also be at the current NFS and mount command overlays; server systems also require the current IRIX and networking overlays. Server and client systems must be connected to a HIPPI or other high-performance network. (See the BDSpro Release Notes for detailed requirements and instructions on software installation).
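As a quick preinstallation check, you can confirm the IRIX release and the installed software images on each server and client. The commands below are a sketch using standard IRIX tools; the grep patterns are assumptions about how the BDSpro and NFS image names appear in the software inventory, so treat the release notes as the authoritative list of required images and overlays.

server# uname -R                 # report the IRIX release (should be 6.5 or later)
server# versions | grep -i bds   # look for the installed BDSpro images
server# versions | grep -i nfs   # look for the installed NFS images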

Disk and Controller Configuration Requirements

When selecting disks for use with BDS, choose a brand and model that excel at large sequential access operations; disks with this characteristic deliver better BDS performance.

BDS installations that use fast-and-wide SCSI controllers with speeds of 20 MB per second (not UltraSCSI or Fibre Channel) must optimize the number of disks on a single SCSI bus. For example, the IBM drives provided by Silicon Graphics transfer data at about 7 MB per second on the outer zone of the disk and 5 MB per second on the inner zone. Assuming a sustained rate of 5 MB per second per drive, four of these drives completely saturate a single 20 MB per second controller.

Silicon Graphics provides SCSI boxes with an eight-drive capacity that are configured for one SCSI channel. This single-channel configuration is inefficient for BDS: one channel can service only four of the disks before the maximum bandwidth is reached (4 disks × 5 MB per second per disk = 20 MB per second). To optimize a factory-shipped SCSI box for BDSpro, reconfigure it from a single-channel to a two-channel device. A SCSI box with two channels offers twice the bandwidth, and the second channel can service the remaining four disks.

For maximum sequential performance with the minimum number of disks, purchase more controllers and use one controller for every three or four disks.

Changing the Maximum DMA Size

The limit that IRIX imposes on the maximum size of direct memory access (DMA) operations affects XFS, since direct I/O is a DMA operation. In IRIX 6.5, the default maximum DMA size is 4 MB. Frequently, this limit must be increased on BDSpro servers to achieve optimum performance.

To change the maximum DMA size, reset the maxdmasz variable using systune (see the systune(1M) reference page).

The value of maxdmasz is expressed in pages, which are 16 KB on 64-bit systems. Change this value to the size that you need, and then reconfigure and reboot the server.
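The following commands are a minimal sketch of the procedure, assuming a target limit of 32 MB (2048 pages of 16 KB) purely as an illustration; some IRIX releases express the limit as the desired size plus one page, so confirm the exact semantics in the systune(1M) reference page before making the change.

server# systune maxdmasz    # display the current limit, in pages
server# systune -i          # enter interactive mode and set maxdmasz to 2048 there
server# autoconfig          # rebuild the kernel, then reboot the server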

Measuring XFS Rates

BDSpro performance is highly dependent on the local performance of XFS and XLV on the server system; BDS is superfluous when local filesystem speed (or network speed) creates performance bottlenecks. You can often correct filesystem performance by properly configuring disks and by setting the correct size for direct memory access operations.

To measure local XFS performance, use lmdd commands similar to those shown below. If you determine that the results are inadequate for a BDS implementation, follow the tuning recommendations in “Tuning XLV Performance,” which follows.

This command creates a 500 MB test file (/export/bds/local_xfs_file) using direct I/O with a transfer size of 4 MB; the return message reports the XFS write rate:

server# lmdd of=/export/bds/local_xfs_file bs=4m move=500m direct=1
524.29 MB in 5.39 secs, 97.21 MB/sec

This command performs a direct read of the test file with a transfer size of 4 MB; the return message shows the XFS read rate:

server# lmdd if=/export/bds/local_xfs_file bs=4m move=500m direct=1
524.29 MB in 3.79 secs, 138.15 MB/sec

Tuning XLV Performance

XFS uses a logical volume manager, xlv, to stripe data across multiple disk drives. On striped disks, large XFS requests are split and sent to each disk in parallel. The high sequential performance of XFS is attributable to this parallelism. (See Chapter 7 of IRIX Admin: Disks and Filesystems for more information.)

The size of data transfers is an important consideration in planning logical volumes. For example, assume that a logical volume contains ten disks and the stripe size is 64 KB. In this case, transfers of 640 KB or larger are required to get all drives running simultaneously. If the data transfer size is 320 KB, only five drives are active in an I/O operation. Because only half of the available disks are used, a transfer size of 320 KB is very inefficient, reducing the total performance by half. With proper striping of logical volumes, however, you can maximize disk performance.

Disk Striping Fundamentals

The xlv_make utility stripes the disks in a logical volume. By default, xlv_make divides the disk into tracks and uses one track from each disk in rotation to create a stripe. The amount of data that xlv_make allocates on a single drive before going to the next is called the stripe unit. The stripe unit and the number of disks in the logical volume determine the stripe width, or

stripe width = stripe unit × number of disks

Figure 2-1 illustrates a logical volume containing four disks. Notice from this figure that the stripe unit is set to two tracks instead of one (the default stripe unit size). If we assume a track size of 100 KB (track size is set by disk manufacturers), the stripe width for this logical volume is 800 KB.

Figure 2-1. Effects of the Stripe Unit and Disk Number on Stripe Width

Determining the Size of Stripe Units

When you create a logical volume, you can specify a stripe size using the stripe_unit argument of xlv_make (see the xlv_make(1M) reference page). Specifying the proper size of the stripe unit is the key to optimizing I/O performance. In most cases, the objective in setting the stripe size is to achieve a particular bandwidth; but you might also need to adjust the stripe size to accommodate an application that uses a fixed transfer size.

The transfer size should be a multiple of both the system's page size and the stripe width (800 KB in Figure 2-1). For example, if an application needs the bandwidth of all four disks but is reading with a transfer size of 400 KB, you could set the stripe unit to one track instead of two to achieve the required bandwidth with half the transfer size.
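The following xlv_make session is a sketch of how the four-disk volume of Figure 2-1 might be created; the volume name and device names are placeholders, and it assumes that the stripe_unit argument is given in 512-byte blocks, so that 400 blocks corresponds to the 200 KB (two-track) stripe unit. Verify the argument units and device paths in the xlv_make(1M) reference page before building a real volume.

server# xlv_make
xlv_make> vol bdsvol
xlv_make> data
xlv_make> plex
xlv_make> ve -stripe -stripe_unit 400 dks0d1s7 dks0d2s7 dks0d3s7 dks0d4s7
xlv_make> end
xlv_make> exit

To build the one-track variant described above, halve the stripe_unit value (200 blocks = 100 KB) while keeping the same four disks.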

Optimizing the Stripe Unit Size

It is not always advisable to use the smallest possible stripe unit. While small requests can be effective with read transfers because of the read-ahead assistance that SCSI track buffers offer, small stripe units degrade write performance.

For example, consider what happens when data is written using the default stripe unit size of one track. The write is broken into tracks and each track is sent to a different disk. When the data arrives at the controller, the controller first waits for the disk head to move to the beginning of the track before it writes the data. This wait, commonly referred to as a rotational delay, occurs between each track that is written to the same disk; as a result, using a one-track stripe unit reduces the write performance to half of the read performance.
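As an approximate rule implied by this reasoning and by the measurements in Table 2-1, with a stripe unit of n tracks each disk transfers n tracks of data for every rotational delay it incurs, so

write performance ≈ n / (n + 1) × read performance

which gives 1/2, 2/3, 3/4, and 4/5 of the read rate for stripe units of one through four tracks.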

It is possible to achieve higher write performance by using larger stripe units. Table 2-1 shows the effects of increasing the size of stripe units on XFS write performance.

Table 2-1. Effects of Stripe Unit Size on XFS Write Performance

Stripe Unit            Request Size                    Write Performance
1 track  = 100 KB[a]   1 track × 4 disks = 400 KB      1/2 read performance
2 tracks = 200 KB      2 tracks × 4 disks = 800 KB     2/3 read performance
3 tracks = 300 KB      3 tracks × 4 disks = 1.2 MB     3/4 read performance
4 tracks = 400 KB      4 tracks × 4 disks = 1.6 MB     4/5 read performance

[a] Default size used by the xlv_make command.


Sample Performance Results

Table 2-2 shows the performance for BDSpro (version 2.0) using IBM drives with a 2 GB capacity and a HIPPI network. Three disks were configured on each controller; the transfer size was set to the stripe width size. Notice from Table 2-2 that BDS writes are significantly slower than XFS writes when write buffering is not used.

Table 2-2. Performance Results With Sample Configurations [a]

Disks   Stripe Unit   Stripe Width   XFS Read   XFS Write   BDS Read   BDS Write   BDS Buffered Write
7       256 KB        1792 KB        52         40          52         15          31
14      128 KB        1792 KB        79         43          69         28          42
14      256 KB        3584 KB        83         50          73         31          48
14      512 KB        7168 KB        84         53          74         33          53
36      256 KB        9216 KB        196        120         89         51          94
68      60 KB         4080 KB        189        121         81         51          92
68      120 KB        8160 KB        221        163         79         58          92