Chapter 7. I/O Tuning

This chapter contains the following topics:

About I/O Tuning

This chapter describes tuning information that you can use to improve I/O throughput and latency.

Application Placement and I/O Resources

Placing an application on the same node as its I/O resource can improve performance. For graphics applications, for example, the improvement can be as much as 30 percent.

For example, assume an SGI UV system with the following devices:

# gfxtopology

Serial number: UV-00000021
Partition number: 0
8 Blades
248 CPUs
283.70 Gb Memory Total
5 I/O Risers

Blade Location    NASID  PCI Address    X Server Display   Device
----------------------------------------------------------------------
    0 r001i01b08      0  0000:05:00.0                  -   Matrox Pilot
    4 r001i01b12      8  0001:02:01.0                  -   SGI Scalable Graphics Capture
    6 r001i01b14     12  0003:07:00.0          Layout0.0   nVidia Quadro FX 5800
                         0003:08:00.0          Layout0.1   nVidia Quadro FX 5800
    7 r001i01b15     14  0004:03:00.0          Layout0.2   nVidia Quadro FX 5800

To run an OpenGL graphics program, such as glxgears(1), on the third graphics processing unit using numactl(8), type the following command:
% numactl -N 14 -m 14 /usr/bin/glxgears -display :0.2

This example assumes that the X server was started with display :0 mapped to Layout0, so that -display :0.2 selects Layout0.2.

The -N option runs the command only on the CPUs of node 14. The -m option allocates memory only from node 14. In the gfxtopology output, 14 is the NASID of the blade that hosts Layout0.2.
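
The lookup from display to node can also be scripted. The following Python sketch is illustrative only: it parses rows in the format of the sample gfxtopology output shown earlier and builds the matching numactl(8) command line. The helper names nasid_for_display and numactl_command are hypothetical, the parser assumes the column layout of the sample only, and, as in the example above, the NASID is passed to numactl as the node number.

```python
import shlex

# Rows copied from the sample gfxtopology output in this section.
SAMPLE_ROWS = """\
    0 r001i01b08      0  0000:05:00.0                  -   Matrox Pilot
    4 r001i01b12      8  0001:02:01.0                  -   SGI Scalable Graphics Capture
    6 r001i01b14     12  0003:07:00.0          Layout0.0   nVidia Quadro FX 5800
                         0003:08:00.0          Layout0.1   nVidia Quadro FX 5800
    7 r001i01b15     14  0004:03:00.0          Layout0.2   nVidia Quadro FX 5800
"""


def nasid_for_display(rows, display):
    """Return the NASID of the blade that hosts the given X server display."""
    nasid = None
    for line in rows.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0].isdigit():
            # Full row: blade, location, NASID, PCI address, display, device.
            nasid = int(fields[2])
            row_display = fields[4]
        elif len(fields) >= 2:
            # Continuation row: PCI address, display, device (same blade).
            row_display = fields[1]
        else:
            continue
        if row_display == display and nasid is not None:
            return nasid
    raise LookupError(f"display {display!r} not found")


def numactl_command(node, program_args):
    """Build a numactl argument list that binds CPUs and memory to one node."""
    return ["numactl", "-N", str(node), "-m", str(node), *program_args]


node = nasid_for_display(SAMPLE_ROWS, "Layout0.2")
cmd = numactl_command(node, ["/usr/bin/glxgears", "-display", ":0.2"])
print(shlex.join(cmd))
```

The resulting argument list can be passed to subprocess.run() on a system where numactl is installed.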

You could also use the dplace(1) command to place the application.

For information about the dplace(1) command, see the following:

“dplace Command” in Chapter 4

Layout of Filesystems and XVM for Multiple RAIDs

There can be latency spikes in the response from a RAID, and such spikes can in effect slow down all of the RAIDs, because a striped I/O request completes only when all of its striped pieces have completed.

A latency spike can either stall all of the I/O requests or delay only a few I/O requests while the others continue. The outcome depends on how the I/O requests are striped across the devices. If the volumes are constructed as stripes that span all devices and the I/O requests are sized to be full stripes, all of the I/O requests stall because every I/O request has to touch every device. If the I/O requests can be completed by touching a subset of the devices, those that do not touch a high-latency device continue at full speed, and the stalled I/O requests complete and catch up later.
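
The difference between the two layouts can be illustrated with a small model. The following Python sketch is a simplification, not a measurement: it treats the completion time of a request as the latency of the slowest device the request touches. The device count, the latencies, and the 50 ms threshold are hypothetical values chosen for illustration.

```python
def request_latency(device_latency_ms, devices_touched):
    """A striped request completes only when its slowest piece completes."""
    return max(device_latency_ms[d] for d in devices_touched)


def stalled_requests(device_latency_ms, requests, threshold_ms):
    """Count the requests whose completion time exceeds threshold_ms."""
    return sum(
        1
        for touched in requests
        if request_latency(device_latency_ms, touched) > threshold_ms
    )


# Hypothetical numbers: 8 devices at 5 ms each, with device 3 spiking to 500 ms.
latency = [5] * 8
latency[3] = 500

# Layout A: every request is a full stripe across all 8 devices.
full_stripe = [range(8) for _ in range(4)]

# Layout B: each request touches a 2-device subset.
narrow = [range(0, 2), range(2, 4), range(4, 6), range(6, 8)]

print(stalled_requests(latency, full_stripe, threshold_ms=50))  # 4 of 4 stall
print(stalled_requests(latency, narrow, threshold_ms=50))       # 1 of 4 stalls
```

In this model, only the request that touches the spiking device is delayed in the second layout, while the other three proceed at normal speed.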

In large storage configurations, it is possible to lay out the volumes to maximize the opportunity for the I/O requests to proceed in parallel, masking most of the effect of a few instances of high latency.

At least three classes of events cause high-latency I/O operations. These are as follows:

  1. Transient disk delays - one disk pauses

  2. Slow disks

  3. Transient RAID controller delays

The first two events affect a single logical unit number (LUN). The third event affects all the LUNs on a controller. The first and third events appear to happen at random. The second event is repeatable.
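
The two distinctions above (one LUN or all LUNs on a controller, random or repeatable) can be expressed as a small decision rule. The following Python sketch is illustrative only. The function name classify_event and the LUN names are hypothetical, and a real diagnosis would need more evidence than these two observations.

```python
def classify_event(affected_luns, controller_luns, repeatable):
    """Map an observed high-latency event to one of the three classes.

    affected_luns:   LUNs that showed high latency during the event
    controller_luns: all LUNs served by the same RAID controller
    repeatable:      True if the same LUNs are slow on every run
    """
    affected = set(affected_luns)
    controller = set(controller_luns)
    if len(affected) == 1:
        # A single LUN is affected: the cause is a disk, not the controller.
        return "slow disk" if repeatable else "transient disk delay"
    if affected == controller and not repeatable:
        # Every LUN on the controller paused at once, apparently at random.
        return "transient RAID controller delay"
    return "unclassified"


# Hypothetical LUN names for one controller.
luns = ["lun0", "lun1", "lun2", "lun3"]

print(classify_event(["lun2"], luns, repeatable=False))  # transient disk delay
print(classify_event(["lun2"], luns, repeatable=True))   # slow disk
print(classify_event(luns, luns, repeatable=False))      # transient RAID controller delay
```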