Chapter 1. BDS Fundamentals

Bulk Data Service (BDS) is a non-standard extension to NFS that handles file transactions over high-speed networks at accelerated rates. To accelerate standard NFS performance, BDS exploits the data access speed of the XFS filesystem and the data transfer rates of high-speed networks, such as the High-Performance Parallel Interface (HIPPI) or the Gigabyte System Network (GSN). BDSpro is the Silicon Graphics implementation of the XBDS protocol.

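This chapter contains the following sections to help you understand and evaluate BDS performance:

  • “BDS Requirements”

  • “What BDS Offers”

  • “How BDS Works”

  • “BDS Buffer Management”

  • “When BDS Makes Sense”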

BDS Requirements

You can use BDSpro on Silicon Graphics systems running IRIX 6.5 (or later). The NFS product must also be installed on BDSpro hosts, and these hosts must be connected to a high-speed network (such as GSN, Gigabit Ethernet, or HIPPI) running the Transmission Control Protocol/Internet Protocol (TCP/IP) suite. In addition, the server must be running XFS. The current version of BDS does not support EFS filesystems on the server.

For IRIX systems running release 6.5.6f onward (on both client and server machines), BDS also supports the Scheduled Transfer (ST) transmission protocol as an option along with the standard TCP/IP transmission protocol. For information on specifying ST, see “Using the Scheduled Transfer (ST) Protocol”.

Figure 1-1 illustrates the XBDS protocol relative to the Open Systems Interconnect (OSI) and Open Network Computing (ONC) protocols.

Figure 1-1. XBDS Protocol Compared With ONC Protocols

What BDS Offers

BDS is implemented as enhancements to NFS on the client system and a daemon process on the server. Figure 1-2 illustrates the BDSpro client-server model and the NFS client-server model on Silicon Graphics systems.

Figure 1-2. The BDSpro Client-Server Model

The hardware and software used on a network and its loading patterns determine the ultimate speed of NFS and BDS transactions. Because these factors vary greatly on individual networks, it is impossible to predict the performance gains that BDS will deliver to a particular network. However, to gauge BDS advantages over standard NFS, it is useful to compare BDSpro to NFS performance under ideal network conditions.

Table 1-1 compares BDSpro transfer speeds with NFS configurations.

Table 1-1. BDSpro Performance Compared With Standard NFS

Product           Network Configuration   Read Rate
---------------   ---------------------   -----------------------------
NFS (version 2)   UDP over HIPPI          2.5 MB per second per channel
NFS (version 3)   TCP/IP over HIPPI       19 MB per second per channel
BDSpro            TCP/IP over HIPPI       88 MB per second per channel
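To put the rates in Table 1-1 in perspective, the following sketch computes the relative speedup and the time needed to move a file of a given size at each rate. The function names and the 1000 MB file size are illustrative assumptions; only the three read rates come from the table, and real networks will rarely match these ideal figures.

```python
# Read rates from Table 1-1, in MB per second per channel.
RATES_MB_PER_S = {
    "NFS v2 (UDP over HIPPI)": 2.5,
    "NFS v3 (TCP/IP over HIPPI)": 19.0,
    "BDSpro (TCP/IP over HIPPI)": 88.0,
}


def transfer_seconds(size_mb, rate_mb_per_s):
    """Ideal time to read size_mb megabytes at a sustained rate."""
    return size_mb / rate_mb_per_s


def speedup(fast, slow):
    """Ratio of the read rates of two configurations in the table."""
    return RATES_MB_PER_S[fast] / RATES_MB_PER_S[slow]


if __name__ == "__main__":
    bds = "BDSpro (TCP/IP over HIPPI)"
    for name, rate in RATES_MB_PER_S.items():
        # 1000 MB is an arbitrary example size, not a figure from the table.
        seconds = transfer_seconds(1000, rate)
        print(f"{name}: {seconds:.1f} s per 1000 MB, "
              f"BDSpro is {speedup(bds, name):.1f}x as fast")
```

Under these ideal rates, BDSpro reads roughly 35 times faster than NFS version 2 and between 4 and 5 times faster than NFS version 3.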


How BDS Works

To achieve high throughput, BDS relies on the ability of the operating system to perform direct input and output (see the O_DIRECT option on the open(2) IRIX reference page for details). With direct I/O, the operating system moves data directly between disk and a user buffer, bypassing the intermediate copy to the kernel buffer cache that is standard for other types of I/O.
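The following sketch shows what a direct read can look like from an application. It is not BDS code and it is not written for IRIX: it assumes a POSIX system where Python exposes os.O_DIRECT (as Linux does), assumes a 4096-byte alignment requirement, and falls back to ordinary buffered I/O when the platform or filesystem rejects direct I/O. The name read_direct is hypothetical.

```python
import mmap
import os

# Assumed alignment; the real requirement depends on the filesystem and device.
ALIGN = 4096


def read_direct(path, length):
    """Read the first `length` bytes of `path`, using direct I/O if possible."""
    # Direct I/O generally needs an aligned buffer whose size is a multiple
    # of the alignment. An anonymous mmap is page-aligned.
    size = max(ALIGN, -(-length // ALIGN) * ALIGN)
    buf = mmap.mmap(-1, size)
    try:
        direct = os.O_RDONLY | getattr(os, "O_DIRECT", 0)
        for flags in (direct, os.O_RDONLY):
            try:
                fd = os.open(path, flags)
            except OSError:
                continue  # direct I/O rejected here; retry buffered
            try:
                count = os.readv(fd, [buf])  # data lands in the user buffer
                return buf[:min(count, length)]
            except OSError:
                continue  # read failed in this mode; try the next one
            finally:
                os.close(fd)
        raise OSError("could not read " + path)
    finally:
        buf.close()
```

The fallback keeps the sketch usable on filesystems that do not support direct I/O, at the cost of silently losing the cache bypass in that case.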

In a network transaction such as an NFS read or write, the time saved by bypassing the kernel buffer cache is doubled, since the bypass occurs on both the client and the server systems. Figure 1-3 and Figure 1-4, and the discussions that follow them, illustrate this difference in detail.

Standard NFS Transactions

Figure 1-3 illustrates the sequence of events in a standard NFS transaction.

Figure 1-3. Events in a Standard NFS Transaction

These events take place in Figure 1-3:

  1. The application issues a read for remote data.

  2. The search for the data in the local buffer cache fails, triggering an NFS read.

  3. An NFS read is sent to the remote server.

  4. The search of the buffer cache on the remote server fails.

  5. The server reads from the filesystem on disk.

  6. Data is moved to the server's buffer cache.

  7. The buffer data is sent to the network.

  8. The client receives the data in its buffer cache.

  9. The data is sent from the buffer cache to the application.

BDS Transactions

Figure 1-4 illustrates the sequence of events in a BDS transaction.

Figure 1-4. Events in a BDSpro Transaction Without Buffering

These events take place in Figure 1-4:

  1. The application issues a read for remote data.

  2. A BDS read is sent to the remote BDS server.

  3. The BDS server reads directly from the filesystem on disk.

  4. The BDS server writes the data to the network.

  5. Data is mapped to the user's address space (page-flipped) directly from the network.
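The two event sequences can be compared by listing the places where read data rests on its way from disk to the application. The sketch below is only a bookkeeping model of the steps described for Figure 1-3 and Figure 1-4; the stage names and helper functions are illustrative assumptions, not part of BDS.

```python
# Where read data rests on its way to the application, per the event lists.
NFS_READ_PATH = [
    "disk",
    "server buffer cache",
    "network",
    "client buffer cache",
    "application",
]
BDS_READ_PATH = ["disk", "network", "application"]


def transfers(path):
    """Number of times the data is moved from one stage to the next."""
    return len(path) - 1


def cache_hops(path):
    """Number of kernel buffer cache stages the data passes through."""
    return sum(1 for stage in path if "buffer cache" in stage)


if __name__ == "__main__":
    for name, path in (("NFS", NFS_READ_PATH), ("BDS", BDS_READ_PATH)):
        print(f"{name}: {transfers(path)} transfers, "
              f"{cache_hops(path)} buffer cache stages")
```

In this model the standard NFS read passes through two buffer caches (one on each system), while the BDS read passes through none, which is the doubling of savings described earlier.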

BDS Buffer Management

To increase throughput, BDS performs read buffering automatically. It performs write buffering if explicitly directed to do so (see “Using Write Buffering”).

The gains derived from buffering are a function of the speed of the network and filesystem in a particular configuration. However, in most BDS implementations, buffering improves performance significantly. For example, in laboratory tests, BDSpro achieved a 40 MB per second throughput rate without buffering. This rate increased to 87 MB per second with read buffering, and with write buffering, performance increased to 93 MB per second (the maximum bandwidth of the HIPPI connection).

When a connection requires buffering, BDS allocates two memory buffers for each open file. The size of these buffers is either calculated by BDS or specified by the user (for details, see “How Buffer Size Is Determined” and “Specifying a Buffer Size”).

BDS performs no buffering under these conditions:

  • When it cannot determine a buffer size (for writes only)

  • If read requests are not sequential

  • If multiple clients are accessing a file from the same host

  • When the buffer size is set to zero (see “Disabling Buffering”)
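The conditions above can be summarized as a small decision function. This is a simplified model of the rules stated in this section, not the logic BDS actually runs; the function name and its parameters are assumptions made for illustration.

```python
def uses_buffering(op, buffer_size, sequential=True, shared_on_host=False,
                   write_buffering=False):
    """Return True if this simplified model would buffer the transfer.

    op              -- "read" or "write"
    buffer_size     -- size in bytes, 0 to disable buffering, or None when
                       no size could be determined
    sequential      -- whether read requests are sequential
    shared_on_host  -- whether multiple clients on the same host access the file
    write_buffering -- whether write buffering was explicitly requested
    """
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    if buffer_size == 0:
        return False  # buffering explicitly disabled
    if shared_on_host:
        return False  # file is accessed by multiple clients on one host
    if op == "read":
        return sequential  # read buffering is automatic for sequential reads
    # Writes are buffered only on request, and only with a known buffer size.
    return write_buffering and buffer_size is not None
```

For example, uses_buffering("write", 65536) returns False because write buffering was not requested, while uses_buffering("read", 65536) returns True.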

How Buffer Size Is Determined

When data is located in a real-time filesystem, BDSpro sets the buffer size to the extent size. When data is located on an XLV logical volume, BDSpro calculates the size of the disk stripe and sets the buffer size to the disk stripe size. This is the most efficient buffer size, because it optimizes XLV access and minimizes disk contention (see “Tuning XLV Performance” in Chapter 2). When the disk is not striped, BDSpro uses the application's I/O request size to set the buffer size.

The calculated buffer size is the default, but you can override this default by specifying a buffer size by several methods (see “Specifying a Buffer Size” for details). The buffer size setting that is in effect applies to both read and write buffering (if write buffering is enabled).
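The selection rules can be sketched as follows. The function names and parameters are hypothetical, and the order in which the real-time extent size and the XLV stripe size are checked is an assumption, since the text does not say which takes precedence when both apply. The second helper estimates memory use from the two buffers that BDS allocates for each open file.

```python
def choose_buffer_size(request_size, extent_size=None, stripe_size=None,
                       override=None):
    """Pick a buffer size in bytes following the rules in this section.

    override     -- size specified by the user, if any
    extent_size  -- extent size when the data is in a real-time filesystem
    stripe_size  -- disk stripe size when the data is on a striped XLV volume
    request_size -- the application's I/O request size (the fallback)
    """
    if override is not None:
        return override  # a user-specified size replaces the default
    if extent_size is not None:
        return extent_size  # assumed to be checked before the stripe size
    if stripe_size is not None:
        return stripe_size
    return request_size


def buffer_memory(buffer_size, open_files):
    """Memory used when two buffers are allocated for each open file."""
    return 2 * buffer_size * open_files
```

For example, with a 1 MB stripe and no override, choose_buffer_size(65536, stripe_size=1048576) selects the stripe size, and ten open files would then use 20 MB of buffer memory in this model.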

Read Buffering in Detail

BDSpro performs read-ahead buffering; that is, as it sends data over the network to fill a read request, it simultaneously fills a second buffer with data from disk in anticipation of the next request. This concurrent disk and network I/O enhances BDSpro performance significantly.

Figure 1-5 illustrates read-ahead buffering in a BDS transaction.

Figure 1-5. Read Buffering in a BDSpro Transaction

As Figure 1-5 illustrates, BDS begins network transfers of read data from a full buffer; no data is transferred to the network until this buffer is full. While the contents of the first buffer are being transferred to the network, the second buffer is being filled from disk in preparation for subsequent read requests.
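The general read-ahead technique can be sketched with a background thread that fills the next buffer while the current one is being consumed. This is a generic illustration of double buffering, not the BDSpro implementation; the generator name read_ahead is hypothetical, and an in-memory stream stands in for the disk.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def read_ahead(source, buffer_size):
    """Yield buffer_size chunks from source, prefetching one chunk ahead."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Fill the first buffer; nothing is handed out until it is ready.
        pending = pool.submit(source.read, buffer_size)
        while True:
            chunk = pending.result()
            if not chunk:
                return  # end of data
            # Start filling the second buffer before releasing the first,
            # so the next read overlaps with the consumer's work.
            pending = pool.submit(source.read, buffer_size)
            yield chunk


if __name__ == "__main__":
    disk = io.BytesIO(b"abcdefghij")  # stand-in for a file on disk
    for chunk in read_ahead(disk, 4):
        print(chunk)  # stand-in for sending the buffer over the network
```

Because only one read is ever outstanding, the chunks arrive in order, which matches the requirement that read requests be sequential for buffering to apply.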

Write Buffering in Detail

BDS normally writes data to disk as the data is received from the network, as shown in Figure 1-6. But if you prefer, you can specify write buffering. BDS performs write-behind buffering; that is, it fills a buffer with data before writing any of the data to disk. Because a delay occurs between receiving and writing the data, write buffering poses some risks, so it should be used judiciously (see “Using Write Buffering” for details).

Figure 1-6 illustrates how write-behind buffering works.

Figure 1-6. Write Buffering in a BDSpro Transaction

Write Buffering Risks

Consider these risks before using write buffering:

  • A server failure can result in the loss of data if the failure occurs before the buffer is written to disk.

  • Write buffering delays error reporting, since an error is reported only after a complete buffer is transferred to disk.

  • Errors might be reported to a different application accessing the same file on the same client.
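The following sketch illustrates write-behind buffering in general and shows where the first two risks come from. It is not the BDS server code; the class name WriteBehindBuffer and its methods are assumptions made for illustration, and an in-memory stream stands in for the disk.

```python
import io


class WriteBehindBuffer:
    """Collect writes in memory and pass them on one full buffer at a time."""

    def __init__(self, sink, buffer_size):
        self.sink = sink                # stand-in for the file on disk
        self.buffer_size = buffer_size
        self.pending = bytearray()      # data accepted but not yet on disk

    def write(self, data):
        """Accept data; returns before the data necessarily reaches the sink."""
        self.pending += data
        while len(self.pending) >= self.buffer_size:
            self._flush(self.buffer_size)
        return len(data)

    def _flush(self, count):
        # Any error from the sink surfaces here, possibly long after the
        # write() call that supplied the data has returned successfully.
        self.sink.write(bytes(self.pending[:count]))
        del self.pending[:count]

    def close(self):
        """Write out whatever is left in the buffer."""
        if self.pending:
            self._flush(len(self.pending))


if __name__ == "__main__":
    disk = io.BytesIO()
    writer = WriteBehindBuffer(disk, 8)
    writer.write(b"12345")
    # Nothing is on "disk" yet; a failure now would lose these five bytes.
    print(disk.getvalue())
    writer.close()
    print(disk.getvalue())
```

Data held in pending is what a server failure would lose, and the deferred call to the sink is why errors are reported late.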

When BDS Makes Sense

While BDS offers clear advantages over standard NFS implementations in some operating environments, Silicon Graphics recommends it over standard NFS only in certain circumstances. Real network throughput rates, the applications running on a network, and the size of files involved in network operations determine whether BDS is a desirable addition to your current NFS implementation.

For transactions in which the read or write request size is 128 KB or larger, BDS is a sound alternative to standard NFS because it offers much faster performance (see Table 1-1 for speeds achieved with BDSpro). Furthermore, performance improves as the request size increases—BDS achieves optimum performance when the read and write request size is the same as the BDS buffer size (see “Specifying a Buffer Size” for details).

You should consider adding BDS if your NFS implementation meets these criteria:

  • Your applications use large read and write request sizes (128 KB or larger) in requests to remote filesystems.

  • Network hardware is a high-speed medium (such as HIPPI or GSN) with a potential transfer rate of 100 MB per second or higher.

  • The applications that you use do not rely on data caching.
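These criteria can be expressed as a simple checklist function. The function name and parameters are hypothetical, and the request-size threshold is treated as inclusive, following the "128 KB or larger" wording earlier in this section. The result is a rough screening aid only; actual gains depend on the hardware, software, and loading patterns of the network.

```python
KB = 1024
MIN_REQUEST_BYTES = 128 * KB      # request size threshold from this section
MIN_NETWORK_MB_PER_S = 100        # potential transfer rate of the network


def bds_recommended(request_bytes, network_mb_per_s, relies_on_caching):
    """Return True if all three criteria in the list above are met."""
    return (request_bytes >= MIN_REQUEST_BYTES
            and network_mb_per_s >= MIN_NETWORK_MB_PER_S
            and not relies_on_caching)


if __name__ == "__main__":
    # 1 MB requests over a 100 MB per second link, no reliance on caching.
    print(bds_recommended(1024 * KB, 100, relies_on_caching=False))
    # Small 8 KB requests do not meet the request-size criterion.
    print(bds_recommended(8 * KB, 100, relies_on_caching=False))
```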