Chapter 3. Setting Up the BDSpro Service

Setting up BDSpro involves making minor modifications to your NFS implementation, verifying BDSpro performance, and correcting problems if they occur. This chapter explains these tasks in detail in the following sections:

  • “Mounting Filesystems for BDSpro”
  • “Exporting Filesystems for BDSpro”
  • “Automatic BDS Mounting”
  • “Verifying BDSpro Performance”
  • “Using the BDSpro Debugger”
  • “Debugging Without Kernel-Level BDSpro”
  • “Correcting Network Problems”

Note: Be sure to review the information in Chapter 2, “Preparing Filesystems for BDSpro,” and follow the recommendations that it contains before proceeding with BDSpro setup.


Mounting Filesystems for BDSpro

To mount filesystems on BDSpro clients, use the standard NFS mount command with the -o bds option. BDS services are not invoked unless this option appears on the mount command line. Client options for the mount command are set in the /etc/fstab file; see the fstab(4) reference page for complete information. The following example illustrates a BDS entry in a client /etc/fstab file:

erthmver-gsn0:/vol/raid24 /test nfs rw,bg,intr,vers=3,bds,bdsproto=stp,bdsauto=1m

This entry mounts erthmver-gsn0:/vol/raid24 on /test as a BDS filesystem. It specifies the Scheduled Transfer (ST) transmission protocol (bdsproto=stp), and BDS engages automatically for any transfer larger than 1 MB (bdsauto=1m) and for any direct I/O transfer.

The bdsproto option can specify either the TCP or the ST transmission protocol. For the ST protocol, you can also specify the block size and the STU size by using the bdsblksz and bdsstusz options, respectively. These options are used for performance tuning and bandwidth matching. See the bds(1M) reference page for a description of all BDS options.
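
For example, an ST mount that also tunes the block and STU sizes might look like the following entry. This is a sketch only; the bdsblksz and bdsstusz values shown here are hypothetical placeholders, so choose values that match your network and storage configuration:

erthmver-gsn0:/vol/raid24 /test nfs rw,bg,intr,vers=3,bds,bdsproto=stp,bdsblksz=4m,bdsstusz=64k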

Exporting Filesystems for BDSpro

To make filesystems available to BDSpro, export them from the server with the standard NFS exportfs command. No special arguments to exportfs are required, and all standard exportfs arguments are valid (see the exportfs(1M) reference page for details).
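
For example, a minimal sketch that exports the filesystem used in the earlier mount examples (the pathname is illustrative) adds an entry to the server's /etc/exports file and then exports everything listed there:

server# echo "/vol/raid24" >> /etc/exports
server# exportfs -a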


Note: Only filesystems of type XFS are supported with BDSpro.


Automatic BDS Mounting

You can add the bdsauto option to a mount command line so that BDS is used automatically whenever the transfer size exceeds a specified limit. When bdsauto is set, standard NFS is used for file I/O unless the transfer size that you specify is exceeded, in which case BDS is used.

This example illustrates an /etc/fstab entry that sets bdsauto:

hip0-goliath:/ /bdsmnt nfs bds,bdsauto=2000000,vers=3,rw 0 0

The entry in the previous example sets bdsauto to two megabytes, so BDS will be used in file access operations if the transfer size is two megabytes or larger, even if the application is not modified to take advantage of BDS. NFS will be used if the transfer size is smaller than two megabytes.
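
The same options can also be supplied directly on a mount command line. A hedged command-line equivalent of the previous entry would be:

client# mount -o bds,bdsauto=2000000,vers=3,rw hip0-goliath:/ /bdsmnt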


Note: When the transfer size is two megabytes or greater, the bds option uses the TCP protocol. Even if you specify the proto=udp option, the bds option overrides it and uses TCP. When the transfer size is less than two megabytes, UDP is used unless you specify proto=tcp.
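
For example, to use TCP for the smaller NFS transfers as well, you could add proto=tcp to the option list, as in this sketch based on the previous entry:

hip0-goliath:/ /bdsmnt nfs bds,bdsauto=2000000,proto=tcp,vers=3,rw 0 0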


Verifying BDSpro Performance

To test a BDSpro setup, run the BDSpro server in false disk mode. False disk mode is analogous to sending the data to /dev/null or receiving it from /dev/zero. In false disk mode, the BDSpro server simulates data movement to and from the disk; network data is unaffected. This mode is used to verify network performance.

Use this command to start the BDSpro server in false disk mode:

server# /usr/etc/bds -devnull -devzero -touch -log 

The -touch option tells BDSpro to touch all data before sending it, which simulates XFS overhead. (See the bds(1M) reference page for a description of all options.)

On the client, first mount the server:

client# mkdir /mnt 
client# mount -o bds server:/ /mnt


Note: The mount will fail if you have not yet upgraded the client kernel to use BDSpro, but you can still test BDSpro with lmdd, since this command contains a user-level implementation of the XBDS protocol (see “Debugging Without Kernel-Level BDSpro”).

After the filesystem is mounted, try reading a file using a command similar to this (remember that no data is actually read in false disk mode):

client# lmdd if=/mnt/unix direct=1 bs=1m move=20m 
20.97 MB in .43 secs, 48.31 MB/sec

If you enter two lmdd commands in succession, performance improves on the second read. The improvement occurs because BDS incurs overhead on its first server access; this overhead is not incurred on the second read:

client# lmdd if=/mnt/unix direct=1 bs=1m move=20m 
20.97 MB in .43 secs, 48.31 MB/sec
client# lmdd if=/mnt/unix direct=1 bs=1m move=20m 
20.97 MB in .29 secs, 73.34 MB/sec

Using the BDSpro Debugger

If BDSpro performance is not what you expected, you can enable verbose debugging by adding the -debug option to the bds command line. You can debug in either false or real disk mode. When you use real data, debugging prints timing data for both the network transfer and the filesystem transfer. In false disk mode, only network timing is displayed.
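
For example, to run the server with debugging enabled in false disk mode, you might combine -debug with the options shown earlier (omit -devnull and -devzero to debug in real disk mode):

server# /usr/etc/bds -devnull -devzero -touch -debug -log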

Use the following command on the client to generate debugging on the server:

client# lmdd of=debugfile bs=4m move=12m direct=1

The previous lmdd command writes data across the network to the BDS server using a block size of 4MB. The BDS transfer size is whatever the remote client is using for read or write requests.

The example that follows shows the bds debugging output on the server. Because data is read from the network and written to the disk, timing results on reads (readn) are network times and timing results on writes (write) are XFS times.

server# bds -debug
readn: 7fff2e18 72.0000  @ 3.0000 /sec
V3 filehandle: F xid=2878 uid=0 gid=0 fhandle len=68, buflen=16.0000E, oflags=9 
Want file handle 68 bytes
readn: 10619dd8 68.0000  @ 0.8314M/sec
setting bds_xfs_align to 4095
V3 filehandle: setting rbuflen to 7.9688M, wbuflen to 0
bdsid 7312, sprocid 7320
V3 filehandle write(4, 7fff2e18, 72) = 72
writen: V3 filehandle: 72.0000  @ 94.6328K/sec
V3 filehandle: A xid=2878 uid=0 gid=0 bytes=24.0000 
V3 filehandle write(4, 7fff2b50, 24) = 24
writen: V3 filehandle: 24.0000  @ 89.0000 /sec
readn: 7fff2e18 72.0000  @ 0.8917M/sec
V3 filehandle: W xid=2877 uid=0 gid=0 off=0x0 len=4.0000M
getbuf returning buffer 0
readn: 441c000 4.0000M @ 84.6113M/sec
write(direct): sz 4.0000M off 0 @ 82.1743M/sec
V3 filehandle write(4, 7fff2e18, 72) = 72
writen: V3 filehandle: 72.0000  @ 269.00 /sec
V3 filehandle: A xid=2877 uid=0 gid=0 bytes=4.0000M
freeing buffer 0
readn: 7fff2e18 72.0000  @ 0.8803M/sec
V3 filehandle: W xid=2879 uid=0 gid=0 off=0x400000 len=4.0000M
getbuf returning buffer 1
readn: 4018000 4.0000M @ 109.40M/sec
write(direct): sz 4.0000M off 4.0000M @ 103.86M/sec
V3 filehandle write(4, 7fff2e18, 72) = 72
writen: V3 filehandle: 72.0000  @ 90.9600K/sec
V3 filehandle: A xid=2879 uid=0 gid=0 bytes=4.0000M
freeing buffer 1
readn: 7fff2e18 72.0000  @ 0.9537M/sec
V3 filehandle: W xid=2880 uid=0 gid=0 off=0x800000 len=4.0000M
getbuf returning buffer 0
readn: 441c000 4.0000M @ 108.11M/sec
write(direct): sz 4.0000M off 8.0000M @ 105.79M/sec
V3 filehandle write(4, 7fff2e18, 72) = 72
writen: V3 filehandle: 72.0000  @ 88.1104K/sec
V3 filehandle: A xid=2880 uid=0 gid=0 bytes=4.0000M
freeing buffer 0
readn: 7fff2e18 72.0000  @ 0.8273M/sec
V3 filehandle: C xid=2881 uid=0 gid=0 
V3 filehandle write(4, 7fff2e18, 72) = 72
writen: V3 filehandle: 72.0000  @ 86.8047K/sec
V3 filehandle: A xid=2881 uid=0 gid=0 bytes=0
hip0-ebony.engr.sgi.com moved 12.58 MB in 1.58 secs, 7.96 MB/sec

Debugging Without Kernel-Level BDSpro

The lmdd debugging tool included with BDSpro contains a user-level implementation of the BDS protocol. lmdd tries to use the kernel-level BDS protocol, but if that is not present (or not enabled), lmdd falls back to the user-level protocol.

If you need to debug on the client system, mount the filesystem without the -o bds option. In this case, the filesystem is mounted with kernel-level BDS disabled.
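
For example, a sketch using the server and export path that appear in the debugging output below (both names are illustrative):

client# mount hip0-mahogany:/export/bds1 /bds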

To see local debugging output, use lmdd with the debug=1 option, as shown in this example:

client# lmdd if=/bds/test direct=1 bs=4m move=12m debug=1
OS did not know O_DIRECT (8000) on /bds/test
0.003 bds_open(3, /bds/test, 32768, 3)
restart(/bds/test)
open(hip0-mahogany:/export/bds1/test) = 3
opened /bds/test
read at 0.000, rwnd=0 swnd=0
0.001: R xid=2 uid=0 gid=0 off=0x0 len=4.0000M
0.124: A xid=2 uid=0 gid=0 bytes=4.0000M
/bds/test moved=4194304 wanted=4194304 seekp=4194304 @ 22.8867M/sec
read at 0.176, rwnd=0 swnd=0
0.176: R xid=3 uid=0 gid=0 off=0x400000 len=4.0000M
0.300: A xid=3 uid=0 gid=0 bytes=4.0000M
/bds/test moved=4194304 wanted=4194304 seekp=8388608 @ 23.3921M/sec
read at 0.348, rwnd=0 swnd=0
0.348: R xid=4 uid=0 gid=0 off=0x800000 len=4.0000M
0.373: A xid=4 uid=0 gid=0 bytes=4.0000M
/bds/test moved=4194304 wanted=4194304 seekp=12582912 @ 21.4247M/sec
12.58 MB in 0.54 secs, 23.49 MB/sec

Correcting Network Problems

BDSpro is typically used on HIPPI because of its high performance (consult the IRIS HIPPI Administrator's Guide for information on installing and configuring a HIPPI network). If BDSpro testing shows inadequate performance, network problems might be the cause. In this case, you can use the ttcp test program (see the ttcp(1) reference page) to verify that the network is functioning properly. ttcp is a client/server program that moves data between systems and reports performance results.

To use ttcp, enter this command on the client to start the test:

client% ttcp -s -r -l 524288 
ttcp-r: buflen=524288, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket

Then, enter the following command on the server to start ttcp and send data to the client. The output of ttcp shows transfer rates:

server% ttcp -s -t -T -l 524288 -n 200 hip-client
ttcp-t: buflen=524288, nbuf=200, align=16384/0, port=5001 tcp -> hip-client
ttcp-t: socket
ttcp-t: connect 
ttcp-t: 104857600 bytes in 1.42 real seconds = 73843.38 KB/sec +++

The server should send data at approximately 65 to 70 MB per second. If you omit the -T option, which touches the data to measure caching effects, the rate should be approximately 90 MB per second. If the network is not performing at the expected level, determine the cause of the problem and correct it using these suggestions:

  • The data is not being transferred on the HIPPI network.

    By default, IRIX designates the Ethernet interface as the primary network interface and assigns the hostname (the name in the /etc/sys_id file) to this interface. To assign a hostname to the HIPPI interface, IRIX appends the prefix hippi- to the hostname. (For example, if the Ethernet interface is named frosty, the HIPPI interface is named hippi-frosty.)

    Check the hostname of the HIPPI client that you specified in the ttcp command to verify that it is the HIPPI hostname (and not the Ethernet hostname) for this client. Remember to specify a HIPPI interface when mounting the filesystem as well (see “Mounting Filesystems for BDSpro” and the example after this list).

  • The server is running a debugging or sema metering kernel.

    Reboot with a non-debug kernel, which performs much faster.

  • The client or server is not running IRIX 6.5 or later.

    The IRIX version you are running does not offer the HIPPI performance that BDSpro release 2.3 requires. Upgrade to IRIX 6.5 or later.

  • The connection is passing through a router.

    Use netstat -i to determine whether the HIPPI interface on the server connects to the HIPPI network where the client resides. If the client and server are not on the same network, a high-performance router will be required to support HIPPI speeds.
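
The following sketch illustrates the first and last checks with hypothetical host and interface names: netstat -i lists the configured interfaces so you can confirm that a HIPPI interface (for example, hip0) is present, and the mount command names the server's HIPPI hostname explicitly:

client# netstat -i
client# mount -o bds hippi-goliath:/vol/raid24 /test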

If you try all of these measures and performance is still not adequate, contact your Silicon Graphics support provider for additional assistance.