The OCTANE Digital Video VGI1 memory nodes are capable of full video-rate capture and playback to the Video Library buffers. This chapter explains how to optimize capture or playback to system memory or disk.
Data transfer between the VL and an application takes place through a DMbuffer or VL buffer. When the OCTANE Digital Video option transfers data from the application to the Video Library, the application retrieves an empty buffer using vlGetNextFree(). After placing data in the buffer, the application marks it as valid using the vlDMGetValid() or vlPutValid() routine. When the video device is finished reading from the buffer, it marks the buffer as free. For more details on the role of buffers in data transfer, see “Transferring Video Data to and From Devices” in Chapter 2.
int vlBufferAdvise(VLBuffer buffer, int advice)
buffer specifies the ring buffer to be advised
advice specifies the type of advisory being made, that is, whether the buffer is to be treated as cacheable or noncacheable
Marking the buffer noncacheable indicates that the CPU cache does not have to be flushed or invalidated when data is read from or written to system memory via DMA. However, any CPU access to the buffer must then bypass the cache and always go to system memory. This arrangement can severely degrade the performance of an application that directly manipulates the video data.
Consequently, marking a buffer cacheable or noncacheable is application-dependent. In general:
If the application manipulates the data, even if it is only to copy the data into or out of another region of system memory, the buffer should be set cacheable. This setting is the default for a VL buffer.
If the application does not manipulate data, and all transfer is done strictly through DMA, then performance is optimized by setting the buffer to noncacheable. This is the case, for example, when video is read into a buffer and then written directly to disk with raw or direct I/O.
Note: If raw or direct I/O is not used, the data is first copied into the filesystem cache. In that case, the buffer should be kept cacheable.
The performance of the memcpy() and bcopy() routines is greatly affected by the alignment of the source and destination buffers. For copy operations between buffers with the same alignment, throughput is approximately 400% greater than between buffers with mismatched alignments. For memcpy() and bcopy(), the source and target buffers can be considered aligned if the following condition is met:
(src % 4) == (dest % 4)
In other words, the source and destination buffer addresses are equally distant from a word boundary.
Because the VL buffers used with the OCTANE Digital Video device are page-aligned, performance is maximized if the application's buffers are word-aligned. Note that memory allocation routines such as malloc() return double-word-aligned (64-bit) buffers. DMbuffers are guaranteed to be double-word aligned. All buffers received from the VL are guaranteed to be page-aligned, but not all DMbuffers are guaranteed to be page-aligned.
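The alignment condition for memcpy() and bcopy() can be tested before a copy is issued. The following is a minimal sketch, assuming the 4-byte word size used in the condition above. The helper names same_word_alignment() and match_offset() are hypothetical and are not part of the VL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word size assumed by the condition in the text: 4 bytes. */
#define WORD_SIZE 4u

/* Hypothetical helper: returns 1 if src and dest are equally distant
 * from a word boundary, that is, (src % 4) == (dest % 4). */
int same_word_alignment(const void *src, const void *dest)
{
    return ((uintptr_t)src % WORD_SIZE) == ((uintptr_t)dest % WORD_SIZE);
}

/* Hypothetical helper: returns the number of bytes (0 to 3) to skip at
 * the start of dest so that dest + k has the same word offset as src. */
size_t match_offset(const void *src, const void *dest)
{
    size_t s = (size_t)((uintptr_t)src % WORD_SIZE);
    size_t d = (size_t)((uintptr_t)dest % WORD_SIZE);
    size_t k = (s + WORD_SIZE - d) % WORD_SIZE;

    assert(k < WORD_SIZE);
    return k;
}
```

An application that allocates its own destination buffer could allocate a few extra bytes and start the copy at dest + match_offset(src, dest), so that both buffers share the same word offset.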
Capture or playback from a disk subsystem can be greatly improved by using direct I/O. Direct I/O bypasses the filesystem's buffer cache, eliminating a data copy and other overhead. The buffer can also be marked noncacheable, yielding further performance gains.
Because the filesystem cache is bypassed, device buffer alignment and block size restrictions fall onto the application. These restrictions can be obtained using
fcntl(int fd, F_DIOINFO, struct dioattr *dioattr)
The device can, for example, require that the buffer be page-aligned. Disk devices usually require that the buffer's size be a multiple of 512 bytes (the disk sector size), or a multiple of the stripe size.
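An application can check a request against these restrictions before issuing it. The following sketch assumes the limits have already been retrieved. The structure dio_limits and its field names are stand-ins chosen for illustration, not the actual layout of struct dioattr, and dio_request_ok() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the limits reported for a file descriptor.
 * The field names are assumptions, not the real struct dioattr. */
struct dio_limits {
    size_t mem_align; /* required buffer address alignment, in bytes */
    size_t min_io;    /* transfer size must be a multiple of this */
    size_t max_io;    /* largest single transfer, in bytes */
};

/* Hypothetical helper: returns 1 if the buffer address and transfer
 * size satisfy the limits, 0 otherwise. */
int dio_request_ok(const struct dio_limits *lim, const void *buf,
                   size_t nbytes)
{
    assert(lim != NULL && lim->mem_align > 0 && lim->min_io > 0);

    if ((uintptr_t)buf % lim->mem_align != 0)
        return 0; /* buffer address is not aligned as required */
    if (nbytes == 0 || nbytes % lim->min_io != 0)
        return 0; /* size is not a multiple of the block size */
    if (nbytes > lim->max_io)
        return 0; /* request is larger than the device accepts */
    return 1;
}
```

A request that fails this check would have to be padded, split, or copied into a suitably aligned buffer before it is issued.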
In addition, device performance can be improved with certain alignments or sizes. For example, a device operating on a non-page-aligned VL buffer can internally break the request into a nonaligned part and an aligned part, yielding the overhead of two requests instead of one. In striped disk subsystems, performance is usually improved by reading or writing entire stripes at a time.
VL buffer elements used with the OCTANE Digital Video device are always page-aligned, which satisfies the alignment constraints of most devices. DMbuffer alignment, on the other hand, is a union of all requested alignments; see “Using Buffers” in Chapter 2.
The VL_MGV_BUFFER_QUANTUM control is provided so that an application can specify the block size that should be applied to a video unit. (The video unit is a field or frame, depending on the capture type.) For example, setting this control to 512 rounds the frame or field size, as reported by vlGetTransferSize(), up to a multiple of 512. This control should be set to a multiple of the block size returned by fcntl(fd, F_DIOINFO, ...), or to the optimal block size for the device.
When VL_MGV_BUFFER_QUANTUM is set to a value other than 1, the video data is padded at the end with random values. Consequently, it is important to use the same value for VL_MGV_BUFFER_QUANTUM on capture and on playback. Making the value the same can be a problem if a file is copied from one device to another with a different allowable block size. It is recommended that the control be set to a common multiple of the allowable sizes. For example, 4096 satisfies most devices. Otherwise, the file may need to be reformatted.
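The rounding applied by the quantum, and the choice of a common multiple for two devices, can be worked out as follows. This is a sketch of the arithmetic only. The function names are hypothetical, and the code does not call the VL.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds a field or frame size up to the next multiple of quantum,
 * mirroring the rounding described for the transfer size. */
size_t round_up_to_quantum(size_t nbytes, size_t quantum)
{
    assert(quantum > 0);
    return ((nbytes + quantum - 1) / quantum) * quantum;
}

/* Greatest common divisor (Euclid's algorithm). */
static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Smallest quantum that is a multiple of both allowable block sizes. */
size_t common_quantum(size_t block_a, size_t block_b)
{
    assert(block_a > 0 && block_b > 0);
    return block_a / gcd_size(block_a, block_b) * block_b;
}
```

For example, a 1000-byte unit with a quantum of 512 occupies 1024 bytes, and devices that require 1024-byte and 1536-byte blocks share a common quantum of 3072.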
Some of the standard I/O routines support file sizes only up to 2 GB because the file position is expressed as a signed 32-bit integer. lseek, for example, operates only within a 2 GB range. (Note that it is possible to use the read or write system calls to read or write past the 2 GB mark, up to the filesystem size.)
int syssgi(int request, int fd, char *data, int blockoffset, int numblocks)
request is SGI_READB for a read operation or SGI_WRITEB for a write operation
fd is a file descriptor of a character special device, as obtained by the open system call
data points to the buffer to be written from or read to
blockoffset is the block position where reading or writing should commence
numblocks is the number of blocks to read or write starting at blockoffset
Note that syssgi operates in units of device blocks as opposed to bytes. For disk subsystems, a block is usually 512 bytes, allowing 2^40 bytes (1 TB) of disk space to be addressed.
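The addressable range follows from the block arithmetic: a signed 32-bit block offset covers 2^31 blocks, and 2^31 blocks of 512 bytes each is 2^40 bytes. The following sketch shows the arithmetic and the conversion from a byte offset to a block offset. The function names are hypothetical, and the code does not call syssgi.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with a signed 32-bit block offset (2^31 blocks). */
uint64_t addressable_bytes(uint32_t block_size)
{
    assert(block_size > 0);
    return ((uint64_t)1 << 31) * block_size;
}

/* Converts a byte offset to a block offset. Returns -1 if the offset
 * is not on a block boundary or does not fit in a signed 32-bit int. */
int64_t byte_to_block_offset(uint64_t byte_offset, uint32_t block_size)
{
    uint64_t block;

    assert(block_size > 0);
    if (byte_offset % block_size != 0)
        return -1; /* not on a block boundary */
    block = byte_offset / block_size;
    if (block > (uint64_t)INT32_MAX)
        return -1; /* beyond the range of the blockoffset argument */
    return (int64_t)block;
}
```

With 512-byte blocks, a position 3 GB into the device, which is past the 2 GB mark, corresponds to block 6291456.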
As with direct I/O, the application is responsible for ensuring that the data buffer is properly aligned and that block size constraints are followed.
Asynchronous I/O allows an application to process multiple read or write requests simultaneously. On Silicon Graphics platforms, asynchronous I/O is available through the aio facility. The aio64 facility additionally supports 64-bit file sizes and offsets.
Because multiple I/O requests might be outstanding when asynchronous I/O is used, the round-trip delay between making a request, having it serviced, and issuing another request is removed. Asynchronous I/O also eliminates any process-scheduling delay between these steps. In addition, the device being read from or written to might be able to optimize performance by carrying out the requests simultaneously.
For VL buffers only, keep the following points in mind when using asynchronous I/O:
The VL buffer is a first-in first-out mechanism. When putting a buffer element back into the buffer using vlPutValid(), the “oldest” element retrieved by vlGetNextFree() is used. There is no way to specify that a different element should be used.
Because asynchronous I/O operations can complete out of order, the application may need to keep a list of filled elements. When the oldest element is filled, the application can then call vlPutValid() to place it back into the buffer, and check to see if any other elements are also ready.
The same restriction applies to vlPutFree() for elements obtained with vlGetNextValid() or vlGetLatestValid().
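The bookkeeping described in these points can be sketched as a small FIFO of completion flags. This is an illustration under stated assumptions, not code from the example programs: the pending_queue structure and the pq_ functions are hypothetical, and the actual calls to vlGetNextFree() and vlPutValid() are left to the caller.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8 /* assumed upper bound on outstanding requests */

/* Tracks elements in the order they were obtained from the ring buffer. */
struct pending_queue {
    int done[MAX_PENDING]; /* 1 once the I/O for that slot has completed */
    size_t head;           /* slot of the oldest outstanding element */
    size_t count;          /* number of outstanding elements */
};

void pq_init(struct pending_queue *q)
{
    size_t i;

    for (i = 0; i < MAX_PENDING; i++)
        q->done[i] = 0;
    q->head = 0;
    q->count = 0;
}

/* Registers a newly obtained element (for example, after vlGetNextFree()).
 * Returns its slot number, or -1 if the queue is full. */
int pq_issue(struct pending_queue *q)
{
    size_t slot;

    if (q->count == MAX_PENDING)
        return -1;
    slot = (q->head + q->count) % MAX_PENDING;
    q->done[slot] = 0;
    q->count++;
    return (int)slot;
}

/* Marks a slot as completed and returns how many elements, starting
 * with the oldest, are now ready. The caller would then call
 * vlPutValid() (or vlPutFree() on playback) that many times. */
size_t pq_complete(struct pending_queue *q, int slot)
{
    size_t released = 0;

    assert(slot >= 0 && slot < MAX_PENDING);
    q->done[slot] = 1;
    while (q->count > 0 && q->done[q->head]) {
        q->done[q->head] = 0;
        q->head = (q->head + 1) % MAX_PENDING;
        q->count--;
        released++;
    }
    return released;
}
```

If three requests are issued and the third completes first, pq_complete() returns 0 until the first request completes. Later completions then release any elements that were waiting behind it.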
Caution: Software conversion can severely degrade capture or playback performance.
The following examples of real-time capture and playback are available in /usr/share/src/dmedia/video/vl:
vidtodsk: video to disk using direct I/O (up to the disk subsystem rate)
dsktovid: disk to video using direct I/O (up to the disk subsystem rate)
vidtodsk_aio: video to disk using asynchronous and direct I/O (up to the disk subsystem rate)
dsktovid_aio: disk to video using asynchronous and direct I/O (up to the disk subsystem rate)