The VPro architecture is a departure from the traditional SIMD large-chip-count graphics systems. VPro consists of only two main chips: Buzz, the transform and rasterizer chip, and PB&J, the back-end video chip. The chips run at high clock rates and in many ways are very similar to a RISC-type architecture. The VPro architecture does not have the strip length, context switch, or inter-chip communication drawbacks of traditional SIMD graphics architectures. Buzz implements the full geometry pipeline, including transformation, lighting, and clipping, along with the full OpenGL 1.2 imaging pipeline. Lighting is fully hardware-accelerated for both per-vertex and per-pixel shading. Texturing features include both 2D and 3D textures, borders, post-texture lookup tables, and post-texture specular highlights. The full pipeline runs with 12-bit per-component or greater precision from the geometry stage through rasterization.
This chapter further describes the VPro architecture in two parts:
hardware features
rendering features
This section describes the following features:
Graphics memory architecture
Graphics memory usage
Dual-channel display
Command FIFO and context switching
On VPro, the command FIFO, framebuffer, textures, pbuffers, scratch buffers, and all other buffers are allocated out of one large pool of memory. VPro is available in two memory sizes: a 32MB version called V6 and a 128MB version called V8. Off-screen rendering (pbuffers) and accumulation buffer usage run at full hardware speeds. Copies between buffers and textures are extremely fast since they are performed on board, without a read back to the host and a second transfer to the board.
The VPro drawable buffers have a maximum addressable area of 4K x 4K, which allows very large pbuffer and imaging operations. The number of pbuffers depends on the size of the buffer (up to the maximum of 4K x 4K), the depth of the buffer, and the available free graphics memory. Any on-board memory that is not directly used for drawable buffers can be used for auxiliary buffers, pbuffers (fully hardware-accelerated), and textures.
Memory usage for on-screen buffers can be calculated by multiplying the screen size in pixels by either 10 or 18 bytes per pixel (the framebuffer itself accounts for either 8 or 16 bytes per pixel), plus another 12 bytes per pixel if hardware-accelerated accumulation buffers are configured.
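The calculation above can be sketched as a small helper. This is only an illustration of the arithmetic in the preceding paragraph: the function name, its parameters, and the 1280 x 1024 screen size are assumptions made for the example, and MB is taken to mean 2**20 bytes.

```python
def onscreen_buffer_bytes(width, height, framebuffer_bytes=16, hw_accum=False):
    """Estimate graphics memory used by the on-screen buffers, in bytes.

    framebuffer_bytes is the configured framebuffer depth (8 or 16).
    """
    # Per the text: an 8-byte framebuffer costs 10 bytes per pixel and a
    # 16-byte framebuffer costs 18 bytes per pixel.
    per_pixel = {8: 10, 16: 18}[framebuffer_bytes]
    if hw_accum:
        # Hardware accumulation adds another 12 bytes per pixel.
        per_pixel += 12
    return width * height * per_pixel


MB = 1024 * 1024  # assumption: MB means 2**20 bytes

# Hypothetical 1280 x 1024 screen, 16-byte framebuffer, hardware accumulation.
deep = onscreen_buffer_bytes(1280, 1024, framebuffer_bytes=16, hw_accum=True)
# Same screen with an 8-byte framebuffer and software accumulation.
shallow = onscreen_buffer_bytes(1280, 1024, framebuffer_bytes=8)

print(deep / MB)     # 37.5
print(shallow / MB)  # 12.5
```

Under these assumptions the deep configuration would not fit in a 32MB board at all, while the shallow one leaves room for pbuffers and textures.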
Using xsetmon, you can configure how graphics memory is allocated and query the current allocation. This lets you evaluate tradeoffs, such as pbuffers against textures, for a given application.
The large addressable framebuffer also enables a high-resolution, dual-channel display option. The left and right channels, positioned side-by-side in a single logical screen, allow windows to be dragged between displays or a single viewport to be rendered across both displays. The optional hardware adaptor card provides two digital or analog outputs. The SGI 1600SW (with an adaptor) is compatible with this dual-channel option, allowing completely digital throughput to the monitor.
VPro's deep command FIFO allows applications to optimize load balancing between the host and graphics processors. The VPro architecture is also designed for very fast context switching. Commands are sent to the FIFO, which holds approximately one millisecond of commands. The FIFO can hold multiple command streams, so a context switch does not have to wait for the FIFO to drain. The trade-off for fast context switches is a maximum latency equal to the depth of the FIFO. This latency is only apparent to operations that require a round trip to the graphics hardware, such as glReadPixels.
State is also shadowed on the host. The state shadow has an additional benefit that redundant state changes are eliminated on the host and state queries are extremely fast.
Using xsetmon, you can configure the framebuffer to be either 16-byte or 8-byte. The memory is allocated at X startup time. The visuals that are advertised by the X server depend on the initial size of the framebuffer. Pbuffer visuals are available in all standard formats. Visuals that are too deep for an 8-byte framebuffer (for example, double-buffered RGBA + depth) will not be available if the X server is started with an 8-byte framebuffer. Likewise, if the server is not started with a hardware accumulation buffer, all accumulation operations will be done in software.
8-Byte Visuals:
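id | dep | cl | x sp | bf sz | lv l | rg/ci | d b | st ro | r sz | g sz | b sz | a sz | ax bf | dp th | st cl | accum r | accum g | accum b | accum a | ms ns | ms b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0x20 | 8 | pc | . | 8 | . | c | . | . | . | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x21 | 8 | pc | . | 8 | . | c | y | . | . | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x23 | 8 | pc | y | 8 | 1 | c | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
0x24 | 8 | pc | . | 8 | 1 | c | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
0x25 | 8 | pc | . | 8 | . | r | y | . | 8 | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x26 | 8 | pc | . | 8 | . | r | y | . | 8 | . | . | 8 | . | 24 | . | . | . | . | . | . | . |
0x27 | 8 | pc | . | 8 | . | r | y | . | 8 | . | . | 8 | . | 24 | 8 | . | . | . | . | . | . |
0x28 | 8 | pc | . | 8 | . | c | y | y | . | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x29 | 8 | pc | . | 8 | . | c | y | y | . | . | . | . | . | 24 | 8 | . | . | . | . | . | . |
0x2a | 8 | pc | . | 8 | . | r | y | y | 8 | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x2b | 8 | pc | . | 8 | . | r | y | y | 8 | . | . | 8 | . | . | . | . | . | . | . | . | . |
0x2c | 12 | pc | . | 12 | . | c | . | . | . | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x2d | 12 | pc | . | 12 | . | c | . | . | . | . | . | . | . | 24 | 8 | . | . | . | . | . | . |
0x2e | 12 | pc | . | 12 | . | c | y | . | . | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x2f | 12 | pc | . | 12 | . | c | y | . | . | . | . | . | . | 24 | 8 | . | . | . | . | . | . |
0x30 | 12 | pc | . | 12 | . | c | y | y | . | . | . | . | . | . | . | . | . | . | . | . | . |
0x31 | 12 | pc | . | 12 | . | r | y | . | 12 | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x32 | 12 | pc | . | 12 | . | r | y | y | 12 | . | . | . | . | . | . | . | . | . | . | . | . |
0x33 | 12 | pc | . | 12 | . | r | . | . | 12 | . | . | 12 | . | 24 | . | . | . | . | . | . | . |
0x34 | 12 | pc | . | 12 | . | r | . | . | 12 | . | . | 12 | . | 24 | 8 | . | . | . | . | . | . |
0x35 | 12 | pc | . | 12 | . | r | y | . | 12 | . | . | 12 | . | . | . | . | . | . | . | . | . |
0x36 | 12 | tc | . | 16 | . | r | y | . | 4 | 4 | 4 | 4 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x37 | 12 | tc | . | 16 | . | r | y | . | 4 | 4 | 4 | 4 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x38 | 12 | tc | . | 16 | . | r | y | y | 4 | 4 | 4 | 4 | . | . | . | 16 | 16 | 16 | 16 | . | . |
0x39 | 15 | tc | . | 16 | . | r | y | . | 5 | 5 | 5 | 1 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x3a | 15 | tc | . | 16 | . | r | y | . | 5 | 5 | 5 | 1 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x3b | 15 | tc | . | 16 | . | r | y | y | 5 | 5 | 5 | 1 | . | . | . | 16 | 16 | 16 | 16 | . | . |
0x3c | 24 | tc | . | 32 | . | r | . | . | 8 | 8 | 8 | 8 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x3d | 24 | tc | . | 32 | . | r | . | . | 8 | 8 | 8 | 8 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x3e | 24 | tc | . | 32 | . | r | y | . | 8 | 8 | 8 | 8 | . | . | . | 16 | 16 | 16 | 16 | . | . |
0x40 | 30 | tc | . | 32 | . | r | . | . | 10 | 10 | 10 | 2 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x41 | 30 | tc | . | 32 | . | r | . | . | 10 | 10 | 10 | 2 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x42 | 30 | tc | . | 32 | . | r | y | . | 10 | 10 | 10 | 2 | . | . | . | 16 | 16 | 16 | 16 | . | . |
0x43 | 30 | tc | . | 48 | . | r | . | . | 12 | 12 | 12 | 12 | . | . | . | 16 | 16 | 16 | 16 | . | . |
0x44 | 30 | tc | . | 48 | . | r | . | . | 12 | 12 | 12 | 12 | . | 16 | . | 16 | 16 | 16 | 16 | . | . |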
[1]
16-Byte Visuals:
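id | dep | cl | x sp | bf sz | lv l | rg/ci | d b | st ro | r sz | g sz | b sz | a sz | ax bf | dp th | st cl | accum r | accum g | accum b | accum a | ms ns | ms b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0x20 | 8 | pc | . | 8 | . | c | . | . | . | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x21 | 8 | pc | . | 8 | . | c | y | . | . | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x23 | 8 | pc | y | 8 | 1 | c | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
0x24 | 8 | pc | . | 8 | 1 | c | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
0x25 | 8 | pc | . | 8 | . | r | y | . | 8 | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x26 | 8 | pc | . | 8 | . | r | y | . | 8 | . | . | 8 | . | 24 | . | . | . | . | . | . | . |
0x27 | 8 | pc | . | 8 | . | r | y | . | 8 | . | . | 8 | . | 24 | 8 | . | . | . | . | . | . |
0x28 | 8 | pc | . | 8 | . | c | y | y | . | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x29 | 8 | pc | . | 8 | . | c | y | y | . | . | . | . | . | 24 | 8 | . | . | . | . | . | . |
0x2a | 8 | pc | . | 8 | . | r | y | y | 8 | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x2b | 8 | pc | . | 8 | . | r | y | y | 8 | . | . | 8 | . | 24 | . | . | . | . | . | . | . |
0x2c | 12 | pc | . | 12 | . | c | . | . | . | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x2d | 12 | pc | . | 12 | . | c | . | . | . | . | . | . | . | 24 | 8 | . | . | . | . | . | . |
0x2e | 12 | pc | . | 12 | . | c | y | . | . | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x2f | 12 | pc | . | 12 | . | c | y | . | . | . | . | . | . | 24 | 8 | . | . | . | . | . | . |
0x30 | 12 | pc | . | 12 | . | c | y | y | . | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x31 | 12 | pc | . | 12 | . | c | y | y | . | . | . | . | . | 24 | 8 | . | . | . | . | . | . |
0x32 | 12 | pc | . | 12 | . | r | y | . | 12 | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x33 | 12 | pc | . | 12 | . | r | y | y | 12 | . | . | . | . | 24 | . | . | . | . | . | . | . |
0x34 | 12 | pc | . | 12 | . | r | . | . | 12 | . | . | 12 | . | 24 | . | . | . | . | . | . | . |
0x35 | 12 | pc | . | 12 | . | r | . | . | 12 | . | . | 12 | . | 24 | 8 | . | . | . | . | . | . |
0x36 | 12 | pc | . | 12 | . | r | y | . | 12 | . | . | 12 | . | 24 | . | . | . | . | . | . | . |
0x37 | 12 | pc | . | 12 | . | r | y | . | 12 | . | . | 12 | . | 24 | 8 | . | . | . | . | . | . |
0x38 | 12 | tc | . | 16 | . | r | y | . | 4 | 4 | 4 | 4 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x39 | 12 | tc | . | 16 | . | r | y | . | 4 | 4 | 4 | 4 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x3a | 12 | tc | . | 16 | . | r | y | y | 4 | 4 | 4 | 4 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x3b | 12 | tc | . | 16 | . | r | y | y | 4 | 4 | 4 | 4 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x3c | 15 | tc | . | 16 | . | r | y | . | 5 | 5 | 5 | 1 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x3d | 15 | tc | . | 16 | . | r | y | . | 5 | 5 | 5 | 1 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x3e | 15 | tc | . | 16 | . | r | y | y | 5 | 5 | 5 | 1 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x3f | 15 | tc | . | 16 | . | r | y | y | 5 | 5 | 5 | 1 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x40 | 24 | tc | . | 32 | . | r | . | . | 8 | 8 | 8 | 8 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x41 | 24 | tc | . | 32 | . | r | . | . | 8 | 8 | 8 | 8 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x42 | 24 | tc | . | 32 | . | r | y | . | 8 | 8 | 8 | 8 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x43 | 24 | tc | . | 32 | . | r | y | . | 8 | 8 | 8 | 8 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x45 | 30 | tc | . | 32 | . | r | . | . | 10 | 10 | 10 | 2 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x46 | 30 | tc | . | 32 | . | r | . | . | 10 | 10 | 10 | 2 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x47 | 30 | tc | . | 32 | . | r | y | . | 10 | 10 | 10 | 2 | . | 24 | . | 16 | 16 | 16 | 16 | . | . |
0x48 | 30 | tc | . | 32 | . | r | y | . | 10 | 10 | 10 | 2 | . | 24 | 8 | 16 | 16 | 16 | 16 | . | . |
0x49 | 30 | tc | . | 48 | . | r | . | . | 12 | 12 | 12 | 12 | . | . | . | 16 | 16 | 16 | 16 | . | . |
0x4a | 30 | tc | . | 48 | . | r | . | . | 12 | 12 | 12 | 12 | . | 16 | . | 16 | 16 | 16 | 16 | . | . |
0x4b | 30 | tc | . | 48 | . | r | y | . | 12 | 12 | 12 | 12 | . | . | . | 16 | 16 | 16 | 16 | . | . |
0x4c | 30 | tc | . | 48 | . | r | y | . | 12 | 12 | 12 | 12 | . | 16 | . | 16 | 16 | 16 | 16 | . | . |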
[2]
This section describes the following topics:
Buffer management
Rendering techniques support
Geometry
Pixel operations
Imaging operations
All OpenGL buffers are allocated out of the same on-board memory pool. The number and size of the available buffers depend on the amount of memory on board (32MB or 128MB), the screen resolution, and the video modes. The maximum possible size for any drawable buffer is 4K x 4K.
VPro does not have a native GLfloat data format for pixel data. Float data that is read from or written to the color buffer, depth buffer, stencil buffer, or accumulation buffer is converted on the host. For best performance, do not read or write pixel data as float; use one of the other GL data types instead, such as [unsigned] integer or [unsigned] long. Performance for pixel read and write operations is best for 32-bit RGBA formats.
VPro supports the following OpenGL buffers:
Color buffers
Stencil and depth buffers
Accumulation buffer
Overlay buffer
Off-screen rendering buffer (pbuffer)
VPro supports bitmap, CI, L, LA, RGB, RGBA, ABGR, BGRA, and YCrCb (through the subsample and YCrCb_format specification) in hardware. All color buffers are either single-, double-, or quad-buffered (stereo). The resolution and number of buffers depend on the amount of graphics memory. For example, a 32MB system cannot do RGBA16 quad buffering.
The following formats are supported [3]:
Data Formats:
UNSIGNED_BYTE_3_3_2
UNSIGNED_BYTE_2_3_3_REV
UNSIGNED_SHORT_5_6_5
UNSIGNED_SHORT_5_6_5_REV
UNSIGNED_SHORT_4_4_4_4
UNSIGNED_SHORT_4_4_4_4_REV
UNSIGNED_SHORT_5_5_5_1
UNSIGNED_SHORT_1_5_5_5_REV
UNSIGNED_INT_8_8_8_8
UNSIGNED_INT_8_8_8_8_REV
UNSIGNED_INT_10_10_10_2
UNSIGNED_INT_2_10_10_10_REV
Framebuffer Formats:
R8 [4], G8 [4], B8 [4], A8 [4], L8 [5]
I8
LA8
R3_G3_B2 [5]
B2_G3_R3 [5]
R5_G6_B5 [5]
B5_G6_R5 [5]
RGB8 [5]
RGBA4
ABGR4
ARGB4
BGRA4
A1_RGB5
A1_BGR5
BGR5_A1
RGBA8
ABGR8
ARGB8
BGRA8
YCrCb_444 (byte or ubyte) [6]
YCrCb_422 (byte or ubyte) [6]
R16 [4], G16 [4], B16 [4], A16 [4]
R32 [4], G32 [4], B32 [4], A32 [4]
L16 [5]
I16
LA16
RGB32 [5]
RGBA32 [4]
ABGR32 [4]
RGB16 [5]
RGB10_A2
A2_BGR10
BGR10_A2
A2_RGB10
RGBA12
ABGR12
RGBA16
ABGR16
YCrCb_422 (short or ushort) [7]
L8, I8
CI8
L16, I16
CI12, CI16
L32
LA8
LA16
LA32
Bitmap
CI8
CI16
CI32
The OpenGL accumulation buffer is fully hardware-accelerated if the X server is configured to advertise hardware-accelerated accumulation buffers. VPro also supports an accumulation buffer within a pbuffer. Use setmon or xsetmon before X server startup to configure either software accumulation or shallow hardware accumulation. The software accumulation buffer supports 16-bit precision per component RGBA and the hardware accumulation buffer can support 24-bit precision per component RGBA.
There is one Z/stencil buffer per color buffer for supported formats. The stencil and depth buffers are packed into one 4-byte value on reads and writes. Hence, requesting stencil does not add a memory hit over requesting only depth. However, since the stencil planes cannot be cleared using the hardware fast clear mode, stencil should not be requested as part of the visual unless it is going to be used. When using stencil and depth buffers, it is fastest to clear them both simultaneously.
The depth buffer is in eye space instead of the traditional screen space. Depth is calculated before the projection matrix, not afterwards. This allows greater precision in the depth buffer as the space is no longer non-linear due to the perspective divide. Depth buffer readback must be converted from eye space to screen space. Applications need to ensure that the depth buffer is cleared when changing the depth range and other depth buffer state operations.
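To illustrate why screen-space depth is non-linear, the sketch below maps an eye-space distance to a conventional window-space depth value. It is not the VPro conversion routine: it assumes a standard glFrustum-style perspective projection, the default glDepthRange of 0 to 1, and that depth is expressed as a positive distance d from the eye along the view direction. The function name and the near and far values are hypothetical.

```python
def eye_to_window_depth(d, near, far):
    """Map a positive eye-space distance d to window-space depth in [0, 1].

    Assumes a standard perspective projection with the given near and far
    planes and the default depth range of [0, 1].
    """
    # Normalized device depth after the perspective divide, in [-1, 1].
    z_ndc = (far + near) / (far - near) - (2.0 * far * near) / ((far - near) * d)
    # Default depth-range mapping from [-1, 1] to [0, 1].
    return 0.5 * z_ndc + 0.5


near, far = 1.0, 100.0  # hypothetical clip planes
for d in (1.0, 2.0, 10.0, 100.0):
    print(d, round(eye_to_window_depth(d, near, far), 4))
```

With these planes, an object only twice as far away as the near plane already lands past the midpoint of the screen-space depth range, which is the precision loss that a linear eye-space depth buffer avoids.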
Off-screen rendering areas are known as pbuffers. Pbuffers support all main buffer configurations including color (resolution and format), depth, stencil, and accumulation.
Pbuffers are allocated and destroyed through the fbconfig set of commands, which became part of the GLX specification as of GLX 1.3. Based upon the video format loaded, buffer allocations are done statically when the X server activates. Since pbuffers are allocated out of the on-board memory, pbuffer allocation can fail when graphics memory is depleted. This can happen if there are too many pbuffers, deep or wide main buffers (such as for dual-channel mode), or too many textures defined. VPro supports pbuffers as single-buffered entities only.
Moving data between buffers is very fast since the buffer-to-buffer copy happens entirely on-board. No round trip to the host is required. Moving data between any buffer (color, depth, stencil, and accumulation) and texture objects is also very fast. Copying data from graphics memory to the host will run at a slower rate. Buffer copies can be accomplished by using the MakeCurrentRead GLX extension.
To optimize performance, use the native VPro formats to read and write data. Use this guideline to ensure that the host does not have to convert the data. The host must convert the data when reading and writing in GLfloat format or reading in GLint format.
VPro has many advanced rendering features that provide support for high-quality imaging. There is support for the full OpenGL 1.2 pipeline in addition to many extensions. All paths through the pipeline are 12-bit per-component paths, including blending, texturing, and lighting. Per-pixel shading is supported as well as the Specular after Texture extension.
All OpenGL 1.2 blending modes are supported to take incoming RGBA fragments and blend them into the existing framebuffer. Table 2-1 describes VPro blending.
Token | Attribute | Value |
|---|---|---|
Blend Op | source factor | ZERO, ONE, DST_COLOR, ONE_MINUS_DST_COLOR, SRC_ALPHA, ONE_MINUS_SRC_ALPHA, DST_ALPHA, ONE_MINUS_DST_ALPHA, SRC_ALPHA_SATURATE, CONSTANT_COLOR_EXT, ONE_MINUS_CONSTANT_COLOR_EXT, CONSTANT_ALPHA_EXT, ONE_MINUS_CONSTANT_ALPHA_EXT |
Blend Op | destination factor | ZERO, ONE, SRC_COLOR, ONE_MINUS_SRC_COLOR, SRC_ALPHA, ONE_MINUS_SRC_ALPHA, DST_ALPHA, ONE_MINUS_DST_ALPHA, CONSTANT_COLOR_EXT, ONE_MINUS_CONSTANT_COLOR_EXT, CONSTANT_ALPHA_EXT, ONE_MINUS_CONSTANT_ALPHA_EXT |
Blend Op | blend equation | FUNC_ADD_EXT, MIN_EXT, MAX_EXT, ALPHA_MIN_SGIX, ALPHA_MAX_SGIX, LOGIC_OP, FUNC_SUBTRACT_EXT, FUNC_REVERSE_SUBTRACT_EXT |
Blend Op | constant red | [11:0] |
Blend Op | constant green | [11:0] |
Blend Op | constant blue | [11:0] |
Blend Op | constant alpha | [11:0] |
Logic Op | logicop | CLEAR, AND, AND_REVERSE, COPY, AND_INVERTED, NOOP, XOR, OR, NOR, EQUIV, INVERT, OR_REVERSE, COPY_INVERTED, OR_INVERTED, NAND, SET |
Of interest is a new blend extension, SGIX_BLEND_ALPHA_MINMAX, which is similar to the EXT_BLEND_MINMAX extension but uses the minmax comparison result of only the alpha channel to choose all the color components.
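The difference between the two extensions can be sketched on RGBA tuples. This is a conceptual model only, not the hardware path: the function names are hypothetical, and the behavior when the two alpha values are equal (the destination is kept here) is an assumption that the text does not specify.

```python
def blend_minmax(src, dst, mode="min"):
    """EXT_BLEND_MINMAX style: compare each component independently."""
    pick = min if mode == "min" else max
    return tuple(pick(s, d) for s, d in zip(src, dst))


def blend_alpha_minmax(src, dst, mode="min"):
    """Alpha-minmax style: compare alpha only, then take the whole pixel.

    src and dst are (r, g, b, a) tuples. On equal alpha the destination
    is kept, which is an assumption made for this sketch.
    """
    if mode == "min":
        return src if src[3] < dst[3] else dst
    return src if src[3] > dst[3] else dst


src = (0.9, 0.1, 0.1, 0.2)
dst = (0.2, 0.8, 0.3, 0.7)

print(blend_minmax(src, dst, "min"))        # (0.2, 0.1, 0.1, 0.2)
print(blend_alpha_minmax(src, dst, "min"))  # (0.9, 0.1, 0.1, 0.2)
```

The per-component version mixes channels from both pixels, while the alpha-driven version always returns one of the two input pixels intact.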
Texture memory is shared with the on-board framebuffer SDRAM. Since the texture memory size is not constant, it is important to use the texture proxy mechanisms to find the maximum texture size.
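One way to structure such a probe is to start at the largest dimension of interest and halve it until the proxy query succeeds. The sketch below shows only that search logic. The fits callback is hypothetical: in a real application it would issue a proxy texture request (for example, glTexImage2D on GL_PROXY_TEXTURE_2D) and check whether the resulting width queried with glGetTexLevelParameteriv is non-zero. Here it is replaced by a stand-in that models a fixed amount of free memory.

```python
def largest_texture_size(fits, start=32768):
    """Return the largest power-of-two dimension for which fits(size) is true.

    fits is a caller-supplied probe; start should be a power of two.
    Returns 0 if no size fits.
    """
    size = start
    while size >= 1:
        if fits(size):
            return size
        size //= 2
    return 0


# Stand-in probe: pretend 16MB is free and each texel of a square
# RGBA8 texture takes 4 bytes. A real probe would ask the GL instead.
free_bytes = 16 * 1024 * 1024
probe = lambda size: size * size * 4 <= free_bytes

print(largest_texture_size(probe))  # 2048
```

Because free graphics memory changes as pbuffers and other textures are created, the probe is best repeated whenever the allocation picture changes rather than cached once at startup.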
The following internal texture formats are supported:
ALPHA4
ALPHA8
ALPHA12
ALPHA16
LUMINANCE4
LUMINANCE8
LUMINANCE12
LUMINANCE16
LUMINANCE4_ALPHA4
LUMINANCE6_ALPHA2
LUMINANCE8_ALPHA8
LUMINANCE12_ALPHA4
LUMINANCE12_ALPHA12
LUMINANCE16_ALPHA16
INTENSITY4
INTENSITY8
INTENSITY12
INTENSITY16
R3_G3_B2
RGB4
RGB5
RGB8
RGB10
RGB12
RGB16
RGBA2
RGBA4
RGB5_A1
RGBA8
RGB10_A2
RGBA12
RGBA16
Table 2-2 summarizes the VPro texture features:
Feature | Specifications |
|---|---|
Texture types | 1D, 2D, 3D, including borders |
Filtering | point, bilinear, trilinear |
Texture lookup tables | 12-bit RGBA post-interpolation tables |
Max texture dimension (s or t) | 32K (up to board memory size) |
The following qualifications apply:
As textures and other buffers are all resident in the same memory space, the system performs a copy-by-reference when appropriate. Oversubscribing texture memory will cause textures to swap to the host. Copy-by-reference may not be implemented at launch.
The hardware supports texture borders for all dimensioned textures. Be aware that the default border color is (0,0,0,0), and the edge texels get interpolated with the border if the wrap mode is not REPEAT. The CLAMP_TO_EDGE_SGIS wrap mode can be used to ensure that the border texels are never accessed.
Post-texture lighting is supported to allow lighting highlights to be applied after texturing. For more information, see the EXT_SEPARATE_SPECULAR_COLOR extension in “New Extensions Supported by VPro” in Chapter 3.
Trilinear filtering is fully hardware-accelerated.
All textures (including 3D textures) are perspective-correct.
Mipmapping is not supported with 3D textures. Applications which supply a mipmap pyramid and request a *_MIPMAP_* mode will automatically use NEAREST or LINEAR, regardless of what minification was requested. Any mipmap filtering will be ignored for 3D textures.
24-bit textures are not supported.
The multi-texture extension is not supported, though the same effect can be accomplished through the accumulation buffer, blending, or a number of other techniques.
Asynchronous texture download will be available through the SGIX_ASYNC extension. This feature will only be available on the 128MB Next Generation VPro. Learn more about this feature in subsection “Pixel Operations” and section “New Extensions Supported by VPro” in Chapter 3.
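The 3D texture mipmap qualification in the list above can be pictured as a lookup from the requested minification filter to the filter that is actually applied. The pairing used here (NEAREST_MIPMAP_* falls back to NEAREST and LINEAR_MIPMAP_* falls back to LINEAR) is an assumption based on the filter names, since the text only states that NEAREST or LINEAR is used. The function name is hypothetical.

```python
# Assumed fallback for 3D textures: the within-level filter is kept and
# the mipmap selection part of the mode is ignored.
FALLBACK_3D = {
    "NEAREST": "NEAREST",
    "LINEAR": "LINEAR",
    "NEAREST_MIPMAP_NEAREST": "NEAREST",
    "NEAREST_MIPMAP_LINEAR": "NEAREST",
    "LINEAR_MIPMAP_NEAREST": "LINEAR",
    "LINEAR_MIPMAP_LINEAR": "LINEAR",
}


def effective_3d_min_filter(requested):
    """Return the minification filter assumed to apply to a 3D texture."""
    return FALLBACK_3D[requested]


print(effective_3d_min_filter("LINEAR_MIPMAP_LINEAR"))  # LINEAR
```

An application that wants predictable results on 3D textures can simply request NEAREST or LINEAR directly and skip building the mipmap pyramid.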
Both Gouraud (per-vertex) and Phong shading (per-fragment) are supported in hardware. One single per-fragment light and eight per-vertex lights are supported in hardware. The per-fragment light and the per-vertex lights can be enabled simultaneously allowing a total of nine hardware-accelerated lights in a scene. A single light of each type is fastest. Each additional per-vertex light added will incrementally decrease performance. As with all of VPro, shading is done with 12-bit per-component resolution. Two-sided lighting, local lighting, local viewer, and the other standard OpenGL 1.2 lighting properties, are all implemented in hardware.
Use the rescale normal OpenGL 1.2 extension instead of NORMALIZE when introducing a scale into the model view matrix. Rescale normal is fully hardware-accelerated in the case where normalized normals are used with a scale in the model view matrix.
Shading is perspective-correct.
Point, line, and polygon AA is supported in hardware for both RGBA and CI visuals. Non-AA lines and points are supported in hardware up to a width of 10 with a very small incremental performance decrease for each additional width increment. AA points and lines are supported to a width of 2.0; AA widths above 2.0 will be clamped to 2.0. AA lines are hardware-accelerated, although the end caps will be French-cut, not AA. Full-scene AA will be supported using multi-pass to the hardware accumulation buffer.
Unlike previous graphics systems from SGI, VPro does not have a full geometry engine. Instead, there is a simple transform engine that runs at a very high clock rate. The transform engine performs well even for short strip lengths with even better rates for triangle strips of length 4 or greater.
Texturing and user clip planes both use a shared hardware texturing resource. VPro supports six user clip planes. There are three hardware clip planes that are implemented using the texturing hardware. The remaining three of the total six are implemented in software. Each texture dimension that is enabled moves a hardware-accelerated user clip plane to software. For example, there is only a single hardware-accelerated user clip plane when 2D texturing is enabled.
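The clip plane budget described above reduces to simple arithmetic. The sketch below is an illustration under the assumption that enabled clip planes occupy the hardware slots first and only the remainder falls back to software. The function names are hypothetical.

```python
MAX_USER_CLIP_PLANES = 6   # total user clip planes supported
HARDWARE_CLIP_SLOTS = 3    # planes implemented with the texturing hardware


def hardware_clip_planes(texture_dimensions):
    """Hardware clip planes left when texturing with the given dimensionality.

    texture_dimensions is 0 (no texturing), 1, 2, or 3.
    """
    return max(0, HARDWARE_CLIP_SLOTS - texture_dimensions)


def software_clip_planes(enabled_planes, texture_dimensions):
    """Enabled clip planes that fall back to software (assumes hardware first)."""
    if not 0 <= enabled_planes <= MAX_USER_CLIP_PLANES:
        raise ValueError("VPro supports at most six user clip planes")
    return max(0, enabled_planes - hardware_clip_planes(texture_dimensions))


print(hardware_clip_planes(2))     # 1, matching the 2D texturing example
print(software_clip_planes(2, 2))  # 1
print(software_clip_planes(2, 0))  # 0
```

Under this model, an untextured scene can use up to three clip planes without leaving the hardware path, while a 3D-textured scene pushes every clip plane to software.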
Fast Path
Points, lines, triangles in strips for display lists and vertex arrays
Gouraud, Phong shading, depth, stencil, and accumulation buffer operations
Alpha blending and alpha functions (fill hit with blending)
Enabling texturing has little performance impact (textured versus non-textured fill)
The rescale normal extension is fully implemented in hardware. Use it where possible instead of NORMALIZE. (glnormalize versus rescale_normal with respect to geometry rates)
Short normals are accelerated.
2 lights, one Gouraud, and one Phong, 4 infinite lights, 2 double-sided lights
Geometry rates are bounded primarily by the host-to-graphics download rate, although textured geometry can become fill-limited. VPro is efficient with short strips (4 to 5 triangles), although long strips are better since they help to reduce the amount of data downloaded. There are three methods of sending data to the VPro board: immediate mode, display lists, and vertex arrays. All geometry is pushed to the graphics board from the CPU while textures and pixels are transferred using a DMA engine.
Immediate-mode rendering pushes each vertex and associated vertex data to the graphics pipe through a function call. This is the slowest method to send data to the pipe. Display lists can pack all data into one list and send it to the pipe more efficiently. Geometry calls within a VPro display list are optimized on the host for more efficient rendering. Adding mode changes or other state manipulation commands to a display list will not affect the performance of the entire display list. This is a change from previous hardware, where certain commands in the display list would incur a performance penalty. Additionally, system memory is used to hold all display lists. This ensures that display list performance does not vary with the number or size of display lists.
Vertex arrays also offer an efficient way to send data to the graphics pipe. Packed vertex arrays are fully hardware-accelerated, though slightly slower than display lists. Optimizations are implemented for the commonly used packed vertex arrays and for some of the common separate arrays. The separate arrays will always be slower than the packed arrays due to cache effects. Likewise, the glDrawElements() and glArrayElement() calls will be slower than the corresponding non-gl*Element() calls.
The following array formats are explicitly tuned:
V3F
N3F_V3F
N3S_V3F
V3F_N3F
C3F_V3F
T2F_V3F
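As an illustration of what a packed array looks like in memory, the sketch below lays out vertices in the N3F_V3F order: three floats of normal followed by three floats of position, 24 bytes per vertex. It only builds the byte buffer. Handing that buffer to the GL (for example, through glInterleavedArrays with GL_N3F_V3F) is left out, and the helper name is hypothetical.

```python
import struct


def pack_n3f_v3f(vertices):
    """Pack (normal, position) pairs into one interleaved float buffer.

    Each vertex becomes six native-order 32-bit floats: nx, ny, nz, x, y, z.
    """
    out = bytearray()
    for normal, position in vertices:
        out += struct.pack("3f3f", *normal, *position)
    return bytes(out)


# A single triangle facing +Z (hypothetical data).
triangle = [
    ((0.0, 0.0, 1.0), (0.0, 0.0, 0.0)),
    ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0)),
    ((0.0, 0.0, 1.0), (0.0, 1.0, 0.0)),
]
buffer = pack_n3f_v3f(triangle)

print(len(buffer))                     # 72 bytes: 3 vertices x 24 bytes
print(struct.unpack("6f", buffer[24:48]))
```

Keeping each vertex's normal and position adjacent is what gives the packed formats their cache advantage over separate arrays.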
Much of the VPro state is shadowed on the host. The OpenGL libraries will be able to determine if a state change actually changes hardware state without a need to flush the pipe and query the hardware. The state shadowing allows glGet* operations that occur outside of a glBegin/End pair, the equivalent set operations, and glEnables and glDisables to be very efficient for applications that do a lot of state manipulation. Matrix manipulations are also shadowed.
Color, Normal, and TexCoord calls are not shadowed, nor are the push and pop attribute calls. The glPushAttrib() and glPopAttrib() calls will be costly compared to a single state change. If possible, applications should carefully monitor and control their own state instead of pushing and popping attributes.
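A minimal sketch of application-side state tracking follows. It is one possible design, not a prescribed API: the StateCache class, the state names, and the apply callback (which stands in for the real glEnable, glDisable, or similar call) are all hypothetical.

```python
_UNSET = object()  # marks state the application has never set


class StateCache:
    """Remembers the last value sent for each piece of state."""

    def __init__(self, apply):
        self._apply = apply   # hypothetical callback that issues the GL call
        self._current = {}

    def set(self, name, value):
        """Send a state change only if it differs; return the previous value."""
        previous = self._current.get(name, _UNSET)
        if previous != value:
            self._apply(name, value)
            self._current[name] = value
        return previous


issued = []  # records the calls that would reach the GL
cache = StateCache(lambda name, value: issued.append((name, value)))

cache.set("LIGHTING", True)
cache.set("BLEND", False)

# Temporarily enable blending, then restore the saved value explicitly
# instead of bracketing the change with glPushAttrib and glPopAttrib.
saved = cache.set("BLEND", True)
cache.set("BLEND", saved)

cache.set("LIGHTING", True)  # redundant, so nothing is issued

print(issued)
```

Only the four genuine changes are recorded, and the temporary change is undone with a single targeted state call rather than a full attribute push and pop.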
This section describes the following topics:
buffer reads and writes
data conversions
non-blocking texture loads (and pixel reads and draws)
Buffer-to-buffer copies are fast as all buffers reside in on-board memory. This includes copies to and from pbuffers and textures. Pixel and texture data is transferred to the host by a DMA engine. There is hardware on board to convert from an internal pixel format to an external pixel format, so the format that is read or written is not important. When working with YCrCb and color conversions, the color matrix path will introduce a slight performance penalty.
Depth and stencil are stored in an internal floating-point format and packed together. Reading and writing depth and stencil buffers will be slower than the other buffers due to float conversions.
Pixel transfer speed will depend on the data types. The number of bytes transferred per pixel is shown in Table 2-3.
Table 2-3. Pixel Transfer Speed
Data Type | Number of Bytes Transferred |
|---|---|
CI8, L8 | 1 byte |
CI12, L16, LA8, RGBA4, RGB5_A1 | 2 bytes |
RGB8 | 3 bytes |
LA16, RGBA8, RGB10_A2, Z24_S8 | 4 bytes |
RGBA12 | 6 bytes |
RGBA16 | 8 bytes |
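The table can be turned into a quick estimate of how much data a transfer moves. The sketch below simply multiplies the pixel count by the per-pixel size from Table 2-3. The helper name and the 512 x 512 image size are assumptions for the example, and no transfer rate is implied.

```python
# Bytes transferred per pixel, taken from Table 2-3.
BYTES_PER_PIXEL = {
    "CI8": 1, "L8": 1,
    "CI12": 2, "L16": 2, "LA8": 2, "RGBA4": 2, "RGB5_A1": 2,
    "RGB8": 3,
    "LA16": 4, "RGBA8": 4, "Z24_S8": 4,
    "RGB10_A2": 4,  # the 10/10/10/2 packed format
    "RGBA12": 6,
    "RGBA16": 8,
}


def transfer_bytes(width, height, data_type):
    """Bytes moved when transferring a width x height image of data_type."""
    return width * height * BYTES_PER_PIXEL[data_type]


for data_type in ("L8", "RGBA8", "RGBA16"):
    print(data_type, transfer_bytes(512, 512, data_type))
```

For a hypothetical 512 x 512 image, RGBA16 moves eight times as much data as L8, so choosing the shallowest format that meets the quality requirement directly reduces transfer volume.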
The host performs data type conversions for any float format and for readback of integer formats. ARGB and BGRA short and integer data are host-converted unless they are in one of the packed formats (for example, 10/10/10/2).
VPro supports the GL_SGIX_async and GL_SGIX_async_pixel set of extensions, which allow non-blocking texture and pixel reads and writes. This means that other graphics operations can proceed immediately after a texture load request. You must properly synchronize commands to guarantee that texture data is resident in the graphics memory before rendering of data using that texture begins.
The VPro imaging pipeline is a full OpenGL 1.2 pipeline, including the ARB imaging package. It can be used for 2D image processing operations using glDrawPixels, glReadPixels, glCopyPixels, glTexImage, and glGetTexImage.
Convolutions up to 7 x 7 are done in hardware, while larger convolutions require multi-pass rendering and run slower. At some convolution sizes, the hardware is slower than a software convolve. The cutoff may be as low as 7 x 7 and depends on both kernel size and pixel depth.
[1] RGBA12 visuals with Z are only available on Next Generation VPro.
[2] RGBA12 visuals with Z are only available on Next Generation VPro.
[3] _REV denotes that the ordering of components is reversed.
[4] Single components are still expanded to RGBA. The other components are set to 0 and alpha is always set to 0xff for packed and 0xfff for full mode.
[5] Alpha is set to 0xff for packed or 0xfff for full.
[6] For the YCrCb formats, VPro does the proper sampling; that is, replicate or zero-fill.
[7] For the YCrCb formats, VPro does the proper sampling; that is, replicate or zero-fill.