Chapter 14. Optimizing Rendering

One of the greatest challenges developers face is optimizing application performance. This chapter describes the Cosmo 3D nodes and programming techniques that can help optimize your application's performance.

The more vertices calculated and rendered, the slower the application's performance. If you can reduce calculations and rendering, you can improve application performance.

The following sections describe ways to reduce the number of calculations made and the number of vertices rendered.


Note: For more information about performance tools, see the OpenGL Optimizer Programming Guide.


Face Culling

When solid, three-dimensional geometry is rendered, the side of it facing away from the camera is normally hidden by the side that faces the camera. For example, when a sphere is rendered, you normally only see its front side.

You can avoid rendering the back side of a geometry using the setCullFace() method, defined in csContext and csGeoSet as follows:

void setCullFace(csContext::CullFaceEnum cullFace);

The argument to setCullFace() specifies which sides of a geometry are culled. The possible argument values, enumerated in csContext::CullFaceEnum, are:

  • NO_CULL—Both front and back sides of geometries are rendered.

  • FRONT_CULL—Only the back sides of all geometries are rendered.

  • BACK_CULL—Only the front sides of all geometries are rendered.

  • BOTH_CULL—Geometries are not rendered.

getCullFace() returns one of these values, whichever is current.

Culling either the front or the back side of a geometry reduces the number of faces drawn, which can improve rendering performance.
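The effect of the four values can be summarized in a few lines of code. The following is a minimal sketch, not Cosmo 3D code: the CullFace enum is a stand-in that borrows only the value names listed above, and isFaceRendered() is a hypothetical helper introduced for illustration.

```cpp
#include <cassert>

// Stand-in for csContext::CullFaceEnum; only the four value names
// come from the text. This is NOT the Cosmo 3D declaration.
enum class CullFace { NO_CULL, FRONT_CULL, BACK_CULL, BOTH_CULL };

// Hypothetical helper: returns true if a face survives culling.
// frontFacing is true when the face points toward the camera.
bool isFaceRendered(CullFace mode, bool frontFacing) {
    switch (mode) {
        case CullFace::NO_CULL:    return true;          // both sides drawn
        case CullFace::FRONT_CULL: return !frontFacing;  // only back sides drawn
        case CullFace::BACK_CULL:  return frontFacing;   // only front sides drawn
        case CullFace::BOTH_CULL:  return false;         // nothing drawn
    }
    assert(false && "unknown CullFace value");
    return false;
}
```

For a closed, opaque shape such as the sphere described above, BACK_CULL is the usual choice because the back sides are never visible.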

Back Patch Culling

Back patch culling, like back face culling, removes parts of a geometry from the rendering process. Whereas back face culling operates on individual triangles, back patch culling operates on whole primitives, such as a tristrip or trifan. If, for example, you want to cull back faces:

  • Back face culling prevents all triangles on the back side of a geometry from being rendered.

  • Back patch culling prevents all primitives wholly on the back of the geometry from being rendered.

    If any part of a primitive is on the front of the geometry, the entire primitive is rendered, regardless of what part is on the front or back of the geometry.

To optimize the performance of your application, you would commonly back patch cull and then back face cull your scene graph.
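The difference between the two kinds of culling can be sketched as follows. This is an illustration under stated assumptions, not the Cosmo 3D implementation: a primitive is modeled as a hypothetical list of flags, one per triangle, and both function names are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a tristrip or trifan: one flag per
// triangle, true when that triangle faces away from the camera.
using Primitive = std::vector<bool>;

// Back patch culling treats the primitive as a unit: it is culled
// only when every one of its triangles is back-facing.
bool isBackPatchCulled(const Primitive& backFacing) {
    assert(!backFacing.empty());
    for (bool isBack : backFacing) {
        if (!isBack) return false;  // one front-facing triangle keeps it all
    }
    return true;
}

// Back face culling works per triangle: count the triangles dropped.
std::size_t backFaceCulledCount(const Primitive& backFacing) {
    std::size_t count = 0;
    for (bool isBack : backFacing) {
        if (isBack) ++count;
    }
    return count;
}
```

A strip that wraps around a silhouette edge is not back patch culled at all, yet back face culling still discards its back-facing triangles. That is why the two techniques are commonly used together.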

Figure 14-1 shows the same geometry before and after back patch culling.

Figure 14-1. Before and After Back Patch Culling


Back Patch Culling Advantage

Back patch culling occurs before the primitives are processed by the graphics pipeline. As a result, back patch culling off-loads some of the graphics pipeline work to the CPU, yielding an added degree of parallelism between the two processors.

In back face culling, all of the triangles in a geometry are processed and then those on one side of a geometry are discarded. The amount of processing can be significant, including

  1. Sending vertex data across the bus.

  2. Transforming the vertices from object coordinates to clip coordinates.

  3. Clipping to the viewing frustum.

Not displaying back faces is only a small part of the face processing. Consequently, culling back faces may not significantly enhance application performance.

In back patch culling, all primitives seen only on one side of a geometry are culled before their triangles are processed by the graphics pipeline. Back patch culling often improves application performance by not performing unnecessary processing.

Back patch culling usually reduces the work done by the graphics hardware but it does increase the workload of the host. Performance is enhanced when the time spent performing back patch culling is roughly equal to the time spent processing culled triangles in the graphics hardware.

When to Use Back Patch Culling

Back patch culling is most effective when:

  • The primitives are composed of many elements.

  • Most primitives lie wholly on one side of a geometric shape, either the front or the back, rather than wrapping around both sides.

If the primitives are short, the processing time of the back patch culling might approximate the rendering time for no net gain in performance. If most of the primitives wrap around to both sides of a geometry, few primitives are back patch culled.

Method of Calculation

To determine which side of a geometry an element of a primitive is on, the angle between the viewing vector and the normal to each element of a primitive is calculated. If the viewing angle is less than 90 degrees, the element is on the front of the geometry, as shown in Figure 14-2.

Figure 14-2. Viewing Angle
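The viewing angle test reduces to the sign of a dot product. The following is a minimal sketch that assumes the view vector points from the element toward the camera; the Vec3 type and the function names are hypothetical and are not part of Cosmo 3D.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical stand-in vector type

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Assumption: viewVector points from the element toward the camera.
// The angle is below 90 degrees exactly when its cosine is positive,
// and the cosine has the same sign as the dot product.
bool isFrontFacing(const Vec3& normal, const Vec3& viewVector) {
    return dot(normal, viewVector) > 0.0f;
}

// The angle itself, in degrees, for inspection or debugging.
float viewingAngleDegrees(const Vec3& normal, const Vec3& viewVector) {
    float lengths = std::sqrt(dot(normal, normal) * dot(viewVector, viewVector));
    assert(lengths > 0.0f);  // neither vector may be zero-length
    float cosine = dot(normal, viewVector) / lengths;
    cosine = std::max(-1.0f, std::min(1.0f, cosine));  // guard against rounding
    return std::acos(cosine) * 180.0f / 3.14159265f;
}
```

Only the sign is needed for the culling decision, so the square root and arccosine can be skipped in the test itself.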


Updating the View Vector

As long as you use csDrawAction::apply() to initiate the back patch culling, you never need to calculate the view vector.

If you need greater control over the Draw process and use csDrawAction::draw(), you need to update the view vector using csDrawAction::updateViewVector(). csDrawAction::getViewVector() returns the view vector.

Normals

Two types of normals can be used to calculate the viewing angle:

  • Face normal—the normal to the surface of an element.

  • Primitive normal—the normal to all of the vertices of the elements when the PER_VERTEX_NORMAL binding is used, or the normal to the csGeoSet when any other binding is used.

Figure 14-3 shows these two types of normals.

Figure 14-3. Face and Primitive Normals


Face normals can be calculated given the vertices; primitive normals must be provided by the scene graph.

The direction of the face normal follows the right hand rule: your thumb points in the direction of the normal when the fingers on your right hand wrap in the direction of the ordered vertices, as shown in Figure 14-4.

Figure 14-4. Direction of Normals
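A face normal that follows the right-hand rule can be computed as the cross product of two triangle edges. The following is a minimal sketch of that calculation; Vec3 and the function names are hypothetical, and the code does not claim to match how Cosmo 3D computes face normals internally.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical stand-in vector type

Vec3 subtract(const Vec3& a, const Vec3& b) {
    return {a.x - b.x, a.y - b.y, a.z - b.z};
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Unit normal of the triangle (v0, v1, v2). With the vertices ordered
// counterclockwise as seen by the viewer, the normal points at the viewer.
Vec3 faceNormal(const Vec3& v0, const Vec3& v1, const Vec3& v2) {
    Vec3 n = cross(subtract(v1, v0), subtract(v2, v0));
    float length = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    assert(length > 0.0f);  // a degenerate triangle has no normal
    return {n.x / length, n.y / length, n.z / length};
}
```

Reversing the vertex order flips the normal, which is why vertex ordering determines which side of a face counts as the front.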


Choosing the Type of Normal

The csGeoSet method setBPCullVertNormModeEnable() sets the type of normal used to calculate the viewing angle. If the argument evaluates to TRUE, primitive normals are used; if it evaluates to FALSE, face normals are used.

There is often a performance cost associated with face normals because they must be calculated, whereas primitive normals are provided by the scene graph. This cost is incurred only when the back patch culling data is dynamically rebuilt, as explained in “Building Back Patch Culling Data for a csGeoSet”. The data is not rebuilt unless the primitive's coordinates or normals are changed.

csGeoSet::getBPCullPrimNormModeEnable() returns TRUE or FALSE, depending on whether the mode is primitive normals or face normals, respectively.

Primitive normals are used by default with the following primitive types:

  • csLineSet

  • csLineStripSet

  • csPointSet

Face normals are used by default with the following primitive types:

  • csPolySet

  • csQuadSet

  • csTriFanSet

  • csTriSet

  • csTriStripSet

If normals are not provided for a csGeoSet and the primitive normal mode is enabled, the csGeoSet is never back patch culled.

Using Back Patch Culling

There are two steps you take to implement back patch culling:

  • Enable back patch culling.

  • Build back patch culling data.

Enabling Back Patch Culling

You enable back patch culling by calling csDrawAction::setBPCullMode() with one of the following arguments:

  • NONE—Disables back patch culling.

  • DRAW_FRONT_FACING—Culls all primitives wholly on the back side of a geometry.

  • DRAW_BACK_FACING—Culls all primitives wholly on the front side of a geometry.

csDrawAction::getBPCullMode() returns the back patch culling mode.

Once back patch culling is set, it is carried out whenever an apply() method is invoked with a csDrawAction.

Building Back Patch Culling Data for a csGeoSet

Before applying a csDrawAction, which triggers back patch culling, you must first build back patch culling data for the csGeoSets of the scene. You only need to call csGeoSet::buildBPCullData() once for each csGeoSet; afterwards, the data can be automatically recomputed.

If back patch data does not exist for a csGeoSet, nothing is culled by back patch culling. You can check to see if back patch data exists for a csGeoSet by using csGeoSet::existsBPCullData().

Back patch data is written to, and read from, .csb files.

You can delete back patch data using csGeoSet::deleteBPCullData().

Building Back Patch Culling Data for a Scene Graph

csGeoSet::buildBPCullData() builds the back patch culling data for a csGeoSet. It is the job of the application, however, to recursively go down through the scene graph and build back patch culling data for all of the csGeoSet nodes in a scene graph.
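The recursion behind a developer-supplied routine such as the buildBPCullSceneData() call in Example 14-1 might look like the following sketch. The Node and GeoSet types here are hypothetical stand-ins, not the Cosmo 3D classes, and the way a real application reaches the children of a group or the geometry of a shape is not shown; only the shape of the depth-first walk is illustrated.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for csGeoSet; only models the build call.
struct GeoSet {
    bool hasBPCullData = false;
    void buildBPCullData() { hasBPCullData = true; }
};

// Hypothetical stand-in for a scene graph node. A shape-like leaf
// owns a geometry set; a group-like node owns children.
struct Node {
    std::unique_ptr<GeoSet> geometry;
    std::vector<std::unique_ptr<Node>> children;
};

// Depth-first walk that builds back patch culling data for every
// geometry set found. Returns the number of geometry sets processed.
int buildBPCullSceneData(Node& node) {
    int built = 0;
    if (node.geometry) {
        node.geometry->buildBPCullData();
        ++built;
    }
    for (auto& child : node.children) {
        assert(child);  // this sketch does not allow null children
        built += buildBPCullSceneData(*child);
    }
    return built;
}
```

Because the data only needs to be built once, a walk like this belongs in scene setup rather than in the per-frame draw loop.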

Updating Back Patch Culling Data

As the coordinates or the normals of primitives are changed, whether or not a primitive should be culled might also change. Optimizer, by default, automatically updates back patch culling data and culls the primitives correctly.

You can, however, turn off this automatic updating by calling csGeoSet::setBPCullDynamicBuildMode() with FALSE. Calling it with TRUE enables the automatic updating of back patch culling data.

csGeoSet::getBPCullDynamicBuildMode() returns TRUE if back patch culling data is automatically recomputed when csGeoSets change their coordinates or normals.

Back Patch Culling Code

Now that you understand all the facets of back patch culling, Example 14-1 presents the series of calls your application must make to implement back patch culling.

Example 14-1. Implementing Back Patch Culling


// Create a draw action.
csDrawAction *drawAction = new csDrawAction;
...

// Set the back patch culling mode.
drawAction->setBPCullMode(csDrawAction::DRAW_FRONT_FACING);

// Build scene graph.
csNode *scene = new csNode;
...

// Build back patch culling data for scene graph by using 
// csGeoSet::buildBPCullData() in the following developer-supplied 
// method.
buildBPCullSceneData(scene);

// Apply the draw action to the scene graph.
drawAction->apply(scene);

Culling the View Frustum

View frustum culling eliminates from the rendering list all of those shapes not in the viewing frustum.

View frustum culling works best if the objects in a csGroup node are close together, for example, when all of the nodes representing a body are linearly hierarchical. When this is the case, the CULL process only needs to visit the top of the body subgraph. If the body nodes were distributed horizontally, the CULL process would have to visit at least some of the other body nodes.

View frustum culling also works best when the csShapes are small compared to the full database size.

Objects that are roughly the same length in each of the three dimensions cull better than long, thin objects. An object that spans the database, for example, a beam across the ceiling of the building, cannot be culled as easily as two halves of the beam. It may be useful to divide up objects that can be easily divided.
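The benefit of dividing a long object can be seen with a simple bounding-sphere test. The following is a sketch under stated assumptions: the frustum is modeled as a list of planes whose unit normals point inward, bounding volumes are spheres, and all type and function names are hypothetical. It is not a description of how Cosmo 3D performs its cull traversal.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };  // hypothetical stand-in vector type

// A point p is on the inner side of the plane when
// dot(normal, p) + d >= 0. The normal is assumed to be unit length.
struct Plane { Vec3 normal; float d; };

struct Sphere { Vec3 center; float radius; };

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// A sphere can be culled as soon as it lies entirely on the outer
// side of any one frustum plane.
bool isCulled(const Sphere& bound, const std::vector<Plane>& frustum) {
    assert(bound.radius >= 0.0f);
    for (const Plane& plane : frustum) {
        float signedDistance = dot(plane.normal, bound.center) + plane.d;
        if (signedDistance < -bound.radius) return true;
    }
    return false;
}
```

Consider a single plane that keeps everything with x no greater than 10, and a beam running from x = 0 to x = 40. The sphere bounding the whole beam (center x = 20, radius 20) straddles the plane and cannot be culled, but if the beam is split in two, the half centered at x = 30 with radius 10 lies wholly outside and is culled.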

OpenGL Optimizer provides tools to group together in the scene graph nodes whose shapes are close together in world space.

Level of Detail Reduced for Performance

The children of a level of detail (csLOD) node each encapsulate a shape at a different level of detail. The factor of resolution between children of a csLOD is often one quarter, so when a lower-resolution child replaces the csLOD child currently displayed, only one quarter as many vertices need to be rendered. The maximum reduction of detail occurs when all of the vertices of the highest-resolution image are reduced to a single pixel.

The csLOD (level of detail) node is a subclass of csSwitch. csLOD switches between its child nodes based on the proximity of an object to the camera. The further a shape is from the viewer, the less resolution is needed to display it. Cosmo switches between the children automatically, based on range, to display a shape at the correct level of detail.

csLOD allows you to reach a compromise between performance and the level of detail rendered. For high quality images, a shape close to the camera should be rendered in high detail. When a shape recedes from the camera, the same level of detail is not necessary. Reducing the level of image detail reduces the number of vertices required to render a shape, which results in improved performance.

OpenGL Optimizer can create the csLOD child nodes.

Choosing a Child Node Based on Range

The range is the distance between the camera and a shape's center; it determines which child of the csLOD is displayed. Each child node of a csLOD node is associated with a range of distance values. The range is computed during the traversal of the scene graph. You set the center and the range values using the following csLOD methods:

void setCenter(const csVec3f& c);
void getCenter(csVec3f &c) const;

void setRange(int index, float nearDistance, float farDistance);
void setRangeNear(int child, float distance);
void setRangeFar(int child, float distance);

int getNumRanges() const {return numRanges; }
float getRangeNear(int child) const;
float getRangeFar(int child) const;

The setCenter() method specifies the center of the LOD. The center point aids in calculating the range between the camera and the shape.

The setRange() method specifies the ranges over which a child node of the csLOD node is selected for display. The number of ranges must correspond to the number of csLOD child nodes. If that is not the case:

  • If too few ranges are specified, the highest-order child nodes are ignored.

  • If too many ranges are specified, the extra ranges are ignored.

Instead of using setRange(), you can use setRangeNear() together with setRangeFar() to specify the range over which a child node of a csLOD node is selected for display.
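Range-based selection can be sketched as a lookup over the near and far distances of each child. The following illustration makes several assumptions that the text does not confirm: each interval is treated as half-open (near inclusive, far exclusive), no child is selected when the distance falls outside every range, and the Range type and selectChild() function are hypothetical names rather than csLOD members.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };  // hypothetical stand-in vector type

// Hypothetical record of one child's near and far distances.
struct Range { float nearDistance, farDistance; };

float distanceBetween(const Vec3& a, const Vec3& b) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Returns the index of the child whose range contains the distance
// from the eye to the LOD center, or -1 if no range matches.
// Extra ranges and extra children are both ignored.
int selectChild(const std::vector<Range>& ranges, std::size_t numChildren,
                const Vec3& center, const Vec3& eye) {
    float distance = distanceBetween(center, eye);
    std::size_t usable = std::min(ranges.size(), numChildren);
    for (std::size_t i = 0; i < usable; ++i) {
        assert(ranges[i].nearDistance <= ranges[i].farDistance);
        if (distance >= ranges[i].nearDistance &&
            distance < ranges[i].farDistance) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

With ranges of 0 to 10, 10 to 50, and 50 to 200, an eye 30 units from the center selects the second child (index 1).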

The camera may disregard range values and

  • Display an already-fetched level of detail while a higher level of detail is downloaded from disk.

  • Adjust the level of detail displayed to maintain a constant frame rate; this is always the case if you leave the range() field empty.

  • Disregard the range values for any other implementation-dependent reason.


Tip: For best results, specify ranges only where necessary; give browsers as much freedom as possible to choose levels of detail based on performance.


Transitioning Between Levels of Detail

The transition() method specifies the range over which one csLOD child changes into the next, as shown in Figure 14-5.

Figure 14-5. csLOD Ranges


Performance Programming Techniques

The following sections provide programming tips for improving the performance of your application:

Minimize Use of csAppearance Fields

Many of the fields in csContext set the appearance of a geometry. The fields in csAppearance match those in csContext. Setting a csAppearance field overrides the values of the same fields set in csContext. Overriding csContext values, however, reduces performance because the field has to be reevaluated every time the csAppearance object is applied.

To maximize performance:

  • Set csContext fields to values that satisfy a majority of the shapes in a scene.

  • Set the inherit field in csAppearance to inherit, not apply, those fields.

In this way, you set the minimum number of csAppearance fields.

Minimize Use of csAppearance Modes

Some of the fields set in csAppearance are much more graphics-intensive than others. In particular, the blending, texture, and lighting fields require larger amounts of CPU time. To improve performance, it is better not to turn on these modes if your application does not need them.

Indexing csGeoSet Attributes

You can specify the appearance of all the csGeoSet elements making up a geometry either individually or collectively. You can supply the attribute values sequentially, so that the first element is described by the first csAttribute values, or you can use an index system.

Choosing whether to index the attribute values can dramatically affect application performance. As a general rule, base the choice on whether many elements share the same vertices. If many elements share the same vertex, index the attribute values; if a vertex is not shared by many elements, specify the attribute values of the elements sequentially.
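The storage trade-off can be illustrated by converting a sequential coordinate list into an indexed one. This is a sketch only: IndexedCoords and buildIndexed() are hypothetical names, vertices are matched by exact comparison, and the code does not use or model the csGeoSet attribute classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };  // hypothetical stand-in vector type

bool sameVertex(const Vec3& a, const Vec3& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

// Hypothetical indexed layout: each unique coordinate is stored once
// and every triangle corner refers to it by position.
struct IndexedCoords {
    std::vector<Vec3> coords;
    std::vector<std::size_t> indices;
};

// Converts a sequential list (three coordinates per triangle) into an
// indexed one. Uses a linear search, which is fine for a small example.
IndexedCoords buildIndexed(const std::vector<Vec3>& sequential) {
    assert(sequential.size() % 3 == 0);
    IndexedCoords out;
    for (const Vec3& v : sequential) {
        std::size_t i = 0;
        while (i < out.coords.size() && !sameVertex(out.coords[i], v)) ++i;
        if (i == out.coords.size()) out.coords.push_back(v);  // first use
        out.indices.push_back(i);
    }
    return out;
}
```

A quad drawn as two triangles needs six sequential coordinates but only four indexed coordinates plus six indices. The saving grows as more elements share each vertex; when little is shared, the index list is pure overhead.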

For more information about indexing, see “Indexing Attributes”.

Setting the Transformation Matrix Directly

Whether you set the transformation matrix explicitly or you use the csContext methods that set the transformation matrix for you, rendering performance is optimized for the following reason: a shape can be translated, scaled, and rotated. Rather than applying these operations separately every time a shape is drawn, they are combined into a single transformation matrix that represents the product of all three. Likewise, when a transformation matrix node is the parent of many shape nodes, the transformation for all of the child shape nodes is captured in a single transformation matrix.
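The idea of folding several operations into one matrix can be shown with plain 4x4 arithmetic. The sketch below is an illustration, not Cosmo 3D code: Mat4 and the helper functions are hypothetical, and the column-vector convention (a point is transformed as M * p) is an assumption chosen for this example that may differ from the convention Cosmo 3D uses.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;  // row-major storage; p' = M * p
struct Vec3 { float x, y, z; };      // hypothetical stand-in vector type

Mat4 identity() {
    return {1.0f, 0.0f, 0.0f, 0.0f,  0.0f, 1.0f, 0.0f, 0.0f,
            0.0f, 0.0f, 1.0f, 0.0f,  0.0f, 0.0f, 0.0f, 1.0f};
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i * 4 + j] += a[i * 4 + k] * b[k * 4 + j];
    return r;
}

Mat4 translation(float tx, float ty, float tz) {
    Mat4 m = identity();
    m[3] = tx; m[7] = ty; m[11] = tz;
    return m;
}

Mat4 scaling(float s) {
    Mat4 m = identity();
    m[0] = s; m[5] = s; m[10] = s;
    return m;
}

Mat4 rotationZ(float radians) {
    Mat4 m = identity();
    float c = std::cos(radians), s = std::sin(radians);
    m[0] = c; m[1] = -s; m[4] = s; m[5] = c;
    return m;
}

Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    float x = m[0] * p.x + m[1] * p.y + m[2] * p.z + m[3];
    float y = m[4] * p.x + m[5] * p.y + m[6] * p.z + m[7];
    float z = m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11];
    float w = m[12] * p.x + m[13] * p.y + m[14] * p.z + m[15];
    assert(w != 0.0f);
    return {x / w, y / w, z / w};
}
```

Computing multiply(T, multiply(R, S)) once and reusing the result gives the same point as applying the scale, the rotation, and the translation one after another, at the cost of a single matrix-vector product per vertex.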

Compiling Part of a Scene Graph

Although there are no restrictions on the way in which you create a scene graph, it is customary to find that pieces deep in a scene graph branch add to pieces above them, which add to pieces above them until an entire object is described. For example, the lowest node in a branch might be a toe, the node above it might be the foot, the node above that the leg, the node above that the body; taken together, the nodes describe one side of the lower half of a body, as shown in Figure 14-6.

Figure 14-6. Arranging Scene Graph Nodes


When an action traverses a scene graph, the more nodes it visits, the longer it takes to execute the action. If scene graph branches are deep, the traversal can become expensive. To correct this problem, if you find that the elements in a branch do not change often, you can precompile that branch using csCompileAction. This action compiles a specified subgraph into a data structure, which optimizes setting the traversal state.

To compile a subgraph, create a csCompileAction object and apply it to the root node of the subgraph, as follows:

csCompileAction *compileIt = new csCompileAction;
compileIt->apply(node_name);

node_name is the name of the node below which you want to precompile.