Occlusion culling

Occlusion culling complements frustum culling by culling occluded objects. Frustum culling is an optimisation technique that discards meshes that sit outside the viewing volume by testing each mesh’s bounding volume against the six frustum planes. The culling can be accelerated with hierarchical spatial partitioning, whereby the scene is carved up into a tree with each tree node representing a smaller region of space. A node contains all renderable objects enclosed in the node’s space, so all of a node’s objects can be accepted or rejected with a single test. If a node is found to intersect the viewing volume, the algorithm recurses down into the node’s child nodes, and so on.
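To make the hierarchical test concrete, here is a minimal sketch in Python. It assumes bounding spheres for nodes and objects and planes stored as an inward-facing unit normal plus offset. The names (`Plane`, `Sphere`, `Node`, `classify`, `cull`, `box_volume`) are hypothetical, not the engine’s code. For brevity the six planes describe an axis-aligned box (the shape of an orthographic view volume) rather than a perspective frustum, but the test is identical for any six planes.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Plane:
    # Inward-facing unit normal: points with signed_distance >= 0 are inside.
    normal: Vec3
    d: float

    def signed_distance(self, p: Vec3) -> float:
        n = self.normal
        return n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + self.d


@dataclass
class Sphere:
    centre: Vec3
    radius: float


class Side(Enum):
    OUTSIDE = 0
    INTERSECTS = 1
    INSIDE = 2


def classify(s: Sphere, frustum: List[Plane]) -> Side:
    """Conservative sphere test against the six frustum planes."""
    side = Side.INSIDE
    for plane in frustum:
        dist = plane.signed_distance(s.centre)
        if dist < -s.radius:
            return Side.OUTSIDE          # completely behind one plane: reject
        if dist < s.radius:
            side = Side.INTERSECTS       # straddles this plane
    return side


@dataclass
class Node:
    bounds: Sphere                                   # encloses everything below
    objects: List[Tuple[str, Sphere]] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def collect_all(node: Node, visible: List[str]) -> None:
    visible.extend(name for name, _ in node.objects)
    for child in node.children:
        collect_all(child, visible)


def cull(node: Node, frustum: List[Plane], visible: List[str]) -> None:
    side = classify(node.bounds, frustum)
    if side is Side.OUTSIDE:
        return                           # whole subtree rejected with one test
    if side is Side.INSIDE:
        collect_all(node, visible)       # whole subtree accepted, no more tests
        return
    # Node straddles the volume: test its own objects, then recurse.
    for name, bounds in node.objects:
        if classify(bounds, frustum) is not Side.OUTSIDE:
            visible.append(name)
    for child in node.children:
        cull(child, frustum, visible)


def box_volume(h: float) -> List[Plane]:
    """Six inward-facing planes bounding the cube [-h, h]^3."""
    return [Plane((1, 0, 0), h), Plane((-1, 0, 0), h),
            Plane((0, 1, 0), h), Plane((0, -1, 0), h),
            Plane((0, 0, 1), h), Plane((0, 0, -1), h)]


root = Node(Sphere((0, 0, 0), 100.0), children=[
    Node(Sphere((0, 0, 0), 1.0), [("a", Sphere((0, 0, 0), 1.0))]),
    Node(Sphere((50, 0, 0), 2.0), [("b", Sphere((50, 0, 0), 2.0))]),
    Node(Sphere((10, 0, 0), 3.0), [("c1", Sphere((9, 0, 0), 0.5)),
                                   ("c2", Sphere((12, 0, 0), 0.5))]),
])
visible: List[str] = []
cull(root, box_volume(10.0), visible)
```

Here node `a` is accepted wholesale, node `b` is rejected with a single test, and only the straddling node is opened up, leaving `visible` as `["a", "c1"]`. Sphere tests are conservative: a sphere near a corner of the volume can be reported as intersecting when it is actually outside, which costs a little extra work but never culls a visible object.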

That’s great, but more can still be done.

Occlusion query to the rescue, kind of

The problem with frustum culling is that although it restricts the set of renderable objects to those inside the viewing frustum, some of those objects occlude others, and the occluded ones needn’t be drawn at all. But what about the z-buffer? Isn’t that supposed to prevent overdraw? Yes, if you depth sort your visible objects front to back. However, although the occluded objects won’t make it into the final frame, they are still submitted and have their vertex shaders executed, and unless the hardware can reject their fragments with an early depth test, their pixel (fragment) shaders run as well. This is where occlusion culling steps in.
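A toy model helps show what sorting does and doesn’t buy. The sketch below assumes an idealised per-pixel early depth test and represents each object as a hypothetical 1-D span of pixels at a single depth. It counts how many fragments are generated and how many actually reach the pixel shader.

```python
from typing import List, Tuple

# Hypothetical 1-D "object": covers pixels [x_start, x_end) at a single depth.
Span = Tuple[int, int, float]


def shade_count(objects: List[Span], width: int) -> Tuple[int, int]:
    """Return (fragments rasterised, fragments shaded), assuming an ideal
    per-pixel early depth test with depth writes on."""
    depth = [float("inf")] * width
    rasterised = shaded = 0
    for x_start, x_end, z in objects:
        for x in range(x_start, x_end):
            rasterised += 1
            if z < depth[x]:        # passes the depth test: run the pixel shader
                depth[x] = z
                shaded += 1
    return rasterised, shaded


# One near occluder completely hiding two further objects.
scene = [(0, 8, 1.0), (2, 6, 2.0), (3, 5, 3.0)]
front_to_back = shade_count(sorted(scene, key=lambda s: s[2]), width=8)
back_to_front = shade_count(sorted(scene, key=lambda s: s[2], reverse=True), width=8)
```

Front to back gives `(14, 8)` and back to front gives `(14, 14)`: sorting removes the wasted shading, but every fragment of the hidden objects is still generated and their vertices still processed. That remaining work is what occlusion culling targets. Real hardware rejects fragments more coarsely than this model.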

Occlusion culling complements hierarchical frustum culling by using graphics API extensions that let you query whether an object is occluded before rendering it. As before, we use hierarchical frustum culling to get our list of potentially visible objects, sorted front to back. Next, we issue an occlusion query for each potentially visible mesh: we disable color writes, begin an occlusion query and draw a simplified version of the object (for example, its bounding box). In the next frame, we check whether the previous frame’s query result is available and, if it is, we read it. The result is either a flag saying whether any samples passed the depth test (i.e. whether any pixels would have been drawn) or the exact number of samples that passed. If the object was visible in the previous frame, we enable color writes (and depth writes too if the object is opaque) and draw the full mesh. If the object was not visible in the previous frame, we disable both color and depth writes and draw the simplified version of the mesh as before, and so on.
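The per-frame bookkeeping can be sketched without a GPU. In the Python sketch below, `FakeQueries` is a hypothetical stand-in for real query objects whose results arrive a fixed number of frames late. `gpu_samples` is simulated data standing in for what the hardware would count. The real draw calls and render-state changes appear only as comments. Objects that have never been queried are assumed visible so they get drawn at least once; that default is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PendingQuery:
    samples: int       # samples that passed the depth test
    ready_frame: int   # first frame on which the result can be read back


class FakeQueries:
    """Hypothetical stand-in for GPU occlusion queries whose results lag behind."""

    def __init__(self, latency_frames: int = 1) -> None:
        self.latency = latency_frames
        self.pending: Dict[str, PendingQuery] = {}

    def in_flight(self, obj: str) -> bool:
        return obj in self.pending

    def issue(self, obj: str, frame: int, samples: int) -> None:
        self.pending[obj] = PendingQuery(samples, frame + self.latency)

    def available(self, obj: str, frame: int) -> bool:
        q = self.pending.get(obj)
        return q is not None and frame >= q.ready_frame

    def read(self, obj: str) -> int:
        return self.pending.pop(obj).samples


class OcclusionCuller:
    def __init__(self, queries: FakeQueries) -> None:
        self.queries = queries
        self.visible: Dict[str, bool] = {}   # last known result per object

    def render_frame(self, frame: int, candidates: List[str],
                     gpu_samples: Dict[str, int]) -> List[str]:
        """candidates: frustum-culled objects, sorted front to back.
        gpu_samples: simulated sample counts the GPU would report this frame."""
        drawn = []
        for obj in candidates:
            # Read last frame's result only if it is ready; never stall waiting.
            if self.queries.available(obj, frame):
                self.visible[obj] = self.queries.read(obj) > 0  # any samples passed?
            if self.visible.get(obj, True):
                # Color (and, if opaque, depth) writes on: draw the full mesh.
                drawn.append(obj)
            # else: color and depth writes off, draw only the simplified proxy.
            # Either way the draw is wrapped in a new query for the next frame.
            if not self.queries.in_flight(obj):
                self.queries.issue(obj, frame, gpu_samples[obj])
        return drawn


frames = [
    {"near": 500, "far": 0},
    {"near": 500, "far": 0},
    {"near": 480, "far": 20},   # camera pans: "far" peeks out from behind "near"
    {"near": 480, "far": 20},
]
culler = OcclusionCuller(FakeQueries(latency_frames=1))
history = [culler.render_frame(f, ["near", "far"], s) for f, s in enumerate(frames)]
```

`history` comes out as `[["near", "far"], ["near"], ["near"], ["near", "far"]]`. Note the lag: `far` becomes visible on frame 2 but is only drawn on frame 3, because each decision is based on the previous frame’s query. With this scheme, an object coming into view can appear a frame late.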

The algorithm is described here.

Below is a clip of the technique in action in the engine. I set up a simple scene that renders spheres stacked behind one another. The number of visible objects is output to the console window.

Performance characteristics

The clip is interesting because of its performance characteristics. The nearest occluding sphere is close to the viewpoint, so it covers a lot of pixels. The pixel shader implements Phong shading, which, whilst not expensive, isn’t trivial either. When I pan left and right, more spheres become visible, so the visible count increases; surprisingly, so does the frame rate. When I then zoom out so that the only visible object covers fewer pixels (and the pixel shader therefore does less work), the frame rate is higher, as expected, and the fewer visible objects there are, the higher it goes. This suggests that the frame rate here is governed more by the number of pixels shaded than by the number of visible objects.

This demonstrates an important aspect of the technique: it isn’t necessarily something you want to apply to every scene object, but rather something to employ on an object-by-object and shader-by-shader basis. The benefit of the optimisation depends on the complexity of the model, the number of pixels it covers from frame to frame and the runtime cost of the vertex and pixel shaders applied to it. If an object has lots of vertices and an expensive vertex shader, or covers a lot of pixels with an expensive pixel shader, then occlusion culling could improve the frame rate. However, if the simplified object rendered during the occlusion test covers more pixels than the (unsimplified) mesh itself, occlusion queries could actually slow rendering down. In this demo I don’t bother with a simplified object and instead render the full mesh during the occlusion test.
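One way to reason about the per-object decision is a rough cost model. The Python sketch below is a hypothetical model, not a measurement: costs are in arbitrary units. Every frame pays a fixed query overhead. On frames where the object is occluded, the full draw is replaced by a depth-only draw of the proxy. The one-frame lag of the result is ignored. All field names and numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class DrawCost:
    """Hypothetical per-frame cost model for one object, in arbitrary units."""
    vertices: int
    vertex_cost: float       # per-vertex cost of the full vertex shader
    pixels: int              # pixels covered by the full mesh
    pixel_cost: float        # per-pixel cost of the full pixel shader
    proxy_vertices: int      # simplified mesh drawn for the occlusion test
    proxy_pixels: int        # pixels covered by that simplified mesh
    depth_only_cost: float   # per-vertex/per-pixel cost with color writes off
    query_overhead: float    # fixed cost of issuing and reading one query

    def full_draw(self) -> float:
        return self.vertices * self.vertex_cost + self.pixels * self.pixel_cost

    def proxy_draw(self) -> float:
        return (self.proxy_vertices + self.proxy_pixels) * self.depth_only_cost


def expected_saving(c: DrawCost, p_occluded: float) -> float:
    """Expected cost saved per frame by querying this object.

    Every frame pays the query overhead; on occluded frames the full draw is
    replaced by the proxy draw. Positive means querying should pay off."""
    return p_occluded * (c.full_draw() - c.proxy_draw()) - c.query_overhead


# Dense mesh with an expensive pixel shader: querying pays off.
heavy = DrawCost(50_000, 1.0, 200_000, 4.0, 8, 200_000, 0.5, 100.0)
# Cheap mesh whose proxy covers more pixels than the mesh: querying costs more.
cheap = DrawCost(200, 1.0, 1_000, 1.0, 8, 4_000, 0.5, 100.0)
```

If each object is occluded half the time, `expected_saving(heavy, 0.5)` is `374898.0` and `expected_saving(cheap, 0.5)` is `-502.0`. The demo’s approach of rendering the full mesh during the test corresponds to setting the proxy fields equal to the mesh’s own vertex and pixel counts.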

Another complementary strategy might be to have shaders dial down their level of detail based on distance from the viewpoint (e.g. degrading Phong to Gouraud shading) to further reduce the cost of shading distant objects.
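A minimal sketch of that selection, assuming the engine picks a shader per object each frame. It compares distance to the object’s bounding radius rather than using a raw distance, since for a fixed field of view the ratio roughly tracks how large the object appears on screen. `choose_shading` and its `switch_ratio` threshold are hypothetical tuning choices, not anything from the engine.

```python
import math
from enum import Enum
from typing import Tuple

Vec3 = Tuple[float, float, float]


class Shading(Enum):
    PHONG = "phong"      # lighting evaluated per pixel
    GOURAUD = "gouraud"  # lighting evaluated per vertex, colors interpolated


def choose_shading(camera: Vec3, centre: Vec3, radius: float,
                   switch_ratio: float = 20.0) -> Shading:
    """Switch to the cheaper shading model once the object is far away
    relative to its size (switch_ratio is a hypothetical tuning value)."""
    distance = math.dist(camera, centre)
    return Shading.GOURAUD if distance > switch_ratio * radius else Shading.PHONG
```

A unit sphere 5 units away keeps Phong, the same sphere 50 units away drops to Gouraud, and a sphere of radius 5 at that same distance keeps Phong because it still covers a sizeable part of the screen. In practice some hysteresis around the threshold may be needed to stop objects flickering between the two models.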