Culling

FSceneRenderer::ComputeViewVisibility

Do we need to render everything in the scene?

No, because:

  • Some objects are not inside the frustum, which can be addressed through primitive culling.
  • Some objects are occluded by others, which can be addressed through occlusion culling.
  • Some mesh draw commands of one object that passed the culling test may not be necessary to render in the current pass, which can be addressed through mesh draw command culling.

So let’s talk about culling.

Actually, in Unreal Engine, Epic Games chose the word Relevance. But to keep the title easy to understand, I still use ‘Culling’.

View

Before we talk about culling, we need to ask what culling is based on. If we have two camera views, we need to run culling once for each camera view.

So the culling procedure is view-based. Let’s look at FViewInfo.

The scene renderer may need to render more than one view in one frame (for example, split screen), so it contains an array of FViewInfo.
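To make the "one culling run per view" idea concrete, here is a minimal sketch. ViewSketch and SceneRendererSketch are hypothetical stand-ins I made up for illustration, not the real FViewInfo or FSceneRenderer, and the 1D visibility test is a deliberate simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for FViewInfo: each view owns its own visibility result.
struct ViewSketch {
    float CameraX = 0.0f;            // simplified 1D camera position
    float HalfWidth = 1.0f;          // simplified visible half-extent
    std::vector<bool> VisibilityMap; // one flag per primitive, stored per view
};

// Hypothetical stand-in for the scene renderer, which holds an array of views.
struct SceneRendererSketch {
    std::vector<ViewSketch> Views;

    // Culling runs once per view, because each camera sees a different subset.
    void ComputeVisibility(const std::vector<float>& PrimitiveX) {
        assert(!Views.empty()); // this sketch expects at least one view
        for (ViewSketch& View : Views) {
            View.VisibilityMap.assign(PrimitiveX.size(), false);
            for (std::size_t i = 0; i < PrimitiveX.size(); ++i) {
                const float Delta = PrimitiveX[i] - View.CameraX;
                View.VisibilityMap[i] =
                    (Delta >= -View.HalfWidth && Delta <= View.HalfWidth);
            }
        }
    }
};
```

The point is only that the visibility result lives in the view, so a split-screen frame ends up with two independent maps for the same set of primitives.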

Primitive Culling

In this step, the renderer performs frustum culling and distance-based culling in parallel.
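As a rough illustration of the two tests, here is a sketch that checks a bounding sphere against six frustum planes and against a maximum draw distance. All the names (Vec3, Plane, Sphere, IsPrimitiveVisible, MaxDrawDistance) are my own assumptions for this example. The engine's real implementation uses its own bounds types and is more optimized.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float X, Y, Z; };

// Plane in the form dot(N, P) + D = 0, with N pointing toward the inside of the frustum.
struct Plane { Vec3 N; float D; };

struct Sphere { Vec3 Center; float Radius; };

inline float Dot(const Vec3& A, const Vec3& B) {
    return A.X * B.X + A.Y * B.Y + A.Z * B.Z;
}

// Frustum test: the sphere is rejected as soon as it lies fully behind one plane.
inline bool IntersectsFrustum(const std::array<Plane, 6>& Planes, const Sphere& S) {
    for (const Plane& P : Planes) {
        if (Dot(P.N, S.Center) + P.D < -S.Radius) {
            return false;
        }
    }
    return true;
}

// Distance test: reject primitives whose center is farther away than MaxDrawDistance.
inline bool WithinDrawDistance(const Vec3& ViewOrigin, const Sphere& S, float MaxDrawDistance) {
    assert(MaxDrawDistance >= 0.0f);
    const Vec3 Delta{S.Center.X - ViewOrigin.X,
                     S.Center.Y - ViewOrigin.Y,
                     S.Center.Z - ViewOrigin.Z};
    return Dot(Delta, Delta) <= MaxDrawDistance * MaxDrawDistance;
}

// A primitive survives primitive culling only if it passes both tests.
inline bool IsPrimitiveVisible(const std::array<Plane, 6>& Planes, const Vec3& ViewOrigin,
                               const Sphere& Bounds, float MaxDrawDistance) {
    return WithinDrawDistance(ViewOrigin, Bounds, MaxDrawDistance) &&
           IntersectsFrustum(Planes, Bounds);
}
```

Each primitive is tested independently of the others, which is what makes this step easy to spread across worker threads.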

Data Structure and Relationship

The result is written into View.PrimitiveVisibilityMap.

  • The PrimitiveVisibilityMap is a FSceneBitArray, you can treat this as an optimized version of a TArray<bool>.
  • The index into PrimitiveVisibilityMap is PrimitiveIndex, which is stored in the FPrimitiveSceneInfo. If you forget what this is, check

Why parallel culling instead of tree-based culling? This is an old topic, but please check the GDC talk Culling the Battlefield, which covers the move from the old tree-based culling to the new optimized parallel culling.
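Here is a small sketch of why a packed bit array works well with parallel culling. BitArraySketch and CullWord are hypothetical names of mine, not the real FSceneBitArray API. I also assume that work is split per 32-bit word, so each task writes to its own word and needs no locking. In this sketch the words are processed in a plain loop. Handing them to worker threads is left out to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for FSceneBitArray: one bit per primitive, packed into 32-bit words.
class BitArraySketch {
public:
    explicit BitArraySketch(std::size_t NumBits)
        : NumBits_(NumBits), Words_((NumBits + 31) / 32, 0u) {}

    void Set(std::size_t Index, bool Value) {
        assert(Index < NumBits_);
        const std::uint32_t Mask = 1u << (Index % 32);
        if (Value) {
            Words_[Index / 32] |= Mask;
        } else {
            Words_[Index / 32] &= ~Mask;
        }
    }

    bool Get(std::size_t Index) const {
        assert(Index < NumBits_);
        return ((Words_[Index / 32] >> (Index % 32)) & 1u) != 0u;
    }

    std::size_t Num() const { return NumBits_; }
    std::size_t NumWords() const { return Words_.size(); }

private:
    std::size_t NumBits_;
    std::vector<std::uint32_t> Words_;
};

// Culls the primitives whose bits live in one 32-bit word, indexed by primitive index.
// Calls for different WordIndex values touch different words, so they do not conflict.
inline void CullWord(BitArraySketch& Map, std::size_t WordIndex,
                     const std::vector<float>& Distances, float MaxDistance) {
    assert(Distances.size() >= Map.Num());
    const std::size_t Begin = WordIndex * 32;
    const std::size_t End = std::min(Begin + 32, Map.Num());
    for (std::size_t i = Begin; i < End; ++i) {
        Map.Set(i, Distances[i] <= MaxDistance);
    }
}
```

Compared with a tree, there is no traversal state to share here: every chunk is a flat range of primitive indices with its own output word.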

Occlusion Culling

In my environment, Unreal Engine does not use Hierarchical-Z-Buffer (HZB) culling. Instead, it uses hardware queries for culling, which means it has to deal with latency issues.

The process works as follows:

  • First, send a bounding box to the GPU and request that it be rendered. Also request that the number of pixels rendered be returned and stored in a variable.
  • Later, check the variable and if it contains 0, it means the object was occluded in the last frame.
  • The following steps walk through two frames of an occlusion query to help with understanding.
  • In the first frame, a query is added to the occlusion query history. This query is actually a render request for the bounding box.
  • The query data is then batched into BatchOcclusionQueries.
  • After that, it is sent to the GPU for execution, and the number of pixels that are actually rendered is counted.
  • The result is saved back.
  • In the next frame, the query result is picked up and the number of rendered pixels is checked.
  • If the number of rendered pixels is 0, then the primitive is totally occluded and bIsOccluded is set to true.

The final result is written to PrimitiveVisibilityMap (if bIsOccluded is true) or to bOcclusionStateIsDefinite (if bIsOccluded is false).
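The one-frame delay is easier to see in code. The sketch below simulates it on the CPU. QuerySketch and OcclusionHistorySketch are hypothetical names, the pixel count is passed in by hand instead of coming from a GPU, and treating a primitive as visible while no result is readable is my own assumption for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a GPU occlusion query. The pixel count is only
// readable on a frame after the one in which the query was issued.
struct QuerySketch {
    std::uint32_t FrameIssued;
    std::uint32_t PixelsDrawn;
};

class OcclusionHistorySketch {
public:
    // Frame N: record a query for the primitive's bounding box.
    // PixelsDrawn stands in for what the GPU would count while drawing the box.
    void IssueQuery(int PrimitiveId, std::uint32_t Frame, std::uint32_t PixelsDrawn) {
        assert(PrimitiveId >= 0);
        Pending_[PrimitiveId] = QuerySketch{Frame, PixelsDrawn};
    }

    // Frame N+1 or later: read the stored result. Returns true if the box drew no pixels.
    // Assumption: with no readable result yet, the primitive is treated as visible.
    bool IsOccluded(int PrimitiveId, std::uint32_t CurrentFrame) const {
        const auto It = Pending_.find(PrimitiveId);
        if (It == Pending_.end() || It->second.FrameIssued >= CurrentFrame) {
            return false;
        }
        return It->second.PixelsDrawn == 0;
    }

private:
    std::unordered_map<int, QuerySketch> Pending_;
};
```

Note that the answer read in frame N+1 describes what the camera saw in frame N, which is where the latency comes from.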

Latency

By the way, this latency may cause some problems. I'm not sure if this is the case, but it serves as a possible example.

Mesh Draw Command Culling (Relevance)

Actually, this part is not about culling mesh draw commands, since only visible primitives can reach this point. So let’s switch to the word Relevance.

Let's start by defining two concepts:

  • The core data structure in this context is FRelevancePacket. It contains the data for both input and output of relevance calculation.
  • For visibility checks and dynamic instance merging, Unreal Engine separates visibility out of FMeshDrawCommand, calling it FVisibleMeshDrawCommand.
    • The idea here is that the original mesh draw command is heavy and can be cached somewhere. The visibility information of the mesh draw command can be checked and passed around.
    • After the check is finished, Unreal Engine writes the necessary mesh draw commands back and uses them in the render pass functions.
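The split between the heavy cached command and the light visibility record can be sketched like this. CachedDrawCommandSketch, VisibleDrawCommandSketch, and CollectVisible are hypothetical names, and their fields are guesses for illustration. They do not mirror the real FMeshDrawCommand or FVisibleMeshDrawCommand layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a heavy mesh draw command that is built once and cached.
struct CachedDrawCommandSketch {
    std::string ShaderName;
    std::vector<std::uint32_t> VertexStreams;
    std::uint32_t NumPrimitives;
};

// Hypothetical stand-in for the light record: a pointer to the cached command
// plus a sort key. It is cheap to copy, sort, and pass between jobs.
struct VisibleDrawCommandSketch {
    const CachedDrawCommandSketch* Command; // not owned, points into the cache
    std::uint64_t SortKey;
};

// Builds the light list from the cache without copying any heavy command.
inline std::vector<VisibleDrawCommandSketch> CollectVisible(
    const std::vector<CachedDrawCommandSketch>& Cache,
    const std::vector<bool>& Visible) {
    assert(Cache.size() == Visible.size());
    std::vector<VisibleDrawCommandSketch> Out;
    for (std::size_t i = 0; i < Cache.size(); ++i) {
        if (Visible[i]) {
            Out.push_back(VisibleDrawCommandSketch{&Cache[i], static_cast<std::uint64_t>(i)});
        }
    }
    return Out;
}
```

In this sketch the cache must outlive the light list, because the records only point at the cached commands.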

The process can be described as follows:

  1. After primitive culling and occlusion culling, the necessary primitives are identified.
  2. Create FRelevancePacket and execute it in parallel.
    1. If a primitive is relevant, add it to RelevantStaticPrimitives.
    2. Pick a primitive from RelevantStaticPrimitives.
    3. For each mesh pass:
      • If this primitive passes a set of flag checks and should be rendered in this pass, call AddCommandsForMesh.
        1. This function takes the cached mesh command from StaticMeshCommandInfos. If necessary, this is where it comes from
      • Build a new FVisibleMeshDrawCommand, add a reference to the cached mesh draw command, and add it to VisibleCachedDrawCommands for later use.
  3. Wait for all parallel jobs to finish.
  4. Merge the results of each job into the mesh commands array, indexed by the mesh pass.
    • This image may help you better understand the meaning of merge
  5. Later, these commands are used in SetupMeshPass, and finally in the render pass functions, which we discussed in From Mesh Draw Commands to RHI Commands.
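The packet-then-merge flow above can be sketched as follows. RelevancePacketSketch, PrimitiveSketch, MergePackets, and the two pass constants are hypothetical, and plain primitive ids stand in for the visible mesh draw commands. The packets are computed one after another here. In the engine they run as parallel jobs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Two hypothetical mesh passes, used as array indices.
constexpr std::size_t DepthPass = 0;
constexpr std::size_t BasePass = 1;
constexpr std::size_t NumMeshPasses = 2;

// Hypothetical per-primitive flags saying which passes want this primitive.
struct PrimitiveSketch {
    int Id;
    bool bInDepthPass;
    bool bInBasePass;
};

// Hypothetical stand-in for a relevance packet: its own input slice plus
// private per-pass output, so packets never write to shared data.
struct RelevancePacketSketch {
    std::vector<PrimitiveSketch> Input;
    std::array<std::vector<int>, NumMeshPasses> VisibleCommands;

    void Compute() {
        for (const PrimitiveSketch& P : Input) {
            assert(P.Id >= 0); // ids are expected to be valid scene indices
            if (P.bInDepthPass) VisibleCommands[DepthPass].push_back(P.Id);
            if (P.bInBasePass) VisibleCommands[BasePass].push_back(P.Id);
        }
    }
};

// Runs after every packet has finished: concatenates the private outputs
// into one command list per mesh pass.
inline std::array<std::vector<int>, NumMeshPasses> MergePackets(
    const std::vector<RelevancePacketSketch>& Packets) {
    std::array<std::vector<int>, NumMeshPasses> Merged;
    for (const RelevancePacketSketch& Packet : Packets) {
        for (std::size_t Pass = 0; Pass < NumMeshPasses; ++Pass) {
            const std::vector<int>& Src = Packet.VisibleCommands[Pass];
            Merged[Pass].insert(Merged[Pass].end(), Src.begin(), Src.end());
        }
    }
    return Merged;
}
```

Keeping the output private to each packet is the design choice that removes the need for locks. The only synchronization point is the wait before the merge.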

Now, let me show a detailed image of this process:

image

You may have noticed that I put some numbers in this image. Let me go through them now.

  1. This is the function we talked about in .
In GetViewRelevance, you can configure whether this render proxy has bStaticRelevance or bDynamicRelevance.

Some of them will be used later.

Additional Claim

In fact, after the culling process, Unreal Engine includes a dynamic instance merging system. However, I have chosen to skip this topic for now for two reasons: 1) it is not directly related to the culling process, and 2) we only have one cube in this case.