Data Structure on GPU

In the previous chapter, Vertex Shader of Depth Prepass, we discussed the transformation of vertex positions into final clip-space vertex positions via the vertex shader.

Two key components in this process are:

  • The LocalToWorld matrix, which is derived from the instance data buffer.
  • The WorldToClip matrix, which is accessed via the view uniform buffer.
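The two-step chain above can be sketched on the CPU. This is a minimal illustration, not engine code: `Mat4`, `Transform`, `LocalToClip` and `Identity` are hypothetical helpers. It assumes the column-vector convention (translation in the last column), which matches how the matrices are written in this chapter. As far as I know, Unreal's HLSL multiplies row vectors (`mul(Position, Matrix)`), which is the same math with transposed matrices.

```cpp
#include <array>
#include <cassert>

// Hypothetical 4x4 matrix, stored row by row, used with column vectors
// (translation in the last column), as the matrices are written in this chapter.
using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>;

// Returns M * v, with v treated as a column vector.
Vec4 Transform(const Mat4& M, const Vec4& v) {
    Vec4 r{0.0, 0.0, 0.0, 0.0};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            r[row] += M[row][col] * v[col];
        }
    }
    return r;
}

// Local -> world (LocalToWorld from the instance data),
// then world -> clip (WorldToClip from the view uniform buffer).
Vec4 LocalToClip(const Mat4& LocalToWorld, const Mat4& WorldToClip, const Vec4& LocalPos) {
    assert(LocalPos[3] == 1.0 && "positions use w = 1");
    const Vec4 WorldPos = Transform(LocalToWorld, LocalPos);
    return Transform(WorldToClip, WorldPos);
}

Mat4 Identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) {
        m[i][i] = 1.0;
    }
    return m;
}
```

Keeping the two matrices separate mirrors where they come from: one is per instance, the other is per view.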

In this chapter, we will examine both of these buffers in detail.

View Data

View data is the simpler of the two: it is stored in a uniform buffer.

image

This data is mapped from FViewUniformShaderParameters in the C++ code:

image

This macro is used in:

image

FViewUniformShaderParameters is then used in:

BEGIN_SHADER_PARAMETER_STRUCT(FViewShaderParameters, )
	SHADER_PARAMETER_STRUCT_REF(FViewUniformShaderParameters, View) //<-- Here!
	SHADER_PARAMETER_STRUCT_REF(FInstancedViewUniformShaderParameters, InstancedView)
END_SHADER_PARAMETER_STRUCT()

Finally, in the pass parameters:

BEGIN_SHADER_PARAMETER_STRUCT(FDepthPassParameters, )
	SHADER_PARAMETER_STRUCT_INCLUDE(FViewShaderParameters, View) //<-- Here!
	SHADER_PARAMETER_STRUCT_INCLUDE(FInstanceCullingDrawParams, InstanceCullingDrawParams)
	RENDER_TARGET_BINDING_SLOTS()
END_SHADER_PARAMETER_STRUCT()

When does this data get filled in? Remember this line from our earlier exercise, "Exam: Try to draw a cube by yourself":

FRedCubePassParameters* PassParameters = GetRedCubePassParameters(GraphBuilder, View, RenderTargetTexture);

And this is the function:

FRedCubePassParameters* GetRedCubePassParameters(FRDGBuilder& GraphBuilder, const FViewInfo& View, FRDGTextureRef RenderTarget)
{
	auto* PassParameters = GraphBuilder.AllocParameters<FRedCubePassParameters>();
	PassParameters->View = View.GetShaderParameters(); //<-- Here!
	PassParameters->RenderTargets[0] = FRenderTargetBinding(RenderTarget, ERenderTargetLoadAction::ELoad);
	return PassParameters;
}

Instance Data

This is the complex part. To make it easier to understand, let's duplicate the cube. Having only one cube doesn't make much sense for instancing.

Now we have two cubes:

image

The transform matrices are:

image
image

These two cubes are rendered together through dynamic instancing, as described earlier:

image
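As a mental model of dynamic instancing, here is a deliberately simplified sketch: draws that share the same mesh and material collapse into one instanced draw, and the merged draw keeps one PrimitiveId per instance. The types and the (mesh, material) merge key are hypothetical; the engine compares far more state than this before merging two draws.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified draw descriptions.
struct MeshDraw {
    uint32_t MeshId;      // which vertex/index buffers
    uint32_t MaterialId;  // which shaders/render state
    uint32_t PrimitiveId; // where this object's data lives
};

struct InstancedDraw {
    uint32_t MeshId;
    uint32_t MaterialId;
    std::vector<uint32_t> PrimitiveIds; // one entry per instance, in instance order
};

// Merges draws with an identical (mesh, material) key into instanced draws.
std::vector<InstancedDraw> MergeDraws(const std::vector<MeshDraw>& Draws) {
    std::map<std::pair<uint32_t, uint32_t>, std::size_t> KeyToIndex;
    std::vector<InstancedDraw> Result;
    for (const MeshDraw& D : Draws) {
        const auto Key = std::make_pair(D.MeshId, D.MaterialId);
        auto It = KeyToIndex.find(Key);
        if (It == KeyToIndex.end()) {
            It = KeyToIndex.emplace(Key, Result.size()).first;
            Result.push_back({D.MeshId, D.MaterialId, {}});
        }
        Result[It->second].PrimitiveIds.push_back(D.PrimitiveId);
    }
    std::size_t Total = 0;
    for (const InstancedDraw& I : Result) {
        Total += I.PrimitiveIds.size();
    }
    assert(Total == Draws.size()); // every input draw became exactly one instance
    return Result;
}
```

With our two cubes (same mesh, same material), this model yields a single draw with two instances.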

The main steps to follow are:

  • Use SV_InstanceID to obtain the actual instance ID (through the instance ID buffer), which is then used as the index into the instance data buffer.
  • Retrieve the FInstanceSceneData from GPUScene.InstanceData (View_InstanceSceneData).
  • Obtain the PrimitiveId from the retrieved FInstanceSceneData.
  • Use the PrimitiveId to fetch data from GPUScene.PrimitiveSceneData (View_PrimitiveSceneData).
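These four steps can be modeled on the CPU to make the chain of indirections explicit. Everything here is hypothetical: the structs are simplified stand-ins for FInstanceSceneData and FPrimitiveSceneData, the GPU buffers are plain vectors, and a single view is assumed so the instance ID buffer entries need no unpacking.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified stand-ins; not the real engine layouts.
struct InstanceSceneData {
    uint32_t PrimitiveId;
    std::array<float, 12> LocalToWorld; // first three rows of a 4x4 matrix
};

struct PrimitiveSceneData {
    std::array<float, 12> LocalToWorld;
};

struct GpuSceneModel {
    std::vector<uint32_t> InstanceIdBuffer;        // SV_InstanceID -> instance ID (single view assumed)
    std::vector<InstanceSceneData> InstanceData;   // stands in for View_InstanceSceneData
    std::vector<PrimitiveSceneData> PrimitiveData; // stands in for View_PrimitiveSceneData
};

// Steps 1-2: SV_InstanceID -> instance ID -> instance scene data.
const InstanceSceneData& FetchInstance(const GpuSceneModel& Scene, uint32_t SvInstanceId) {
    assert(SvInstanceId < Scene.InstanceIdBuffer.size());
    const uint32_t InstanceId = Scene.InstanceIdBuffer[SvInstanceId];
    assert(InstanceId < Scene.InstanceData.size());
    return Scene.InstanceData[InstanceId];
}

// Steps 3-4: instance scene data -> PrimitiveId -> primitive scene data.
const PrimitiveSceneData& FetchPrimitive(const GpuSceneModel& Scene, uint32_t SvInstanceId) {
    const uint32_t PrimitiveId = FetchInstance(Scene, SvInstanceId).PrimitiveId;
    assert(PrimitiveId < Scene.PrimitiveData.size());
    return Scene.PrimitiveData[PrimitiveId];
}
```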

Instance Id

The SV_InstanceID input provided by the GPU is already an instance ID, so why do we need another buffer? Because Unreal Engine has a compute-shader-based instance culling system that may remove some instances, the instances actually drawn are no longer the full, original list. We therefore need an indirect buffer that maps the rendered instance ID to the original, pre-culling instance ID, and we use the mapped ID for data fetching.

For our cube example, it’s OK to ignore this.
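Even though we can ignore culling for the cubes, the idea behind the indirect buffer is easy to model. The sketch below is a hypothetical, serial version of the compaction a culling pass performs: surviving instance IDs are written to consecutive slots, so slot `i` is what SV_InstanceID `i` reads later. A parallel GPU version would typically append through an atomic counter, which does not necessarily preserve the input order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Writes the IDs of visible instances contiguously. The draw then uses
// InstanceIdBuffer.size() as its instance count, and SV_InstanceID indexes this list.
std::vector<uint32_t> CompactVisibleInstances(const std::vector<bool>& IsVisible) {
    std::vector<uint32_t> InstanceIdBuffer;
    for (uint32_t InstanceId = 0; InstanceId < IsVisible.size(); ++InstanceId) {
        if (IsVisible[InstanceId]) {
            InstanceIdBuffer.push_back(InstanceId); // slot index == future SV_InstanceID
        }
    }
    assert(InstanceIdBuffer.size() <= IsVisible.size());
    return InstanceIdBuffer;
}
```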

The instance ID buffer's format is:

Each element is a 32-bit value. The first 4 bits hold the view ID (there are typically only a few views), and the remaining 28 bits hold the instance ID.
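Here is that 32-bit layout as pack/unpack helpers. The function names are hypothetical, and which end of the word holds the view ID is an assumption (upper 4 bits here); the engine's shader code is the authority on the exact bit order. With 28 bits, the largest representable instance ID is 2^28 - 1 = 268,435,455.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: view ID in the upper 4 bits, instance ID in the lower 28 bits.
constexpr uint32_t kInstanceIdBits = 28;
constexpr uint32_t kInstanceIdMask = (1u << kInstanceIdBits) - 1u; // 0x0FFFFFFF

inline uint32_t PackInstanceId(uint32_t ViewId, uint32_t InstanceId) {
    assert(ViewId < 16u && InstanceId <= kInstanceIdMask); // must fit in 4 / 28 bits
    return (ViewId << kInstanceIdBits) | InstanceId;
}

inline uint32_t UnpackViewId(uint32_t Packed) { return Packed >> kInstanceIdBits; }

inline uint32_t UnpackInstanceId(uint32_t Packed) { return Packed & kInstanceIdMask; }
```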

In my two-cube test, the instance ID buffer is reversed: the first element holds instance ID 1 and the second holds instance ID 0. I haven't investigated why.

In short, treat this buffer as an indirect mapping from SV_InstanceID to the instance ID used to fetch data from the instance data buffer.

Instance Scene Data

This data buffer contains packed instance data. But there is one thing I need to point out:

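The instance scene data buffer uses a structure-of-arrays (SoA) layout: each member is stored as its own contiguous array across all instances, instead of each instance's members being stored together.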
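To illustrate SoA addressing, here is a hypothetical float4 buffer in which element k of instance i lives at `k * SoaStride + i`. The stride name and exact layout are assumptions for illustration; an array-of-structures layout would instead use `i * ElementsPerInstance + k`. One common motivation for SoA on GPUs is that neighboring threads reading the same member of consecutive instances then touch adjacent memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Float4 = std::array<float, 4>;

// Hypothetical SoA buffer: one contiguous run of SoaStride float4s per element index.
struct SoaBuffer {
    uint32_t SoaStride = 0;           // capacity in instances (>= instance count)
    uint32_t ElementsPerInstance = 0; // float4s per instance
    std::vector<Float4> Data;

    SoaBuffer(uint32_t Stride, uint32_t Elements)
        : SoaStride(Stride), ElementsPerInstance(Elements),
          Data(std::size_t(Stride) * Elements) {}

    // Element ElementIndex of instance InstanceId.
    Float4& At(uint32_t InstanceId, uint32_t ElementIndex) {
        assert(InstanceId < SoaStride && ElementIndex < ElementsPerInstance);
        return Data[std::size_t(ElementIndex) * SoaStride + InstanceId];
    }
};
```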

The final instance data structure is FInstanceSceneData.

This image shows how some important members are loaded from the buffer.

image

Variable-sized data is placed by Unreal Engine in a separate payload buffer and loaded based on flags.

We skip the payload data loading and decoding/calculation here, but the image above provides enough information for us to continue.
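For readers curious about the general pattern, here is a sketch of flag-driven variable-sized data: optional fields are stored in a fixed order, absent fields take no space, and a field's offset is the total size of the present fields before it. The field names and sizes are invented for illustration and do not reflect Unreal's actual payload layout.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical optional fields, in the fixed order they would be stored.
enum PayloadField : uint32_t { kFieldA = 0, kFieldB = 1, kFieldC = 2, kNumFields = 3 };

// Hypothetical size of each field, in float4s.
constexpr std::array<uint32_t, kNumFields> kFieldSizeInFloat4s = {1, 2, 1};

// Float4 offset of Field inside an instance's payload; only fields whose flag
// bit is set occupy space.
inline uint32_t PayloadOffset(uint32_t Flags, PayloadField Field) {
    assert((Flags >> Field) & 1u); // the field must be present to be read
    uint32_t Offset = 0;
    for (uint32_t f = 0; f < Field; ++f) {
        if ((Flags >> f) & 1u) {
            Offset += kFieldSizeInFloat4s[f];
        }
    }
    return Offset;
}

// Total payload size in float4s, e.g. for allocating space in the payload buffer.
inline uint32_t PayloadSize(uint32_t Flags) {
    uint32_t Size = 0;
    for (uint32_t f = 0; f < kNumFields; ++f) {
        if ((Flags >> f) & 1u) {
            Size += kFieldSizeInFloat4s[f];
        }
    }
    return Size;
}
```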

Primitive Scene Data

The primitive scene data structure is much more complex.

FPrimitiveSceneData
  • The data is carefully encoded into the buffer: it is packed on the CPU side (FPrimitiveSceneShaderData::Setup) and unpacked on the GPU side (GetPrimitiveData(uint PrimitiveId)).
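The pack/unpack idea can be sketched in C++ for both sides. This is hypothetical and far smaller than the real FPrimitiveSceneShaderData::Setup / GetPrimitiveData pair: the record holds only flags, an instance count and an affine LocalToWorld. It shows the usual trick for storing integers in a float4 buffer, a bit-preserving cast (the CPU counterpart of HLSL `asfloat`/`asuint`), so the integer bits survive unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using Float4 = std::array<float, 4>;

// Bit-preserving casts, the CPU counterparts of HLSL asfloat()/asuint().
inline float AsFloat(uint32_t u) { float f; std::memcpy(&f, &u, sizeof f); return f; }
inline uint32_t AsUint(float f) { uint32_t u; std::memcpy(&u, &f, sizeof u); return u; }

// Hypothetical record; the last matrix row is implicitly 0 0 0 1.
struct PrimitiveRecord {
    uint32_t Flags;
    uint32_t InstanceCount;
    std::array<float, 12> LocalToWorld; // first three rows, row by row
};

// "CPU side": pack into float4s.
std::vector<Float4> Pack(const PrimitiveRecord& R) {
    std::vector<Float4> Out;
    Out.push_back({AsFloat(R.Flags), AsFloat(R.InstanceCount), 0.0f, 0.0f});
    for (int row = 0; row < 3; ++row) {
        Out.push_back({R.LocalToWorld[row * 4 + 0], R.LocalToWorld[row * 4 + 1],
                       R.LocalToWorld[row * 4 + 2], R.LocalToWorld[row * 4 + 3]});
    }
    return Out;
}

// "GPU side" (written in C++ here): unpack the same layout.
PrimitiveRecord Unpack(const std::vector<Float4>& In) {
    assert(In.size() >= 4);
    PrimitiveRecord R{};
    R.Flags = AsUint(In[0][0]);
    R.InstanceCount = AsUint(In[0][1]);
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 4; ++col) {
            R.LocalToWorld[row * 4 + col] = In[1 + row][col];
        }
    }
    return R;
}
```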

There is little point in going through the unpacking in detail, so here is part of the decoded data:

Primitive.LocalToWorld:

$$
\begin{bmatrix}
1 & 0 & 0 & 300 \\
0 & 1 & 0 & 110 \\
0 & 0 & 1 & 80 \\
0 & 0 & 0 & 1
\end{bmatrix}
$$

And this is what it looks like in the original raw data buffer:

image

This is the transform of our left cube.

ℹ️
In our example with two cubes, we don't actually need the primitive scene data. Instead, we can use the transform information directly from the instance scene data.

Difference between the two cubes

In the instance scene data buffer:

| | Cube 0 | Cube 1 |
|---|---|---|
| SV_InstanceId | 0 | 1 |
| PrimitiveId | 6 | 5 |
| LocalToWorld (first three rows) | 1 0 0 300 / 0 1 0 110 / 0 0 1 80 | 1 0 0 300 / 0 1 0 0 / 0 0 1 80 |

You may wonder why these two instances reference two different primitives instead of sharing a single one.

I don't have a definitive answer, but I suspect it's because each cube is a separate static mesh component, and the two are only merged at draw time by dynamic instancing.

Instanced static mesh component

I conducted another test in which, instead of using two static mesh components, I created a single instanced static mesh component and added two instances to it.

image

The actor looks like this:

image

The data for the two instances is:

image
| | Cube 0 Instance | Cube 1 Instance |
|---|---|---|
| SV_InstanceId | 0 | 1 |
| PrimitiveId | 0 | 0 |
| InstanceId | 0 | 1 |
| LocalToWorld (first three rows) | 1 0 0 320 / 0 1 0 -100 / 0 0 1 80 | 1 0 0 320 / 0 1 0 100 / 0 0 1 80 |
image
image
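One plausible reading of this table: the instanced static mesh component is a single primitive that owns a contiguous range of instance slots, whereas the two separate static mesh components in the previous test were two primitives with one instance each. The sketch below encodes that reading only; the struct and field names are hypothetical and chosen to illustrate an offset-plus-count range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: the range of instance slots owned by one primitive.
struct PrimitiveInstanceRange {
    uint32_t InstanceDataOffset; // first instance slot owned by this primitive
    uint32_t NumInstances;       // how many consecutive slots it owns
};

// Maps a primitive-local instance index to its slot in the instance data buffer.
inline uint32_t GlobalInstanceId(const PrimitiveInstanceRange& P, uint32_t LocalIndex) {
    assert(LocalIndex < P.NumInstances);
    return P.InstanceDataOffset + LocalIndex;
}
```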

Small things

Tile

It looks like Unreal Engine divides the instances into tiles with a tile size of 2097152 (2^21). The reason appears to be Large World Coordinates (LargeWorldCoordinates).
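Here is one way a tile size like this can be used: a double-precision coordinate is split into a whole number of tiles plus a small offset that stays precise in 32-bit float. The round-to-nearest-tile rule and all names are assumptions; only the tile size comes from the observation above. For scale, a float has a 24-bit significand, so values from 2^21 up to 2^22 are spaced 0.25 apart, while offsets of at most half a tile (2^20) in magnitude are spaced at most 0.125 apart.

```cpp
#include <cassert>
#include <cmath>

// Tile size observed above: 2097152 = 2^21.
constexpr double kTileSize = 2097152.0;

struct TiledCoord {
    float Tile;   // whole number of tiles
    float Offset; // remainder relative to the tile origin
};

// ASSUMPTION: round to the nearest tile, so |Offset| <= kTileSize / 2.
inline TiledCoord SplitIntoTile(double Value) {
    const double Tile = std::floor(Value / kTileSize + 0.5);
    const double Offset = Value - Tile * kTileSize;
    assert(std::fabs(Offset) <= kTileSize * 0.5);
    return {static_cast<float>(Tile), static_cast<float>(Offset)};
}

// Recombines tile and offset in double precision.
inline double Reconstruct(const TiledCoord& C) {
    return static_cast<double>(C.Tile) * kTileSize + static_cast<double>(C.Offset);
}
```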