- Lumen Scene Update Trigger
- Surface Cache Texture
- GPU Passes
- Lumen Scene Update on the CPU side
- Card Relevance
- Card Placement
- Map Between Card and Page
- Data Compactness
- Card Capture Pass
- Copy
- Why do we need an additional copy?
- Determination of Card Update List
Lumen Scene Update Trigger
Before analyzing the execution flow, we need to list the situations that can trigger an update of LumenScene.
- Movement of a Primitive: When a Primitive moves into the current viewport or capture range from outside it, LumenScene needs to be updated.
- Movement and orientation of the camera: When the area seen by the camera changes, LumenScene needs to be updated accordingly.
- Update of the material itself: The material may contain nodes that change over time. Lumen periodically recaptures the results of these changes, although not necessarily in real time.
Primitive changes also include the real-time addition and deletion of Primitives.
Card changes include the addition, deletion, and adjustment of the resolution (MipMap level) of Cards.
All of these changes will be organized according to priority, and then a part of them will be updated each frame.
Surface Cache Texture
The result of this update stage is that the necessary material information of the relevant Cards is captured into the Albedo, Normal, Emissive, Depth, Stencil, and other buffers, and then copied to the Surface Cache texture.
This gif shows how Albedo texture is rendered:
From here, we can see that LumenSurfaceCache also uses the VirtualTexture approach, where all Surface Cache data is mapped to specific regions of the physical texture through page tables.
Let's organize the data structure hierarchy here.
- Primitive: Mesh.
- Cards: Data structures stored in advance with a Mesh and used for capturing and caching its surface. A Mesh can have multiple Cards.
- Page: The basic unit managed in the physical cache texture in which Cards are cached. A Card can cover multiple Pages. Conversely, when the resolution of a Card is small enough, Lumen will pack multiple Cards into one Page.
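The hierarchy above can be pictured with a minimal sketch. The type and field names below are hypothetical simplifications chosen for illustration, not the engine's actual declarations; the page size of 128 texels is taken from the example later in this article.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified types for illustration only.
struct Page {
    std::uint32_t physicalX = 0;  // page coordinate in the physical atlas
    std::uint32_t physicalY = 0;
};

struct Card {
    std::uint32_t resolutionX = 0;           // requested resolution in texels
    std::uint32_t resolutionY = 0;
    std::vector<std::uint32_t> pageIndices;  // indices into a shared Page array
};

struct Primitive {
    std::vector<Card> cards;  // a mesh can own several cards
};

// Number of pages a card spans along one axis (ceiling division).
// A card smaller than a page still reports 1, although it may share that page.
inline std::uint32_t PagesAlongAxis(std::uint32_t resolution, std::uint32_t pageSize) {
    assert(pageSize > 0);
    return (resolution + pageSize - 1) / pageSize;
}
```

With a 128-texel page, a 256 x 128 Card spans 2 x 1 Pages, while a 16 x 16 Card fits inside a single Page alongside other small Cards.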
Later, we will see how these data structures are gradually updated.
GPU Passes
Lumen Scene Update on the CPU side
The Lumen Scene update is initiated on the CPU side, and the most important part runs in the BeginUpdateLumenSceneTasks function. The call stack is as follows:
- FDeferredShadingSceneRenderer::Render
- FDeferredShadingSceneRenderer::EndInitViews
- FDeferredShadingSceneRenderer::BeginUpdateLumenSceneTasks
For the processing flow of this section, you can refer to this conceptual diagram.
- Primitives: Delete or add a new primitive.
- MeshCards: Handle the addition and deletion of mesh cards. For example, as your camera moves, a new Mesh enters the capture range.
- SurfaceCacheRequests: Organize SurfaceCache drawing requests based on the visibility and resolution requirements of existing MeshCards, as well as data from GPU readbacks.
- CardPagesToRender: Further analyze SurfaceCacheRequest and organize requests for drawing pages according to MipMap levels, etc.
- MeshDrawCommands: This is the familiar MeshDrawCommand layer from the three-layer rendering architecture. From this point on, the flow is consistent with the One Pass process we talked about before.
The following image shows a more detailed process:
Please pay attention to the following aspects:
- The update process of LumenScene starts from Primitive and refines to Cards. This part uses ParallelFor for parallelization.
- Differential updates are used for MeshCards.
- In addition to updates made directly on the CPU based on distance and resolution, a MeshCard's Pages are also updated based on feedback data from the GPU.
- After this, SurfaceCacheRequest will be analyzed to sort out the CardPages that need to be rendered.
- Note that the number of updates will be constrained here to avoid significant fluctuations in frame rates.
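The last point, capping the amount of work per frame, can be sketched as follows. This is a minimal illustration under assumptions: the `Request` struct, the `SelectRequests` function, and the choice of camera distance as the sort key are all hypothetical and do not reproduce the engine's actual logic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a surface cache request.
struct Request {
    int   cardIndex;
    float distance;  // distance from the camera (assumed priority key)
};

// Keep at most maxRequests requests for this frame, nearest first.
// Whatever is dropped here is simply retried in a later frame.
std::vector<Request> SelectRequests(std::vector<Request> requests, std::size_t maxRequests) {
    std::sort(requests.begin(), requests.end(),
              [](const Request& a, const Request& b) { return a.distance < b.distance; });
    if (requests.size() > maxRequests) {
        requests.resize(maxRequests);
    }
    return requests;
}
```

A fixed cap like this trades latency for stability: distant Cards may take several frames to appear, but the per-frame cost stays bounded.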
Card Relevance
As mentioned in the "Importance" principle, we only need to capture the Mesh within a close range into the Card Cache. From this Gif, we can see that as the distance increases, the Card information gradually disappears.
The distance to MeshCard and the resolution of MeshCard in the current camera (screen space size) determine whether this MeshCard needs to be rendered, i.e., whether an FSurfaceCacheRequest needs to be generated.
Please note that the MeshCard of the floor in the above picture disappears more slowly than the Cube's. This is because it covers more pixels and has a larger screen space size, so it is only excluded at a farther distance.
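The interplay between distance and on-screen size can be illustrated with a small sketch. The formula and every parameter name here (`texelDensity`, `minResolution`, `maxDistance`) are assumptions made for illustration; the engine's real heuristic is not shown in this article.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical heuristic: the resolution a card deserves grows with its
// world-space size and shrinks in proportion to its distance from the camera.
float DesiredResolution(float cardWorldSize, float distance, float texelDensity) {
    return cardWorldSize * texelDensity / std::max(distance, 1.0f);
}

// A request is generated only if the card is within range and still large
// enough on screen to be worth caching.
bool NeedsSurfaceCacheRequest(float cardWorldSize, float distance,
                              float maxDistance, float minResolution,
                              float texelDensity) {
    if (distance > maxDistance) {
        return false;
    }
    return DesiredResolution(cardWorldSize, distance, texelDensity) >= minResolution;
}
```

Under these assumed numbers, a large floor Card (size 1000) still qualifies at distance 2000, while a small cube Card (size 100) at the same distance has already dropped below the minimum resolution, which matches the behaviour described above.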
Card Placement
As we previously discussed, Lumen will compactly allocate the Cards' positions in the SurfaceCache physical texture.
Using our current scenario as an example:
- There is only one MeshCard that belongs to the floor, with a resolution of 128 x 128, therefore occupying a complete page.
- Next, it is the Cube's turn to be allocated. The Cube has 6 MeshCards, each requesting a resolution of 16 x 16.
- Since the floor has already used one page, the coordinates of the next page are (1, 0), which moves one physical address page to the right.
- Because the resolution requested by each Card is lower than the resolution of a single page (128), the Cards are sub-allocated within the page.
- The Cards are allocated from right to left: a total of 6 Cards are arranged towards the left, starting from the bottom right corner of the physical page.
Note that the value of X for Min decreases gradually.
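The right-to-left packing can be reproduced with a few lines of arithmetic. This is a sketch under assumptions: the function name is hypothetical, Y is assumed to grow downward so that "bottom" means the largest Y, and all Cards are assumed to fit in a single row.

```cpp
#include <cassert>

struct IntPoint { int x; int y; };

// Min corner, in atlas texels, of the index-th card sub-allocated inside a
// page, packing from the bottom right corner towards the left.
IntPoint SubAllocationMin(IntPoint pageCoord, int pageSize, int cardSize, int index) {
    assert(cardSize > 0 && index >= 0);
    assert((index + 1) * cardSize <= pageSize);  // single row only
    const IntPoint pageOrigin{pageCoord.x * pageSize, pageCoord.y * pageSize};
    return IntPoint{pageOrigin.x + pageSize - (index + 1) * cardSize,
                    pageOrigin.y + pageSize - cardSize};
}
```

For the Cube's six 16 x 16 Cards in page (1, 0), this yields Min.x values of 240, 224, 208, 192, 176, and 160, decreasing by one Card width each time.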
The final result of the above content reflected in the rendering page layout is as follows:
Here we are only talking about the CPU-side work, and this Gif is only used to demonstrate the allocation process. The real rendering will be explained later.
Map Between Card and Page
So now we need to discuss a question:
If we want to sample information from a certain position in the Card, how do we calculate the actual coordinates in the AtlasPageTexture?
There are three data structures available to help you complete this task: Cards, PageBuffer, and PageTable.
These three Buffers provide two different ways to accomplish the mapping task:
- By using PageTable, which requires more computation to determine the coordinates in the physical texture but consumes less graphics memory bandwidth.
- By using PageBuffer, which requires more data to be loaded but involves less computation.
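The PageTable route might look roughly like the sketch below. The structures, field names, and row-major page layout are assumptions made for illustration and do not mirror the engine's actual buffers; the point is only that the physical coordinate is derived by arithmetic plus one small table lookup.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Float2 { float x; float y; };

// Hypothetical, simplified card record.
struct CardInfo {
    int resolutionX;          // card resolution in texels
    int resolutionY;
    int firstPageTableIndex;  // first entry of this card in the page table
};

// Hypothetical page table entry: physical page coordinate in the atlas.
struct PageTableEntry { int physicalPageX; int physicalPageY; };

// Map a card-space UV in [0, 1] to a texel position in the physical atlas.
Float2 CardUVToAtlasTexel(const CardInfo& card,
                          const std::vector<PageTableEntry>& pageTable,
                          Float2 uv, int pageSize) {
    const int pagesX = (card.resolutionX + pageSize - 1) / pageSize;
    const int pagesY = (card.resolutionY + pageSize - 1) / pageSize;
    const float texelX = uv.x * static_cast<float>(card.resolutionX);
    const float texelY = uv.y * static_cast<float>(card.resolutionY);
    // Which page of the card the texel falls in (clamped so uv == 1 stays valid).
    const int pageX = std::min(static_cast<int>(texelX) / pageSize, pagesX - 1);
    const int pageY = std::min(static_cast<int>(texelY) / pageSize, pagesY - 1);
    const PageTableEntry& entry =
        pageTable[card.firstPageTableIndex + pageY * pagesX + pageX];
    // Physical page origin plus the offset inside the page.
    return Float2{entry.physicalPageX * pageSize + (texelX - pageX * pageSize),
                  entry.physicalPageY * pageSize + (texelY - pageY * pageSize)};
}
```

The PageBuffer route would instead load a precomputed per-page record, skipping most of this arithmetic at the cost of reading more data.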
The flowchart for these two methods is as follows:
Data Compactness
- FLumenCardData requires 9 float4s for compressed storage.
- FLumenCardPageData requires 5 float4s for compressed storage.
- An item in the PageTable only needs 2 uint32s for storage.
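To put these sizes side by side, here is the arithmetic as a small sketch. The constant names are hypothetical, and a 4-byte float is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

constexpr std::size_t kFloat4Bytes         = 4 * sizeof(float);          // 16 bytes
constexpr std::size_t kCardDataBytes       = 9 * kFloat4Bytes;           // FLumenCardData
constexpr std::size_t kCardPageDataBytes   = 5 * kFloat4Bytes;           // FLumenCardPageData
constexpr std::size_t kPageTableEntryBytes = 2 * sizeof(std::uint32_t);  // one PageTable item
```

That is 144 bytes per Card record, 80 bytes per Card Page record, and 8 bytes per PageTable item, which is why the PageTable route loads far less data per lookup.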
Card Capture Pass
Card Capture Pass is similar to the familiar BasePass, except that instead of writing a full, more complex GBuffer it writes only four buffers: Albedo, Normal, Emissive, and DepthStencil.
The other three are relatively easy to understand, so we will focus on how Albedo Buffer is calculated.
Computing the Albedo Buffer still requires evaluating the nodes in the Material Graph, which is why the "CalcMaterialParametersEx" function is still called during the calculation process. This explains why Lumen can capture surface colors that are authored in the Material Graph and change over time.
However, after the calculation is completed, it will no longer perform complex calculations such as specular reflection (because it does not have the Camera information required for specular reflection). Instead, it assumes that the surface is a completely diffuse surface, calculates the final color, and then outputs it.
Copy
The final result will be copied to the SurfaceCache texture according to the type of Buffer.
Why do we need an additional copy?
- For compression
- For material merging
Why not write directly to the Surface Cache during the Capture stage, instead of outputting to the Capture Atlas and then executing CopyToSurfaceCache? The main reason is that the Surface Cache uses BC compression, and writing to it directly would add compression instructions to the capture shaders and reduce write performance. Additionally, a separate Copy Pass can batch the data of the different material properties and draw their Quads as multiple instances all at once, resulting in better performance. Ref: https://zhuanlan.zhihu.com/p/516141543
Determination of Card Update List
Now it's time to start calculating the lighting of the Card. But before we start calculating the lighting for a specific Card, we must choose the small part of the Cards that needs to be updated in this frame.
We need a reasonable priority system that meets the following requirements:
- Able to perform calculations on a large number of CardPages in parallel
- Able to take into account priority factors such as distance from the camera
- Able to dynamically adjust priorities based on the most recent update time
Lumen's approach is to first calculate the priority of each Card, and then place it in a histogram based on the calculated priority. Finally, a portion of the Cards within the histogram that does not exceed the budget is selected, and these Cards are updated.
Some details:
- The histogram has a total of 128 priority levels. This number is controlled by the macro PRIORITY_HISTOGRAM_SIZE.
- The histograms of Direct Lighting and Indirect Lighting are separate. Therefore, the same card will be calculated twice.
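The histogram-based selection can be sketched on the CPU as follows. This is a simplified illustration, not the engine's GPU implementation: the function name is hypothetical, each Card's priority bucket is assumed to be computed already, and a lower bucket index is assumed to mean higher urgency. The boundary bucket that would overflow the budget is skipped entirely here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the 128 levels of PRIORITY_HISTOGRAM_SIZE mentioned above.
constexpr std::size_t kPriorityHistogramSize = 128;

// buckets[i] is the priority bucket of card i, in [0, kPriorityHistogramSize).
// Returns the indices of the cards to update this frame.
std::vector<std::size_t> SelectCardsToUpdate(const std::vector<std::uint32_t>& buckets,
                                             std::size_t budget) {
    // Pass 1: count the cards in each bucket. Each increment is independent,
    // which is what makes this step easy to parallelize.
    std::array<std::size_t, kPriorityHistogramSize> histogram{};
    for (std::uint32_t b : buckets) {
        assert(b < kPriorityHistogramSize);
        ++histogram[b];
    }
    // Pass 2: accept whole buckets, most urgent first, while the budget holds.
    std::size_t used = 0;
    std::size_t cutoff = 0;  // buckets [0, cutoff) are accepted
    while (cutoff < kPriorityHistogramSize && used + histogram[cutoff] <= budget) {
        used += histogram[cutoff];
        ++cutoff;
    }
    // Pass 3: each card checks its own bucket against the cutoff.
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < buckets.size(); ++i) {
        if (buckets[i] < cutoff) {
            selected.push_back(i);
        }
    }
    return selected;
}
```

The appeal of a histogram over a full sort is that no Card needs to know about any other Card: counting and the final comparison are both per-Card operations, and only the short scan over 128 buckets is sequential.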