Radiosity
Data Flow
Parallelization
Tile to Thread
Trace to Thread
Trace
Global Distance Field Trace
Encode
Trace
Mesh Card Search
Spatial Filter Probes
Spherical Harmonic Lighting
Sampling Jitter

Radiosity

I am not a professional researcher in computer graphics, and my understanding of the Radiosity method is very limited, so what follows is my personal understanding. I highly recommend that readers consult more professional computer graphics courses for a proper description, for example:

groups.csail.mit.edu

In order to calculate the reflection of light rays in a scene, in principle we need to calculate the impact of every incident ray on every point. However, this is not computationally feasible. Therefore, Radiosity uses the idea of finite elements to discretize the scene, as shown in the following figure:

From Microsoft PowerPoint - lecture23.ppt (mit.edu). Thanks to Cong Yue for finding this great image.

Calculating the light transport between Tiles, instead of between every pair of points, turns the problem from completely infeasible into a feasible one.

Fortunately, the SurfaceCache has already simplified the scene into Cards, and we can reuse the Tile partitioning scheme used in Direct Lighting Injection to further divide Cards into Tiles.

Even so, if multiple bounces of light need to be calculated, the computational cost still increases exponentially. To address this, Lumen distributes the calculation of multiple bounces over multiple frames: by reusing the previous frame's SurfaceCache as the light source, each frame effectively calculates one additional bounce.

From Radiosity Progress - Radiosity (computer graphics) - Wikipedia
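The "one extra bounce per frame" idea can be illustrated with a toy iteration. This is only a sketch under my own assumptions, not Lumen's code: the three patches, their emission, reflectance, and form factors are made-up values, and `radiosity_frame` is a hypothetical helper. Each frame gathers light from the previous frame's result, which plays the role of the reused SurfaceCache.

```python
def radiosity_frame(emission, reflectance, form_factors, previous):
    """One frame: gather from the previous frame's radiosity (one extra bounce)."""
    n = len(emission)
    return [
        emission[i]
        + reflectance[i] * sum(form_factors[i][j] * previous[j] for j in range(n))
        for i in range(n)
    ]

# Hypothetical 3-patch scene: patch 0 is a light, patches 1 and 2 only reflect.
emission = [1.0, 0.0, 0.0]
reflectance = [0.0, 0.5, 0.5]
form_factors = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]

history = [[0.0, 0.0, 0.0]]  # frame 0: nothing has been lit yet
for _ in range(4):
    history.append(radiosity_frame(emission, reflectance, form_factors, history[-1]))

for frame, values in enumerate(history):
    print(frame, values)
```

In this toy scene, patch 1 receives only the light's direct contribution in frame 2, and every later frame adds one more bounce on top of it, so the value creeps toward its converged result without any single frame paying for all bounces.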

Data Flow

This image shows the process of calculating indirect lighting in a very concise way.

Reuse the logic from the Card, Card Page, and Card Tile sections to discretize each Card into Tiles.
Sampling every Texel of a Tile is too expensive, so divide each 8x8-texel Tile into a 2x2 grid and use 4 Probes in total, one per cell. Sampling is done Probe by Probe.
Starting from each Probe, trace the GlobalSDF to find the intersection position.
Apply spatial filtering to the Probe data.
Convert the Probe data to spherical harmonics and integrate it.
Combine the result with direct lighting.

Parallelization

Tile to Thread

One of the major advantages of a GPU-driven system is that it makes full use of the GPU's highly parallel nature. However, this also makes it considerably harder to understand. This section uses Tile partitioning as an example to explain in detail how the partitioning from Card to Tile is parallelized.

In our example, there are 7 Cards in total (one for each of the 6 sides of the Cube and 1 for the floor), mapped onto 7 CardPages. Of these, the 6 Cube CardPages have only 16x16 texels each, while the floor is assigned 128x128 texels.

Lumen launches a total of 28 Thread Groups, with every 4 Thread Groups handling 1 Card Page. Each Thread Group contains 8x8 threads, and each thread handles one Tile of 8x8 texels. The 4 Thread Groups of a Card Page can therefore process up to (2x8) x (2x8) = 16x16 Tiles.

For smaller Pages, the threads that fall outside the Page skip processing and generate no Tiles. All Tiles are stored linearly, and concurrent writes from different threads are synchronized with an InterlockedAdd.

In the end, we generate 280 Tiles:

(128 / 8)^{2} + 6 \times (16 / 8)^{2} = 256 + 24 = 280
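The dispatch described above can be replayed on the CPU to check the numbers. This is a sketch under my own assumptions: the loop order and the `build_tiles` helper are hypothetical, and appending to a Python list stands in for reserving a slot with InterlockedAdd and writing the Tile into a linear buffer.

```python
TILE_SIZE = 8             # texels per Tile side
GROUP_SIZE = 8            # threads per Thread Group side
GROUPS_PER_PAGE_SIDE = 2  # 2x2 = 4 Thread Groups per Card Page

def build_tiles(page_sizes):
    """Simulate one thread per potential Tile; threads outside the Page skip."""
    tiles = []            # linear Tile buffer
    groups_launched = 0
    for page_index, size in enumerate(page_sizes):
        tiles_per_side = size // TILE_SIZE
        for group_y in range(GROUPS_PER_PAGE_SIDE):
            for group_x in range(GROUPS_PER_PAGE_SIDE):
                groups_launched += 1
                for thread_y in range(GROUP_SIZE):
                    for thread_x in range(GROUP_SIZE):
                        tile_x = group_x * GROUP_SIZE + thread_x
                        tile_y = group_y * GROUP_SIZE + thread_y
                        if tile_x >= tiles_per_side or tile_y >= tiles_per_side:
                            continue  # small Page: this thread generates no Tile
                        # On the GPU the slot would come from InterlockedAdd.
                        tiles.append((page_index, tile_x, tile_y))
    return tiles, groups_launched

page_sizes = [16] * 6 + [128]  # 6 Cube pages and 1 floor page
tiles, groups = build_tiles(page_sizes)
print(groups, len(tiles))
```

On the GPU the order of the Tiles in the buffer depends on which thread reaches the atomic first, so only the count is deterministic, not the layout.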

Trace to Thread

Trace is also highly parallelized. Each Tile produces 4 Probes, and each Probe traces 4x4 = 16 rays, with one thread per ray. For the 280 Tiles above, a total of 280 x 4 x 16 = 17,920 threads is therefore required.
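The thread count follows directly from these numbers. The sketch below also shows one possible way to map a flat thread index back to a Tile, Probe, and trace. The flat layout and the `decode_trace_thread` helper are my own assumptions for illustration; the actual indexing in Lumen's shaders may be organized differently.

```python
PROBES_PER_TILE = 2 * 2    # 2x2 Probes per Tile
TRACES_PER_PROBE = 4 * 4   # 4x4 traces per Probe

def decode_trace_thread(thread_index):
    """Hypothetical flat layout: traces are innermost, then Probes, then Tiles."""
    trace_index = thread_index % TRACES_PER_PROBE
    probe_index = (thread_index // TRACES_PER_PROBE) % PROBES_PER_TILE
    tile_index = thread_index // (TRACES_PER_PROBE * PROBES_PER_TILE)
    return tile_index, probe_index, trace_index

num_tiles = 280
total_threads = num_tiles * PROBES_PER_TILE * TRACES_PER_PROBE
print(total_threads)
print(decode_trace_thread(total_threads - 1))
```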

Trace

This diagram gives a brief overview of how Trace works:

Overall, Trace consists of two steps:

Tracing the Global Distance Field to obtain intersection points.
Based on the intersection point information, finding the corresponding Mesh Card and sampling lighting information from the Card.

To accelerate the query, an Object Grid is used as an acceleration structure.

Global Distance Field Trace

Encode

Center And Extent of clip map volumes

UV map data for different clip maps

Visualization of 4-level clip maps

The Global Distance Field is encoded and stored as a 4-level Clip Map.

Similarly, the main data adopts the idea of virtual textures, using a MipTexture, a PageTableTexture, and an AtlasTexture to reduce the VRAM needed for storage.

The following figure shows MipTextures with different Depth levels.

This image shows a slice of the AtlasTexture. Please note that the AtlasTexture is also a 3D Texture.

Trace

ℹ️

\Engine\Shaders\Private\DistanceField\GlobalDistanceFieldUtils.ush: FGlobalSDFTraceResult RayTraceGlobalDistanceField(FGlobalSDFTraceInput TraceInput)

Although the Trace contains many details, the basic implementation is consistent with the figure, which can be regarded as:

First, intersect the ray with the ClipMap's Volume. If it intersects, perform a detailed step-by-step Trace.
Loop forward for at most 256 steps:

Compute the world-space position at distance SampleRayTime along the ray and sample the Global Distance Field there to determine whether the ray has hit a surface. If it has, terminate.
Otherwise, the SDF value at the current point is the distance to the nearest surface, so stepping forward by that distance cannot pass through any surface. Therefore, advance SampleRayTime by max(DistanceField * TraceInput.StepFactor, LocalMinStepSize) and loop again.
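The loop above is the classic sphere tracing pattern, which can be sketched on the CPU as follows. This is a minimal sketch, not the shader: an analytic sphere (`sphere_sdf`) stands in for sampling the Global Distance Field clip maps, the clip map volume intersection is skipped, and `hit_threshold`, `max_distance`, and the default parameter values are my own assumptions.

```python
import math

def sphere_sdf(point, center=(0.0, 0.0, 5.0), radius=1.0):
    """Stand-in for sampling the Global Distance Field: distance to a sphere."""
    return math.dist(point, center) - radius

def trace_distance_field(origin, direction, sdf, max_distance=100.0,
                         step_factor=1.0, min_step_size=0.01,
                         hit_threshold=0.001, max_steps=256):
    """Return the ray distance of the first hit, or None if nothing is hit."""
    ray_time = 0.0
    for _ in range(max_steps):
        if ray_time > max_distance:
            break
        point = tuple(o + d * ray_time for o, d in zip(origin, direction))
        distance = sdf(point)
        if distance < hit_threshold:
            return ray_time
        # The SDF guarantees no surface within `distance`, so this step is safe.
        ray_time += max(distance * step_factor, min_step_size)
    return None

hit = trace_distance_field((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf)
miss = trace_distance_field((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sphere_sdf)
print(hit, miss)
```

The minimum step size keeps the loop from stalling when the ray grazes a surface and the distance value becomes very small, at the cost of possibly stepping slightly past a thin feature.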

Mesh Card Search

ℹ️

\Engine\Shaders\Private\Lumen\LumenTracingCommon.ush : void EvaluateGlobalDistanceFieldHit(FConeTraceInput TraceInput, FGlobalSDFTraceResult SDFTraceResult, inout FConeTraceResult ConeTraceResult)

The Global SDF does not include surface material or lighting information, so we need another way to obtain it. This information is stored in the previously rendered Mesh Card Pages. However, searching all the Mesh Cards for every intersection point would be far too inefficient.

Therefore, Lumen introduces GlobalDistanceFieldPageObjectGridBuffer, which assigns the associated Objects to the Cells of a Grid in advance. A query then only needs to find the Cell that contains the Trace intersection and traverse the MeshCards associated with that Cell to sample the SurfaceCache.

The process of querying the corresponding MeshCard based on the position of the sampling point is shown in the figure:

First, calculate the corresponding UV coordinates in the ClipMapVolume based on the position of the sampling point.
Calculate the corresponding Page based on the UV coordinates.
Each Page is subdivided into a 4x4x4 Grid to further reduce the length of the object list to be traversed.
Each Grid Cell can hold up to 4 Objects.
Since these 4 Objects are the candidates for sampling, traverse the Object List of this Cell and sample each MeshCard. If the sampling weight is not zero, the MeshCard is considered valid, and the hit point is taken to lie on that MeshCard.
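The lookup steps above can be sketched as follows. Everything here is a simplified stand-in of my own: a Python dictionary replaces GlobalDistanceFieldPageObjectGridBuffer, the clip map is an axis-aligned cube described by hypothetical `clipmap_min`, `clipmap_size`, and `pages_per_side` parameters, and `card_weight` is a made-up callback in place of the real SurfaceCache sampling.

```python
GRID_RES = 4               # each Page is subdivided into a 4x4x4 Grid
MAX_OBJECTS_PER_CELL = 4   # each Grid Cell holds up to 4 Objects

def find_cell(position, clipmap_min, clipmap_size, pages_per_side):
    """World position -> (page coordinate, cell coordinate), or None if outside."""
    uv = [(p - m) / clipmap_size for p, m in zip(position, clipmap_min)]
    if any(c < 0.0 or c >= 1.0 for c in uv):
        return None
    page_space = [c * pages_per_side for c in uv]
    page = tuple(int(c) for c in page_space)
    cell = tuple(int((c - int(c)) * GRID_RES) for c in page_space)
    return page, cell

def candidate_cards(position, object_grid, card_weight,
                    clipmap_min, clipmap_size, pages_per_side):
    """Return (object, weight) for every Object in the Cell with non-zero weight."""
    key = find_cell(position, clipmap_min, clipmap_size, pages_per_side)
    if key is None:
        return []
    hits = []
    for object_id in object_grid.get(key, [])[:MAX_OBJECTS_PER_CELL]:
        weight = card_weight(object_id, position)
        if weight > 0.0:
            hits.append((object_id, weight))
    return hits

# Hypothetical data: one Cell lists two Objects, only the floor covers the hit.
object_grid = {((1, 0, 2), (2, 1, 3)): ["cube_top", "floor"]}

def card_weight(object_id, position):
    return 1.0 if object_id == "floor" else 0.0

hits = candidate_cards((13.0, 3.0, 23.0), object_grid, card_weight,
                       clipmap_min=(0.0, 0.0, 0.0), clipmap_size=64.0,
                       pages_per_side=8)
print(hits)
```

The point of the structure is that the per-hit cost is bounded by the Cell capacity rather than by the number of Mesh Cards in the scene.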

ℹ️

If you want to know more about Object Grid, check: \Engine\Shaders\Private\DistanceField\GlobalDistanceFieldObjectGrid.ush

Spatial Filter Probes

ℹ️

Engine\Shaders\Private\Lumen\Radiosity\LumenRadiosity.usf : LumenRadiositySpatialFilterProbeRadiance

Afterward, Lumen spatially filters the probes, using information from neighboring probes to reduce noise and stabilize the final result. Lumen assumes that lighting changes between probes are low frequency, which holds in most cases.

To avoid blending in unrelated probes, which would cause light leaking, Lumen calculates plane weights to exclude probes that do not lie on the same plane.
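To make the idea of a plane weight concrete, here is a minimal sketch. The formula is my own assumption (a linear falloff with the neighbor's distance from the center probe's plane), not the weight that LumenRadiositySpatialFilterProbeRadiance actually computes, and `plane_weight`, `filter_probe`, and `max_plane_distance` are hypothetical names.

```python
def plane_weight(center_position, center_normal, neighbor_position,
                 max_plane_distance=1.0):
    """1.0 when the neighbor lies on the center probe's plane, 0.0 when far off it."""
    offset = [n - c for n, c in zip(neighbor_position, center_position)]
    plane_distance = abs(sum(o * k for o, k in zip(offset, center_normal)))
    return max(0.0, 1.0 - plane_distance / max_plane_distance)

def filter_probe(center, neighbors):
    """Weighted average of the center probe and its plane-compatible neighbors."""
    total_radiance = center["radiance"]
    total_weight = 1.0
    for neighbor in neighbors:
        weight = plane_weight(center["position"], center["normal"],
                              neighbor["position"])
        total_radiance += weight * neighbor["radiance"]
        total_weight += weight
    return total_radiance / total_weight

center = {"position": (0.0, 0.0, 0.0), "normal": (0.0, 0.0, 1.0), "radiance": 1.0}
same_plane = {"position": (4.0, 0.0, 0.0), "radiance": 3.0}
other_plane = {"position": (0.0, 0.0, 5.0), "radiance": 100.0}

filtered = filter_probe(center, [same_plane, other_plane])
print(filtered)
```

The very bright probe on the other plane contributes nothing, which is exactly the leak the plane weight is meant to prevent.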


Spherical Harmonic Lighting

ℹ️

\Engine\Shaders\Private\Lumen\Radiosity\LumenRadiosity.usf

Let's first organize the input and output of this step:

We have used the Probes to gather the light arriving at the current position. If you prefer a more rigorous term, this is irradiance.
However, what we ultimately want to record in the Indirect Lighting Texture is not "how much light is shining on the current texel" but "how much light the current texel emits outward". Again, if you prefer a more rigorous term: radiant exitance.

Between these two, we need to solve two problems:

How to calculate the outgoing light based on all incident light?
How to interpolate the low-resolution data from the Probe into higher-resolution texel information?

Lumen solves these two problems in two steps:

Convert the data sampled by the Probe into spherical harmonic lighting, which makes it easy to integrate and calculate the outgoing light.
For a specific Texel, sample multiple Probes around it and blend them into the final result to reduce noise.
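The first step can be sketched with the first two SH bands. This is an illustration under my own assumptions rather than Lumen's implementation: the samples are assumed to be uniformly distributed over the full sphere, only four coefficients are kept, and the names `project_radiance`, `irradiance`, and `radiant_exitance` are hypothetical. The cosine-lobe factors (pi and 2*pi/3) are the standard ones for diffuse SH lighting.

```python
import math

Y0 = 0.5 * math.sqrt(1.0 / math.pi)     # band 0 basis constant
Y1 = math.sqrt(3.0 / (4.0 * math.pi))   # band 1 basis constant
COSINE_LOBE = [math.pi, 2.0 * math.pi / 3.0, 2.0 * math.pi / 3.0, 2.0 * math.pi / 3.0]

def sh_basis(direction):
    x, y, z = direction
    return [Y0, Y1 * y, Y1 * z, Y1 * x]

def project_radiance(samples):
    """samples: (unit direction, radiance) pairs, assumed uniform over the sphere."""
    coeffs = [0.0] * 4
    weight = 4.0 * math.pi / len(samples)  # solid angle per sample
    for direction, radiance in samples:
        for i, basis in enumerate(sh_basis(direction)):
            coeffs[i] += radiance * basis * weight
    return coeffs

def irradiance(coeffs, normal):
    """Integrate incoming radiance against the cosine lobe around `normal`."""
    return sum(a * c * b for a, c, b in zip(COSINE_LOBE, coeffs, sh_basis(normal)))

def radiant_exitance(coeffs, normal, albedo):
    """Light leaving a diffuse surface: albedo times irradiance."""
    return albedo * irradiance(coeffs, normal)

axes = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
uniform = project_radiance([(d, 1.0) for d in axes])
from_above = project_radiance([(d, 1.0 if d == (0, 0, 1) else 0.0) for d in axes])

print(irradiance(uniform, (0, 0, 1)))
print(irradiance(from_above, (0, 0, 1)), irradiance(from_above, (0, 0, -1)))
```

With only two bands the reconstruction can ring, so a surface facing away from the only light may get a slightly negative value. Clamping or more bands are the usual remedies, and which one Lumen uses is not something this sketch claims to show.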

ℹ️

I will skip the mathematical details of SH and focus solely on the engineering implementation in Lumen. If you are interested in the details of SH, please refer to: https://3dvar.com/Green2003Spherical.pdf

Sampling Jitter

As I cannot find a better expression than the following text, please allow me to directly quote and translate the original text:

First, obtain the FRadiosityTexel from the thread ID using the GetRadiosityTexelFromCardTile function, then find the coordinates of the Probe Texel that contains this Texel. From these coordinates, calculate the Probe coordinates, followed by the coordinates of the Probes to the right, below, and to the lower right, for four Probes in total. Note that when the Probe coordinates are generated, a Jitter offset is applied toward the Atlas origin at the upper left. As a result, four adjacent Probes are randomly selected from the 9 Probes centered on the current Probe, so different Texels may use different Probes for filtering. This further reduces noise, smooths the result, and improves the quality of the Indirect Lighting. In the figure below, the light blue Texels mark the Probe locations. Each of the 16 Texels in the middle selects randomly from the 9 yellow Probes:
From: https://zhuanlan.zhihu.com/p/522165652
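The effect of the jitter can be illustrated with a plain jittered bilinear lookup. This is not taken from Lumen's shader: the 4-texel Probe spacing follows from the 8x8 Tile with 2x2 Probes, but the `probe_weights` helper, the sign and range of the jitter, and the bilinear weighting are my own assumptions.

```python
import math

PROBE_SPACING = 4  # texels between neighboring Probes (8x8 Tile, 2x2 Probes)

def probe_weights(texel_x, texel_y, jitter=(0.0, 0.0)):
    """Return {probe coordinate: bilinear weight} for the four Probes used."""
    px = (texel_x + 0.5 + jitter[0]) / PROBE_SPACING - 0.5
    py = (texel_y + 0.5 + jitter[1]) / PROBE_SPACING - 0.5
    base_x, base_y = math.floor(px), math.floor(py)
    fx, fy = px - base_x, py - base_y
    return {
        (base_x, base_y): (1.0 - fx) * (1.0 - fy),
        (base_x + 1, base_y): fx * (1.0 - fy),
        (base_x, base_y + 1): (1.0 - fx) * fy,
        (base_x + 1, base_y + 1): fx * fy,
    }

no_jitter = probe_weights(1, 1)
with_jitter = probe_weights(1, 1, jitter=(2.0, 2.0))
print(sorted(no_jitter))
print(sorted(with_jitter))
```

The same Texel ends up blending a different 2x2 set of Probes depending on the jitter, which is how neighboring Texels can draw on different Probes out of the surrounding 3x3 block.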
