Screen Probe Placement and Importance Sampling

Before diving into this section, it is worth noting that Lumen targets a very wide range of scenarios: outdoor scenes with extremely long view distances as well as indoor scenes that demand fine lighting detail. Whatever solution Lumen adopts therefore has to handle both situations well at the same time.

With that in mind, two existing approaches were considered before designing the Lumen solution:

Irradiance Volume

Lighting calculation is done per-probe and then interpolated in screen space pixels.

While this is cheap, it suffers from light leaking and low spatial resolution. Deciding how to place the probes in space is also a genuinely hard problem.

Trace From Pixel + Screen Space Denoiser

This approach traces per pixel, so its cost scales with the screen resolution.

Lumen aims to strike a balance between the two. Therefore:

  • Sampling is per-pixel, but at an adaptively downsampled, lower resolution
  • The results are then interpolated to all pixels

At the same time, this information is cached as screen-space Probes to reduce computational cost. The idea is similar to the Probes of an Irradiance Volume, except that the latter places its Probes in world space.

Lumen treats what is visible on screen as the most important, and therefore prioritizes raising the sampling resolution on screen as much as possible.

Background: Sampling and Distributions

This topic may seem out of place here, but I suspect many readers share the confusion I had when I first learned it.

ℹ️
Why do we need to generate samples? What does it mean to map between distributions? And why does this problem matter so much?

If you are already familiar with this material, feel free to skip this section. But if sampling is new to you (for example, you are not deeply familiar with rendering but still interested in it), here is a fairly rough explanation.

image

Imagine you are playing a game like XCOM 2, commanding your squad to defend a stronghold. You cannot fully predict where the enemy will come from, and with only a handful of teammates you cannot watch every direction.

What you can do first is:

Roll a die for each teammate to randomly determine the direction they should watch.

So how do we measure a commander's skill? The number of teammates is fixed, but a good commander can get the most out of them based on information from the battlefield:

Although the die rolls are uniformly distributed, you can concentrate the defense on specific areas by designing how the rolls are mapped to directions.

In Tracing, we also face similar problems:

We cannot emit countless rays to the entire hemisphere space. In fact, the number of rays we can emit is often very limited.

It is very important to make the most of our prior knowledge and emit rays in the right directions.
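To make the "map the dice roll to a direction" idea concrete, here is a small, generic HLSL sketch (not Lumen code): two uniform random numbers in [0,1) are warped into a cosine-weighted direction on the hemisphere around a surface normal, so more rays go where a diffuse surface receives most of its light.

float3 CosineWeightedDirection(float2 UniformRandom, float3 Normal)
{
    const float PI = 3.14159265f;

    // Warp the uniform unit square onto the hemisphere: radius ~ sqrt(u1), angle ~ 2*pi*u2
    float Radius = sqrt(UniformRandom.x);
    float Phi = 2.0f * PI * UniformRandom.y;

    // Tangent-space direction; z is the cosine of the polar angle
    float3 LocalDirection = float3(Radius * cos(Phi), Radius * sin(Phi), sqrt(1.0f - UniformRandom.x));

    // Build an arbitrary tangent frame around the normal and rotate into world space
    float3 Tangent = abs(Normal.z) < 0.999f ? normalize(cross(float3(0.0f, 0.0f, 1.0f), Normal)) : float3(1.0f, 0.0f, 0.0f);
    float3 Bitangent = cross(Normal, Tangent);
    return Tangent * LocalDirection.x + Bitangent * LocalDirection.y + Normal * LocalDirection.z;
}

The random numbers stay uniform; it is the mapping that concentrates the resulting directions around the normal, exactly like the commander deciding how die rolls translate into watch directions.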

The importance of the sampling strategy is demonstrated in the following image. Each pixel in both images was traced with 512 rays. However, the image on the right used a better importance sampling strategy, giving a significantly improved result compared to the image on the left.

Prior knowledge for improving quality can come from multiple sources:

  • Temporal history, such as the lighting gathered in the previous frame
  • Surface material information, such as the distribution of the BRDF

Lumen uses such methods in multiple places, improving the final result with very limited sampling resources.

This is particularly prominent in this chapter.

Probe

image

We cannot compute high-precision lighting for every pixel on the screen, and in terms of cost-effectiveness it would be a waste of resources anyway.

Instead, we pick representative positions among these pixels and perform the expensive calculations only there. The lighting of the pixels in between is estimated by interpolation.

Probe Encoding

So, what is a Probe? A Probe consists of two kinds of information: its position and the lighting data it carries.

Probe positions fall into two cases. The first is the Grid Probe, which is uniformly distributed in step with the screen resolution; these Probes are pixel-aligned and fixed. The second is the Adaptive Probe, which places extra Probes inside the grid cells where necessary. This is discussed in detail in the following sections.

The basic method of encoding lighting information in Probes is octahedral mapping.

@jakub has provided a fantastic visualization, so please allow me to quote his work here:

Please note that the above is only a rough description of the "Probe" concept. The following sections go into more detail, so let me point out in advance: Unreal Engine does not simply use plain octahedral mapping for the encoding, and the next section will show why.
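To make "octahedral mapping" concrete, here is a common encode/decode pair (a generic sketch with my own names; the engine ships its own equivalents, and as just noted Lumen layers more on top of this). A unit direction is folded into a 2D square, so a small square texture can index every direction of a probe:

// Fold a unit direction into the [-1, 1]^2 square
float2 UnitVectorToOctahedron(float3 N)
{
    N /= (abs(N.x) + abs(N.y) + abs(N.z));
    if (N.z < 0.0f)
    {
        // Fold the lower hemisphere over the diagonals of the square
        float2 SignNotZero = float2(N.x >= 0.0f ? 1.0f : -1.0f, N.y >= 0.0f ? 1.0f : -1.0f);
        N.xy = (1.0f - abs(N.yx)) * SignNotZero;
    }
    return N.xy;
}

// Inverse mapping: a point in the [-1, 1]^2 square back to a unit direction
float3 OctahedronToUnitVector(float2 Oct)
{
    float3 N = float3(Oct.x, Oct.y, 1.0f - abs(Oct.x) - abs(Oct.y));
    if (N.z < 0.0f)
    {
        float2 SignNotZero = float2(N.x >= 0.0f ? 1.0f : -1.0f, N.y >= 0.0f ? 1.0f : -1.0f);
        N.xy = (1.0f - abs(N.yx)) * SignNotZero;
    }
    return normalize(N);
}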

Adaptive Probe Placement

ℹ️
If you want to read more about this part, check this video
image

If you look closely at the slide above, you will notice that besides the Probes arranged in a regular grid, there are also some extra Probes placed inside the grid cells. These are the Adaptive Probes.

These Adaptive Probes compensate for the loss of indirect-lighting resolution caused by the screen-space downsampling.

When to place Adaptive Probe

Before the probe gather, two Adaptive Probe placement passes run, each at a different downsampling rate.

image

In each pass, the depth values around each (downsampled) pixel are examined to decide whether an Adaptive Probe should be added. This happens at two levels:

  • For the screen depth test, the depths of the four corner pixels around the current pixel are compared against the current pixel's plane, to determine whether any of them lies far from that plane. If so, proceed to the next step (a sketch of this plane test follows the list).
    The two pixels on the left have different depths, so the green pixel needs an additional adaptive probe
  • At this point, other threads may already have placed a Probe here, because the compute shader runs in parallel. The thread therefore walks the list of Adaptive Probes already recorded for this location and checks whether one has been placed; only if not is a new Adaptive Probe added.
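As a rough sketch of the first test (names and the threshold are mine, not the engine's exact logic): the pixel checks whether any of the four corner pixels around it sits far from its own plane, which signals a depth discontinuity that interpolation cannot cover.

// Returns true when a depth discontinuity is detected around this pixel, so the
// location becomes a candidate for an adaptive probe.
bool NeedsAdaptiveProbe(float3 PixelWorldPosition, float3 PixelWorldNormal,
                        float3 CornerWorldPositions[4], float PlaneDistanceThreshold)
{
    // Plane through the pixel, oriented by its normal: dot(Plane.xyz, P) - Plane.w = 0
    float4 PixelPlane = float4(PixelWorldNormal, dot(PixelWorldNormal, PixelWorldPosition));

    bool bAnyCornerFarFromPlane = false;

    [unroll]
    for (uint i = 0; i < 4; ++i)
    {
        // Distance of the corner sample from the pixel's plane
        float PlaneDistance = abs(dot(PixelPlane.xyz, CornerWorldPositions[i]) - PixelPlane.w);

        // A corner that lies far from the plane belongs to a different surface
        bAnyCornerFarFromPlane = bAnyCornerFarFromPlane || (PlaneDistance > PlaneDistanceThreshold);
    }

    return bAnyCornerFarFromPlane;
}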

How to store Adaptive Probe

Unlike ordinary Screen Space Probes, which are placed at regular intervals, storing Adaptive Probes raises some new issues:

  • The number of Adaptive Probes is not known in advance.
  • Compared to regular Screen Space Probes, Adaptive Probes are generally sparse.

Therefore, it is necessary to design a specific storage method for Adaptive Probe.

image

As the official description states, Adaptive Probe data is stored in the same buffer as the regular Screen Space Probes, in the region below the area used by the standard Probes that are aligned with the screen pixels.

The storage scheme is similar in spirit to OIT (Order-Independent Transparency): a header texture plus a shared, linearly allocated buffer that records the per-probe information.

However, instead of using a linked list of nodes, Lumen records the entries in a 2D buffer/texture.

The data structure relationship is as follows:

image
This is mainly useful for readers of the source code.
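A minimal sketch of this layout, with hypothetical resource names (the engine's actual buffers and tile layout differ in detail): a per-tile header counts how many adaptive probes the tile owns, a single global counter hands out probe indices, and a 2D index texture replaces the OIT-style linked list.

RWTexture2D<uint>        RWTileAdaptiveProbeHeader;   // per probe tile: number of adaptive probes placed there
RWTexture2D<uint>        RWTileAdaptiveProbeIndices;  // per (tile, slot): index of the allocated adaptive probe
RWStructuredBuffer<uint> RWNumAdaptiveProbes;         // single global allocation counter

void AppendAdaptiveProbe(uint2 TileCoord, uint MaxAdaptiveProbes, uint MaxProbesPerTile)
{
    // Reserve a global probe index with an atomic counter
    uint GlobalProbeIndex;
    InterlockedAdd(RWNumAdaptiveProbes[0], 1u, GlobalProbeIndex);

    if (GlobalProbeIndex < MaxAdaptiveProbes)
    {
        // Reserve a slot in this tile's list
        uint SlotInTile;
        InterlockedAdd(RWTileAdaptiveProbeHeader[TileCoord], 1u, SlotInTile);

        if (SlotInTile < MaxProbesPerTile)
        {
            // Flatten (tile, slot) into a 2D coordinate instead of chasing linked-list nodes
            uint2 IndexCoord = uint2(TileCoord.x, TileCoord.y * MaxProbesPerTile + SlotInTile);
            RWTileAdaptiveProbeIndices[IndexCoord] = GlobalProbeIndex;
        }
    }
}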

Visualize

If you want a more intuitive understanding of this section, here are some suggestions:

Modify one of the macros in Engine\Shaders\Private\Lumen\LumenScreenProbeGather.usf, for example change

#define DEBUG_VISUALIZE_SCREEN_PROBE_PLACEMENT 0

to

#define DEBUG_VISUALIZE_SCREEN_PROBE_PLACEMENT 1

This lets you visualize the Screen Probes just like in the official slides.

image

Product Importance Sampling

Let's first clarify the current situation:

  • Compared to tracing once per screen pixel, the tracing work has now been consolidated onto a much smaller number of Probes.
  • Each Probe can therefore afford multiple traces. Better yet, we can use those traces more efficiently: rays that are known in advance to be useless are culled, and their budget is reassigned to oversample the "more important" directions.

Imagine this scenario:

  • Equipping every soldier in a trench with a rifle and having them shoot straight ahead.
  • Equipping only a few key soldiers in the trench with machine guns and allowing them to turn towards critical directions for shooting.

This is essentially the basic idea behind Product Importance Sampling.

image
We’re operating in a downsampled space, so we can afford to launch a whole threadgroup per probe to do better sampling. Product Importance Sampling is normally only possible in offline rendering, but we can do it in real-time.

Product Importance Sampling is better than importance sampling only the BRDF or the Lighting, and it’s better than Multiple Importance Sampling, which throws away tracing work when the direction had a low weight in the other distribution.

On the left we have the BRDF for a probe on a wall, where only the directions in a single hemisphere matter. In the middle we know the incoming lighting from the previous frame, which tells us that most of the lighting is coming from these two directions. We then reassign the useless rays into the directions that are the most important in the product.

image
ℹ️
Tip: if you want to see the same visualization, use r.Lumen.ScreenProbeGather.VisualizeTraces 1

The overall calculation relationship diagram is as follows:

image

BRDF PDF

ℹ️
Engine\Shaders\Private\Lumen\LumenScreenProbeImportanceSampling.usf : ScreenProbeComputeBRDFProbabilityDensityFunctionCS

At this stage, Lumen reads the material information from the GBuffer and, based on it, computes the diffuse transfer of the BRDF, encoded as third-order spherical harmonics (SH3).

  • Because this is a diffuse transfer, most materials only need a very simple calculation (a minimal sketch follows this list).
  • ℹ️
    If you want to check the source: Engine\Shaders\Private\SHCommon.ush: CalcDiffuseTransferSH3
  • Special materials such as two-sided foliage, subsurface scattering, and hair receive dedicated handling.
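For the curious, here is a minimal, self-contained sketch of what "diffuse transfer encoded in SH3" means: project the clamped cosine lobe around the normal onto the 9 SH basis functions, using the standard per-band scales pi, 2*pi/3 and pi/4. The names are mine; the engine's CalcDiffuseTransferSH3 uses an equivalent, more general formulation.

// Evaluate the 9 real spherical harmonics basis functions (bands 0..2) for a unit direction.
void SH3Basis(float3 D, out float Basis[9])
{
    Basis[0] = 0.282095f;
    Basis[1] = 0.488603f * D.y;
    Basis[2] = 0.488603f * D.z;
    Basis[3] = 0.488603f * D.x;
    Basis[4] = 1.092548f * D.x * D.y;
    Basis[5] = 1.092548f * D.y * D.z;
    Basis[6] = 0.315392f * (3.0f * D.z * D.z - 1.0f);
    Basis[7] = 1.092548f * D.x * D.z;
    Basis[8] = 0.546274f * (D.x * D.x - D.y * D.y);
}

// Diffuse transfer: the clamped cosine lobe max(dot(N, L), 0) around Normal, projected onto SH3.
// Dotting these coefficients with SH-projected incoming lighting yields the irradiance at Normal.
void DiffuseTransferSH3(float3 Normal, out float Transfer[9])
{
    float Basis[9];
    SH3Basis(Normal, Basis);

    const float PI = 3.14159265f;
    const float BandScale[3] = { PI, 2.0f * PI / 3.0f, PI / 4.0f };

    Transfer[0] = Basis[0] * BandScale[0];

    [unroll]
    for (int i = 1; i < 4; ++i)
    {
        Transfer[i] = Basis[i] * BandScale[1];
    }
    [unroll]
    for (int j = 4; j < 9; ++j)
    {
        Transfer[j] = Basis[j] * BandScale[2];
    }
}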

Multi-sampling for each probe

For each Probe, Lumen samples a set of surrounding points to smooth the BRDF result, avoiding noise caused by abrupt value changes.

Thread Mapping and Data Storage

image

Lumen parallelizes the computation process as much as possible, including:

  • Sampling materials and computing the spherical harmonics (SH) in parallel.
  • Recording only the valid SH values.
  • Averaging the valid SH values with a parallel reduce sum (see the sketch after this list).
  • Writing the final result to global memory, with each SH3 coefficient written in parallel.
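Here is a minimal sketch of the reduce-sum step, with my own names and a fixed group size (the real shader handles all 9 SH3 coefficients and its own thread mapping): invalid samples contribute zero, a tree reduction adds everything up in groupshared memory, and the sum is divided by the number of valid samples.

#define THREADGROUP_SIZE 64

groupshared float SharedCoefficient[THREADGROUP_SIZE];
groupshared uint SharedNumValid;

float ReduceAverage(float Coefficient, bool bValid, uint ThreadIndex)
{
    if (ThreadIndex == 0)
    {
        SharedNumValid = 0;
    }
    GroupMemoryBarrierWithGroupSync();

    // Invalid samples contribute zero so they do not bias the sum
    SharedCoefficient[ThreadIndex] = bValid ? Coefficient : 0.0f;
    if (bValid)
    {
        InterlockedAdd(SharedNumValid, 1u);
    }
    GroupMemoryBarrierWithGroupSync();

    // Tree reduction: halve the number of active threads each iteration
    for (uint Offset = THREADGROUP_SIZE / 2; Offset > 0; Offset /= 2)
    {
        if (ThreadIndex < Offset)
        {
            SharedCoefficient[ThreadIndex] += SharedCoefficient[ThreadIndex + Offset];
        }
        GroupMemoryBarrierWithGroupSync();
    }

    // Average over the valid samples only
    return SharedNumValid > 0 ? SharedCoefficient[0] / SharedNumValid : 0.0f;
}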

Lighting PDF

Let's summarize what needs to be achieved in this step:

  • We need to obtain the probability density of illumination in the hemisphere space.

In simpler terms, we need to know which directions of illumination are more important.

However, we encounter a chicken-and-egg problem:

  • In order to better generate Trace rays, we need to know which directions of illumination are more important.
  • To determine which directions of illumination are more important, we must first trace the surrounding environment.
image

The solution is really simple:

just reuse the history data.

ℹ️
ScreenProbeComputeLightingProbabilityDensityFunctionCS

History Reprojection

I would like to build this explanation up from simple to complex, so please don't object with "no, that's not how it works!" right away; more details will be added later.

The following image shows the most basic operation process:

image

Note that the history data is stored per Probe. Lumen goes through a series of coordinate conversions to find the nearby 2x2 history Probes and compute the final texel coordinates to sample; these are shown in orange in the image.

In this step, to avoid contamination from unrelated samples, history Probes whose depth lies too far from the current Probe's plane are also rejected.
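A simplified sketch of this gather, assuming two textures that store, per history probe, its world position and its lighting (hypothetical names and layout; the real shader performs more coordinate conversions and validity checks):

Texture2D<float3> HistoryProbeLightingTexture;       // lighting stored per history probe (assumed layout)
Texture2D<float4> HistoryProbeWorldPositionTexture;  // xyz = world position of each history probe (assumed)

float4 GatherHistoryProbeLighting(float3 ProbeWorldPosition, float3 ProbeWorldNormal,
                                  float2 HistoryScreenUV, float2 HistoryProbeGridSize,
                                  float PlaneDistanceThreshold)
{
    // Which 2x2 block of history probes surrounds the reprojected position?
    float2 HistoryProbeCoord = HistoryScreenUV * HistoryProbeGridSize - 0.5f;
    int2 TopLeftProbe = (int2)floor(HistoryProbeCoord);
    float2 BilinearWeights = frac(HistoryProbeCoord);

    // Plane through the current probe, used to reject unrelated history probes
    float4 ProbePlane = float4(ProbeWorldNormal, dot(ProbeWorldNormal, ProbeWorldPosition));

    float3 Lighting = 0;
    float TotalWeight = 0;

    for (int y = 0; y < 2; ++y)
    {
        for (int x = 0; x < 2; ++x)
        {
            int2 HistoryProbe = TopLeftProbe + int2(x, y);
            float3 HistoryWorldPosition = HistoryProbeWorldPositionTexture[HistoryProbe].xyz;

            // Reject history probes that lie far from the current probe's plane
            float PlaneDistance = abs(dot(ProbePlane.xyz, HistoryWorldPosition) - ProbePlane.w);
            float DepthWeight = PlaneDistance < PlaneDistanceThreshold ? 1.0f : 0.0f;

            float BilinearWeight = (x == 0 ? 1.0f - BilinearWeights.x : BilinearWeights.x)
                                 * (y == 0 ? 1.0f - BilinearWeights.y : BilinearWeights.y);

            float Weight = BilinearWeight * DepthWeight;
            Lighting += HistoryProbeLightingTexture[HistoryProbe] * Weight;
            TotalWeight += Weight;
        }
    }

    // TotalWeight == 0 means no usable history; a fallback is needed (see below)
    return float4(Lighting, TotalWeight);
}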

Jitter

ℹ️
Engine\Shaders\Private\Lumen\LumenScreenProbeCommon.ush: GetScreenTileCoordFromScreenUV

To further increase the smoothness of sampling, Lumen uses a special Jitter technique.

Note that the jitter cannot be applied directly to the sampled pixel coordinates; doing so would produce incorrect results.

image

Therefore, the actual coordinates that can be jittered are those of the Probe, not the pixel coordinates.

In reality, the jittering process is similar to the image below:

image
  • The red and blue colors in the image represent different pixels of the same Probe being processed, each corresponding to a different sampling direction.
  • Because of the jitter, the red and blue pixels end up selecting different history Probes.
  • However, whether red or blue, their coordinates inside the sampled Probe are identical, which means the arrow direction of the sampled texel is the same, keeping the sample valid (see the sketch after this list).
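A tiny sketch of the key point (names are mine): the jitter offsets which probe a texel reads from, while the texel's offset inside the probe, and therefore its octahedral direction, stays untouched.

uint2 GetJitteredProbeTexelCoord(uint2 ScreenProbeCoord, uint2 TexelCoordInProbe,
                                 uint ScreenProbeResolution, int2 ProbeJitter)
{
    // The jitter is applied to the probe (tile) coordinate only
    int2 JitteredProbeCoord = max(int2(ScreenProbeCoord) + ProbeJitter, int2(0, 0));

    // The offset inside the probe is untouched, so the sampled octahedral direction is unchanged
    return uint2(JitteredProbeCoord) * ScreenProbeResolution + TexelCoordInProbe;
}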

What should I do if there is no historical information?

In this case, data will be sampled from the Radiance Cache.

Information related to Radiance Cache will be discussed later on.

ℹ️
In the actual rendering passes, the Radiance Cache update happens before this step (the lighting PDF). Explaining it in that order would be more confusing, however, so we skip it for now.

Generate Rays

image

In this step, Lumen combines the PDF from the BRDF (encoded as SH) with the PDF from the lighting to generate the rays used in the subsequent trace. It is worth noting that this step reuses the ray budget freed by culled directions.

image

Each ThreadGroup is mapped to a Probe, and each thread in the group is used to process calculations within the Probe.

Please note that the responsibilities assigned to threads may vary depending on the stage of computation.

PDF Calculation

In this stage, each thread corresponds to one texel in the Probe's texture region, i.e. to one direction of the octahedral encoding.

The thread evaluates the SH-encoded BRDF PDF in its own direction (a dot product of the SH basis with the coefficients), samples the lighting PDF in that direction, and multiplies the two to obtain the PDF for its direction.

The result is written into a linear buffer, because all subsequent processing operates on it linearly.
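Putting the pieces together, a sketch of what one thread computes for its texel (reusing OctahedronToUnitVector and SH3Basis from the sketches earlier; the lighting-PDF atlas name and layout are hypothetical):

Texture2D<float> LightingPdfAtlas;  // per-probe lighting PDF, one value per octahedral texel (assumed layout)

float ComputeTexelPDF(uint2 TexelCoordInProbe, uint ProbeResolution,
                      float BrdfTransfer[9], uint2 ProbeAtlasBase)
{
    // Octahedral texel -> unit direction
    float2 ProbeUV = (float2(TexelCoordInProbe) + 0.5f) / ProbeResolution;
    float3 WorldDirection = OctahedronToUnitVector(ProbeUV * 2.0f - 1.0f);

    // "Dot the direction with the BRDF": evaluate the SH basis in this direction
    // and dot it with the SH-encoded diffuse transfer coefficients
    float Basis[9];
    SH3Basis(WorldDirection, Basis);

    float BrdfPdf = 0.0f;
    [unroll]
    for (int i = 0; i < 9; ++i)
    {
        BrdfPdf += BrdfTransfer[i] * Basis[i];
    }
    BrdfPdf = max(BrdfPdf, 0.0f);

    // Lighting PDF estimated from reprojected history, stored per octahedral texel
    float LightingPdf = LightingPdfAtlas[ProbeAtlasBase + TexelCoordInProbe];

    // The product of the two distributions drives ray generation
    return BrdfPdf * LightingPdf;
}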

PDF Culling

The official talk mentions a detail: because the lighting PDF is only an estimate, it is less stable and less trustworthy than the BRDF PDF. This leads to a quality safeguard:

  • As long as the BRDF PDF is greater than MinPDFToTrace, the ray is kept and never culled.

Sort

Next comes the sorting stage for the PDFs, which can be seen as a highly parallelized sort.

Each thread is responsible for one ray in the array. It reads its ray's PDF, compares it against the whole array to count how many elements have a smaller PDF, uses that count as its index after sorting, and writes its ray into the element at that index.
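A sketch of this rank-style sort in groupshared memory (my own names; the real shader sorts ray metadata rather than just the key): each thread counts how many rays have a smaller PDF than its own and writes itself at that position.

#define NUM_RAYS 64

groupshared float SharedPdf[NUM_RAYS];
groupshared uint SharedSortedRayIndex[NUM_RAYS];

void SortRayByPdf(uint ThreadIndex, float Pdf)
{
    SharedPdf[ThreadIndex] = Pdf;
    GroupMemoryBarrierWithGroupSync();

    uint Rank = 0;
    for (uint i = 0; i < NUM_RAYS; ++i)
    {
        // Tie-break on the index so two equal PDFs never map to the same slot
        bool bSmaller = (SharedPdf[i] < Pdf) || (SharedPdf[i] == Pdf && i < ThreadIndex);
        Rank += bSmaller ? 1u : 0u;
    }

    // Ascending order: the rank equals the number of rays with a smaller PDF
    SharedSortedRayIndex[Rank] = ThreadIndex;
    GroupMemoryBarrierWithGroupSync();
}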

Refine (Subdivide)

Next comes the subdivision stage. It is actually split into two steps, and the threads take on different responsibilities in each step.

Allow me to first explain in a non-parallelized manner:

An easy way to picture it is to search from both ends toward the middle. Since the array is sorted in ascending order of PDF, if the PDF of an element on the left is below MinPDFToTrace, the high-PDF ray at the right end can be subdivided into the freed slots on the left.

image
  • While searching, the left pointer jumps 3 slots at a time and the right pointer moves 1 slot at a time.
  • Once a subdivision is confirmed to be possible, the 3 left 🟦 elements are replaced with sub-rays produced by subdividing the right 🟩 element, while the right 🟩 slot itself keeps the remaining (upper-left) sub-ray of the subdivision (a sequential sketch follows this list).
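Here is the same idea as a non-parallel sketch (the names, the FRayInfo layout, and the child-PDF choice are mine, not the engine's):

#define NUM_SORTED_RAYS 64

struct FRayInfo
{
    uint2 TexelCoord;  // coordinate in the probe's octahedral map
    uint  Level;       // subdivision level (higher = finer directional resolution)
    float Pdf;
};

void SubdivideRay(FRayInfo Ray, out FRayInfo SubRays[4])
{
    // One texel at level L becomes a 2x2 block of texels at level L + 1
    [unroll]
    for (uint i = 0; i < 4; ++i)
    {
        SubRays[i].TexelCoord = Ray.TexelCoord * 2 + uint2(i % 2, i / 2);
        SubRays[i].Level = Ray.Level + 1;
        SubRays[i].Pdf = Ray.Pdf; // placeholder: the parent PDF stands in for the children
    }
}

void RefineRaysSequential(inout FRayInfo Rays[NUM_SORTED_RAYS], float MinPDFToTrace)
{
    int Left = 0;                      // lowest-PDF rays: candidates for culling
    int Right = NUM_SORTED_RAYS - 1;   // highest-PDF rays: candidates for subdivision

    // Every subdivision needs 3 culled slots on the left plus the original slot on the right
    while (Left + 2 < Right && Rays[Left + 2].Pdf < MinPDFToTrace)
    {
        FRayInfo SubRays[4];
        SubdivideRay(Rays[Right], SubRays);

        // The three culled slots receive three of the sub-rays...
        Rays[Left + 0] = SubRays[1];
        Rays[Left + 1] = SubRays[2];
        Rays[Left + 2] = SubRays[3];
        // ...and the original right slot keeps the remaining (upper-left) sub-ray
        Rays[Right] = SubRays[0];

        Left += 3;   // jump 3 slots from the left
        Right -= 1;  // move 1 slot from the right
    }
}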

Next, this process is parallelized; that is exactly the work of the Refine section in the earlier flowchart.