Motion Blur

Overview

Motion blur blurs objects based on their motion. The system works by way of a full-screen velocity map that is created at a reduced resolution; objects are blurred based on their contribution to this map. The image below shows a visualization of the velocity map, available in-editor or while your game is running.

Therefore, at a high level, the rendering of Motion Blur can be seen as two stages:

  • Rendering Velocity Map
  • Blurring in the corresponding direction based on Velocity Map

Source of Motion Vector

Motion Vector is not equivalent to Velocity. Instead, we should think of the Motion Vector as answering the following question:

In screen space, where was the current pixel located in the previous frame? The difference vector between the two positions is the Motion Vector.

Furthermore, there are actually two sources of Motion Vector:

  • Camera
  • Object itself

The movement and rotation of the camera cause a global displacement of the pixels on the screen. And obviously, the movement of an object itself moves its corresponding pixels.

Velocity Map Rendering

In the Base Pass of GBuffer rendering, each vertex's previous-frame position is computed from that object's previous-frame transformation matrix. The current-frame position is then compared with the previous-frame position to obtain the velocity vector.

image
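The reprojection above can be sketched on the CPU. This is a minimal illustration, not UE's actual shader code; `project` and the matrix layout are assumptions made for the example.

```python
def project(m, p):
    """Apply a hypothetical 4x4 row-major matrix to a 3D point; return NDC xy."""
    hp = p + (1.0,)  # homogeneous coordinate
    x, y, z, w = (sum(m[r][c] * hp[c] for c in range(4)) for r in range(4))
    return (x / w, y / w)

def motion_vector(world_pos, prev_world_pos, view_proj, prev_view_proj):
    # Project the same vertex with this frame's and last frame's matrices.
    curr = project(view_proj, world_pos)
    prev = project(prev_view_proj, prev_world_pos)
    # The motion vector points from the previous to the current screen position.
    return (curr[0] - prev[0], curr[1] - prev[1])
```

Note that both sources of motion are covered at once: a camera move changes `view_proj` vs. `prev_view_proj`, while an object move changes `world_pos` vs. `prev_world_pos`.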

Visualize Velocity Map

Through the Show → Visualize → Motion Blur option, you can visualize the Velocity Map.

image
image

Motion Blur Rendering

Flatten

The calculation of Velocity Flatten is shown in the following diagram:

image
  • Convert each pixel's Motion Vector from the Cartesian coordinate system to the polar coordinate system.
  • Then, through a parallel process similar to reduce-sum, compute the range of polar-coordinate velocities within the current local region, per tile.
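The two steps can be sketched as follows. This is an illustrative CPU version only; the function names are made up for the example, and the per-tile reduction is written as plain `min`/`max` rather than the GPU parallel reduce.

```python
import math

def to_polar(vx, vy):
    # Length and angle of a motion vector.
    return (math.hypot(vx, vy), math.atan2(vy, vx))

def tile_velocity_range(polar_velocities):
    # Min/max range over one tile, mirroring what the parallel reduce computes.
    lengths = [length for length, _ in polar_velocities]
    angles = [angle for _, angle in polar_velocities]
    return (min(lengths), max(lengths)), (min(angles), max(angles))
```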

Min Depth Trick

When calculating Motion Vectors, instead of directly calculating based on the current pixel position, a fast search is performed on the surrounding area of the current pixel. The pixel with the lowest depth (i.e., closest to the camera) among the surrounding pixels is used as the reference for calculation.

According to the code comments, this helps generate higher quality contours.
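A minimal sketch of that neighborhood search, assuming a simple square window (the actual search pattern in the shader may differ):

```python
def closest_depth_offset(depth, x, y, radius=1):
    """Return the offset of the neighbor with the smallest depth value
    (i.e., closest to the camera) around pixel (x, y)."""
    best = (0, 0)
    best_depth = depth[y][x]
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < len(depth) and 0 <= nx < len(depth[0]):
                if depth[ny][nx] < best_depth:
                    best_depth = depth[ny][nx]
                    best = (dx, dy)
    return best
```

The Motion Vector is then fetched at the offset pixel instead of the center pixel.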

Implementation Details of Parallel Reduce

In the ReduceVelocityFlattenTile function, you will see the following code:

	FVelocityRange VelocityPolarRange = SetupPolarVelocityRange(VelocityPolar);
	WritePolarVelocityRangeToLDS(GroupIndex, VelocityPolarRange);
	GroupMemoryBarrierWithGroupSync();
	VelocityFlattenStep(GroupIndex,  128,  VelocityPolarRange);
	GroupMemoryBarrierWithGroupSync();
	VelocityFlattenStep(GroupIndex,  64,  VelocityPolarRange);
	GroupMemoryBarrierWithGroupSync();

	VelocityFlattenStep(GroupIndex,  32,  VelocityPolarRange);
	VelocityFlattenStep(GroupIndex,  16,  VelocityPolarRange);
	VelocityFlattenStep(GroupIndex,   8,  VelocityPolarRange);
	VelocityFlattenStep(GroupIndex,   4,  VelocityPolarRange);
	VelocityFlattenStep(GroupIndex,   2,  VelocityPolarRange);
	VelocityFlattenStep(GroupIndex,   1,  VelocityPolarRange);
	OutVelocityPolarRange = VelocityPolarRange;

I was confused when I first read this piece of code:

Why do 128 and 64 require GroupMemoryBarrierWithGroupSync, but the subsequent ones don't?

I found the answer in this Slide:

Simply put:

  • If we imagine GPU threads as CPU threads, then each call to VelocityFlattenStep must be followed by a GroupMemoryBarrierWithGroupSync to ensure that the next call to VelocityFlattenStep sees the latest data written by other threads.
    • This mental model is correct for the 128 and 64 steps.
  • However, once the number of active threads is small enough to fit within a single GPU warp, the entire warp executes in SIMD lockstep. We can imagine all of its threads executing the code line by line, in sync.
  • In that case, once one thread has executed VelocityFlattenStep, from the perspective of the whole warp, every thread has finished executing it.
  • Therefore, GroupMemoryBarrierWithGroupSync is no longer needed.
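A CPU simulation makes the tree reduction easy to see. This sketch uses max as a stand-in for the min/max range merge, and the comments mark where the barrier matters; it is an illustration, not the shader itself.

```python
def reduce_max(values, warp_size=32):
    """CPU sketch of the tree reduction in ReduceVelocityFlattenTile."""
    lds = list(values)          # stands in for groupshared memory (LDS)
    stride = len(values) // 2   # 128 for a 256-thread group
    while stride >= 1:
        # On the GPU, threads with GroupIndex < stride run this in parallel.
        for i in range(stride):
            lds[i] = max(lds[i], lds[i + stride])
        # A GroupMemoryBarrierWithGroupSync is needed only while more than one
        # warp is still active (stride > warp_size); once the remaining active
        # threads fit inside a single warp, they already execute in lockstep.
        stride //= 2
    return lds[0]
```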

Tile Scatter

Next, UE scatters each tile's velocity information. That is, it spreads the tile's maximum and minimum velocities outward on both sides along the velocity direction.

image

The problem encountered here is that the direction of the Motion Vector may be tilted and its length may be long.

If we scattered this information with a naive compute-shader kernel, the efficiency would be very low. Therefore, UE cleverly implements the scatter using the existing rasterization system.

image

How to calculate the scatter area?

image
  • UE first uses Instanced Rendering to draw the same number of square quads as pixels.
  • In the vertex shader, it samples the tile the quad belongs to and offsets the vertex positions based on the maximum velocity in that tile.
    • This effectively stretches the quad according to the velocity.
  • The quad is then rotated to align with the velocity direction, which handles velocity directions that are tilted.
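What the vertex shader does per quad can be sketched geometrically. This is a hypothetical CPU illustration; the exact sizing and rotation math in UE's shader will differ in detail.

```python
import math

def scatter_quad(tile_center, max_velocity):
    """Stretch a unit quad along the tile's max velocity, then rotate it to
    match the velocity direction. Returns the four corner positions."""
    vx, vy = max_velocity
    length = math.hypot(vx, vy)
    half_len = 0.5 + length        # stretched half-extent along the velocity
    angle = math.atan2(vy, vx)
    c, s = math.cos(angle), math.sin(angle)
    corners = []
    for ux, uy in [(-1, -1), (1, -1), (1, 1), (-1, 1)]:
        # Local corner: scaled along the velocity axis, unit-wide across it,
        # then rotated into the velocity's frame.
        lx, ly = ux * half_len, uy * 0.5
        corners.append((tile_center[0] + lx * c - ly * s,
                        tile_center[1] + lx * s + ly * c))
    return corners
```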

So, how does UE compute the minimum and maximum velocity within a region? It uses the Z-test again to solve this problem:

  • First, it sets the depth test to Less and outputs the length of the Min Velocity directly as depth. This means only the value of the pixel with the smallest velocity in an overlapping region is preserved.
  • image
  • Next, it sets the depth test to Greater and outputs the length of the Max Velocity directly as depth. However, only the B and A channels of the Render Target are written (corresponding to Max Velocity). This means only the pixel with the largest velocity in an overlapping region records its information in the Max Velocity channels.
  • image
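The effect of the two depth-tested passes can be emulated in plain code. This is only a sketch of the idea; the record layout (`[min_len, min_dir, max_len, max_dir]`) and function name are made up for the example.

```python
def scatter_min_max(target, footprint, min_v, max_v):
    """Emulate the two passes: a Less pass that keeps the smallest Min
    Velocity (first two channels) and a Greater pass that keeps the largest
    Max Velocity (last two channels), per overlapped pixel.
    target: dict pixel -> [min_len, min_dir, max_len, max_dir]
    min_v, max_v: (length, direction) pairs for one scattered tile."""
    for p in footprint:
        rec = target.setdefault(p, [float('inf'), 0.0, float('-inf'), 0.0])
        # Pass 1: depth test Less on |min velocity|, writes the min channels.
        if min_v[0] < rec[0]:
            rec[0], rec[1] = min_v
        # Pass 2: depth test Greater on |max velocity|, writes the max channels.
        if max_v[0] > rec[2]:
            rec[2], rec[3] = max_v
```

After two overlapping tiles scatter onto the same pixel, that pixel holds the smaller of the two min velocities and the larger of the two max velocities.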

Tile Classify

Since we only need to focus on the tiles that actually contain Motion Vectors, UE chooses to merge all the tiles in the frame. This is similar to the merging system seen in the earlier Lumen analysis.

image

If you still find the merging part in the middle difficult to picture, that's normal.

Let's focus on only one type of Tile and assume we have only 4 threads divided into 2 groups. This diagram might help you better visualize what is happening:

image
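The toy setup from the diagram can also be written out as code. This is a sequential stand-in for what the GPU does with a group-wide count plus an atomically reserved output slot; the function name and the `count()`-as-atomic-counter trick are illustrative assumptions.

```python
from itertools import count

def classify_tiles(tile_has_motion, group_size=2):
    """Compact the indices of tiles that contain motion into a dense list,
    with threads split into fixed-size groups as in the diagram."""
    out = {}
    next_slot = count()  # stands in for an atomically incremented counter
    for group_start in range(0, len(tile_has_motion), group_size):
        # Each "group" of threads appends its active tiles in turn.
        for tile in range(group_start,
                          min(group_start + group_size, len(tile_has_motion))):
            if tile_has_motion[tile]:
                out[next(next_slot)] = tile
    return [out[i] for i in range(len(out))]
```

The later blur passes then only dispatch work for the compacted list instead of every tile on screen.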

Motion Blur Filter

image

Color Sample

  • The total number of samples is SampleCount, which is computed from the length of the velocity vector, taking the maximum value within the current tile.
  • Afterwards, the samples are divided into groups of four:
    • Each group shares the same base length.
    • The length of the sampling vector within each group gets two random perturbations.
    • Sampling is done in both the positive and negative directions.
    • Therefore, there are four samples per group.
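The grouping scheme above can be sketched as follows. This is an assumed illustration of the pattern (shared base length, two jittered lengths, mirrored directions), not UE's exact sampling code.

```python
import random

def sample_offsets(sample_count, velocity, seed=0):
    """Generate blur sample offsets in groups of four along the velocity."""
    rng = random.Random(seed)
    vx, vy = velocity
    offsets = []
    groups = sample_count // 4
    for g in range(groups):
        base = (g + 0.5) / groups          # shared base length for the group
        for _ in range(2):                 # two random perturbations per group
            t = base + (rng.random() - 0.5) / groups
            offsets.append((t * vx, t * vy))     # positive direction
            offsets.append((-t * vx, -t * vy))   # negative direction
    return offsets
```

The scene color is then fetched at each offset and averaged.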

Compare sampling only along the motion-vector direction vs. sampling in both the positive and negative directions:

Only one direction
Both directions