Implementing Ray Traced Global Illumination

Intro

This article provides a reference on implementing a dynamic single-bounce global illumination system similar to what you can find in Metro Exodus[4], the first iterations of RTGI in Unreal Engine, or Unity’s HDRP.

This is edition 1: a very crude implementation of first-bounce, diffuse-only global illumination. The article will be updated as I improve the tech.

Definitions

Ray Traced Global Illumination - a way to calculate the ambient term of diffuse reflected light. Not to be confused with RTXGI, which is a probe-based solution.
Diffuse reflection - as opposed to specular reflection, the reflection of light scattered in many directions.
Lambertian reflection - ideal diffuse reflection.

Implementation details

We start after calculating first-bounce visibility from the GBuffer, which gives us normals and world positions. Then:

  1. Distribute rays over the hemisphere around the normal in a cosine-weighted way
  2. Shoot the rays and calculate bounce lighting at each hit point using the Lambertian [[BRDF]]

The resulting image will be noisy due to the lack of samples, so we need to compensate for that with denoising. The [[SVGF]] algorithm combines a spatial blur with temporal reprojection and accumulation, guided by the variance and the number of accumulated samples in the image.

Uniform distribution

Now, as for the distribution: we can sample directions on the hemisphere uniformly, and the GLSL code for that looks like this.

// result is given in world space, in a hemisphere around the normal
// xi is a pair of uniform random numbers in the [0, 1] range
// PDF = 1 / (2 * PI), constant over the hemisphere
vec3 RandomDirectionHemisphere(vec2 xi, vec3 normal)
{
    // Generate a uniformly distributed direction on the unit sphere,
    // then mirror it into the hemisphere around the normal.
    float theta    = 6.2831853 * xi.x;  // random in [0, 2pi]
    float u        = 2.0 * xi.y - 1.0;  // random in [-1, 1]
    float r        = sqrt(1.0 - u * u);
    vec3 direction = vec3(r * cos(theta), r * sin(theta), u);
    return dot(direction, normal) < 0.0 ? -direction : direction;
}

Cosine distribution

But because of the Lambertian cosine law, directions aligned with the normal are more important, as that is where most of the light is assumed to come from. As the angle between the normal and the incoming light direction increases, the intensity of light per unit area decreases. This can be exploited to generate more samples closer to the normal, according to the cosine law.
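Concretely, the inverse-transform mapping that the function below implements is, with $\xi_1,\xi_2$ being the two components of u: $$ p(\omega)=\frac{\cos\theta}{\pi},\qquad \cos\theta=\sqrt{1-\xi_1},\quad \sin\theta=\sqrt{\xi_1},\quad \phi=2\pi\,\xi_2 $$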

// Samples a direction within a hemisphere oriented along +Z axis with a cosine-weighted distribution
// Source: "Sampling Transformations Zoo" in Ray Tracing Gems by Shirley et al.
// https://github.com/boksajak/brdf/blob/master/brdf.h
// u is a pair of uniform random numbers in the [0, 1] range
// note: the result is in local space and will need to be transformed into world space afterwards
vec3 RandomDirectionHemisphereCosineLocal(vec2 u, out float pdf)
{
    u = clamp(u, 0.001, 0.999);

    float a = sqrt(u.x);
    float b = TWO_PI * u.y;  

    vec3 result = vec3(a * cos(b), a * sin(b), sqrt(1.0f - u.x));
    pdf = result.z * ONE_OVER_PI;
    return result;
}

Here both the sample points and their probabilities are concentrated around the normal, which is the direction we are most interested in.

For more details and comparisons, see [5].

Local to world space transformation

The cosine distribution gives directions in a local space whose Z axis points along the shading normal. To convert them into world-space directions we can trace rays towards, we build a change-of-basis matrix similar to the TBN matrix used for normal mapping; but because we probably don’t have access to geometry tangents here, the basis has to be constructed from the normal alone (suitable methods are described in [1] and [2]). Code to build an orthonormal basis from a single direction is as follows:

mat3 ComputeBasisFromVector(vec3 normal)
{
    // Using right-hand coord
    const vec3 up = abs(normal.y) < 0.999 ? vec3(0,1,0) : vec3(1,0,0);
    const vec3 xAxis = normalize(cross(up, normal));
    const vec3 yAxis = cross(normal, xAxis);
    const vec3 zAxis = normal;

    return mat3(xAxis, yAxis, zAxis);
}

That gives a basis with the Z axis aligned with the shading normal, and the X and Y axes orthogonal to it and to each other: $$ B=\begin{bmatrix} X_x & Y_x & N_x \\ X_y & Y_y & N_y \\ X_z & Y_z & N_z \end{bmatrix} $$ Transforming the sampled directions is then simply: $$ dir=B \cdot \text{RandomDirectionHemisphereCosineLocal}(\xi) $$

vec3 RandomDirectionHemisphereCosineGlobal(vec2 u, vec3 normal, out float pdf)
{
    const vec3 localdir = RandomDirectionHemisphereCosineLocal(u, pdf);
    const mat3 transform = ComputeBasisFromVector(normal);
    return transform * localdir;
}

Tracing rays

Now that we have a way to generate samples for the irradiance computation at any given point, the main part begins. In this pass we collect samples of indirect, second-bounce lighting and store them in a buffer (RGBA16F format).

Setup

The ray generation shader is dispatched over every pixel of the screen. As an initial setup, we discard any pixel that lies on the sky:

const ivec2 Pixel = ivec2(gl_LaunchIDEXT.xy);
float Depth = texelFetch(GBufferDepth, Pixel, 0).x;
// do not trace from sky
if (abs(Depth) < EPSILON || abs(Depth - 1) < EPSILON)
{
	imageStore(Result, ivec2(Pixel), vec4(0));
	return;
}

For pixels that pass this check, we fetch the first-hit information from the previously rasterised GBuffer.

vec3 Origin = texelFetch(GBufferWorldPosition, Pixel, 0).xyz;
vec3 Normal = texelFetch(GBufferNormals, Pixel, 0).xyz;
vec3 Albedo = texelFetch(GBufferAlbedo, Pixel, 0).rgb;

Note: the normal here is already stored in the [-1, 1] range, so there is no need for conversion.
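If your GBuffer instead packs normals into the [0, 1] range, a decode step would be needed first; a sketch (the exact encoding depends on your GBuffer layout):

vec3 DecodeGBufferNormal(vec3 packedNormal)
{
    // expand from the [0, 1] storage range back to a [-1, 1] unit vector
    return normalize(packedNormal * 2.0 - 1.0);
}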

BRDF sampling

As we are calculating diffuse bounced lighting, the Lambertian model is enough. You could potentially use Burley’s diffuse model with retro-reflections, but here we stick with Lambert. As discussed in the sections above, we generate a random cosine-weighted sample in the hemisphere around the normal and calculate the PDF for it.

float SamplePdf;
vec3 SampleDir = RandomDirectionHemisphereCosineGlobal(Xi, Normal, SamplePdf);

Note: this function already takes care of transforming the sample from local space to world space.
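For completeness, the Lambertian diffuse BRDF itself, used later as LambertDiffuseBRDF in the irradiance code, is just the albedo divided by pi; a minimal definition could look like this (assuming the ONE_OVER_PI constant from earlier):

vec3 LambertDiffuseBRDF(vec3 albedo)
{
    // ideal diffuse: constant over the hemisphere
    return albedo * ONE_OVER_PI;
}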

Tracing

After that, we have all the information needed to perform the trace.

// bias will offset beginning of the ray a little bit along it, so it doesn't
// self-intersect with the hit point geometry
float OriginBias = 0.01;
traceRayEXT(TLAS, gl_RayFlagsOpaqueEXT, 0xFF, 0, 0, 0, Origin, OriginBias, SampleDir, MaxDistance, 0);

Vulkan Ray Tracing is used here, but tracing rays at this stage doesn’t have to rely on hardware RT capabilities. As an alternative, rays can be traced against an approximation of the scene, usually a two-level SDF: the first level is a lower-resolution distance field covering the entire scene (essentially acting as an acceleration structure), while the second, per-instance level stores a higher-resolution field for a specific mesh, giving more accurate results. For more information on software ray tracing, look up UE5 Lumen, the announced AMD Brixelizer GI, Lux GI (https://github.com/flwmxd/LuxGI) and Flax Engine GI (https://docs.flaxengine.com/manual/graphics/lighting/gi/realtime.html).
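As a rough illustration of the SDF-based alternative, tracing a single distance field with sphere tracing could look like this (a sketch with a hypothetical SampleSceneSDF function; a real two-level scheme would first march the coarse global field and then refine inside per-instance fields):

// sphere tracing: step along the ray by the distance to the nearest surface
bool TraceSceneSDF(vec3 origin, vec3 dir, float maxDistance, out float hitT)
{
    float t = 0.0;
    for (int i = 0; i < 64; ++i)
    {
        float d = SampleSceneSDF(origin + dir * t); // hypothetical scene distance query
        if (d < 0.001)
        {
            hitT = t;
            return true;
        }
        t += d;
        if (t > maxDistance)
        {
            break;
        }
    }
    return false;
}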

Hit point lighting computations

Assuming that the hit shader returns all the required surface properties and the miss shader returns the sky radiance, we perform direct lighting computations at the indirect hit point as usual:

  1. Compute direct lighting from light sources
  2. Project shadowmaps or cast additional shadow rays

Some pseudo shader code:

vec3 Radiance = vec3(0);
if (Payload.Hit)
{
	for (each light source)
	{
		vec3 Lighting = EvaluateDirectLightingBRDF(Payload.Material, light);
		Radiance += Lighting * LightShadowing(light);
	}
}
else
{
	Radiance += Payload.SkyColour;
}
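As an illustration, for a single directional light and the Lambertian model used here, the evaluation might look like this (hypothetical HitMaterial and DirectionalLight structs, since the real ones depend on the engine’s material and light model):

struct HitMaterial
{
	vec3 Albedo;
	vec3 Normal;
};

struct DirectionalLight
{
	vec3 Direction; // from the light towards the scene
	vec3 Colour;
};

vec3 EvaluateDirectLightingBRDF(HitMaterial material, DirectionalLight light)
{
	float NdotL = max(0.0, dot(material.Normal, -light.Direction));
	return LambertDiffuseBRDF(material.Albedo) * light.Colour * NdotL;
}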

This is where the next (third) bounce could also be computed, or sampled from a lightmap or some other sort of world-space irradiance cache. But to keep things simple for now, only a single indirect bounce is accounted for.

Computing irradiance

Now that we have the second-bounce sample, we can compute the irradiance at our point of interest and finally store the result. As usual, following the rendering equation[3], we multiply the incident radiance by the BRDF and the Lambertian cosine term (NdotL). Notice the division by SamplePdf: it is there because we used cosine-weighted sampling, which is a form of importance sampling.

To put it simply, every time a random sampling decision is made, the sample’s contribution needs to be weighted by the inverse of its probability density.
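Formally, the single-sample Monte Carlo estimate of the outgoing radiance is $$ L_o \approx \frac{f_r(\omega_i)\,L_i(\omega_i)\,(N\cdot\omega_i)}{p(\omega_i)} $$ For the Lambertian BRDF with cosine-weighted sampling, $f_r=\frac{albedo}{\pi}$ and $p(\omega_i)=\frac{N\cdot\omega_i}{\pi}$, so the cosine term and the division by pi cancel against the PDF and the estimate reduces to $albedo \cdot L_i$. The code below keeps the terms explicit anyway, which also stays correct if a different sampling strategy is swapped in.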

float NdotL = max(0, dot(Normal, SampleDir));
vec3 Irradiance = Radiance * LambertDiffuseBRDF(vec3(1)) * NdotL / SamplePdf;
imageStore(Result, ivec2(Pixel), vec4(Irradiance, RayDist));

Notice that a pure white colour is used for the albedo term, so the indirect diffuse is calculated as if the receiving surface were white; its albedo is then mixed back in during the composition pass:

LightingSum += Albedo * imageLoad(GIBuffer, ivec2(gl_GlobalInvocationID.xy)).rgb;

That greatly reduces variance in the image and makes temporal denoising much more stable. It also helps preserve surface detail after spatial denoising: the GI image essentially gets blurred, and without this demodulation the albedo detail would be smeared along with the noise, making indirectly lit surfaces look worse than they should.

And here is the result.

Denoising

A common approach is to use some sort of spatio-temporal denoiser, where “spatio” means a smart image-space blur and “temporal” means accumulating results between frames. The approach used here is inspired by the work done in Metro Exodus[4] and the original SVGF paper[6]; also refer to the example in the DirectX Samples repository, which contains an SVGF implementation[8].

Temporal denoising

This process essentially applies TAA, with some additional metrics, to the noisy input image. To do this, some additional data is needed:

  1. Irradiance history buffer - the result of the GI computations from the previous frame (can be fully black for the first frame)
  2. Depth buffer from the rasterisation pass, as well as the depth history from the previous frame
  3. Velocity buffer for reprojection
  4. SampleCount buffer, which stores the number of samples accumulated in history (all zeroes in the first frame)
  5. Variance and variance history buffers for a full SVGF

Reprojection

Reprojection means trying to correlate samples (pixels) from the previous frame with samples in the current frame. Refer to [8, 9, 10] for more details.

To compute velocities as part of the GBuffer, both the current and the previous clip-space positions should be computed during the rasterisation pass (for skinned meshes this means having both current- and previous-frame bones), and then the velocities are written in the fragment shader as follows:

RT4 = vec2(InClipspacePos.xy/InClipspacePos.w - InClipspacePosPrev.xy/InClipspacePosPrev.w);

History reprojection then uses these velocities to look up the corresponding values from the previous frame, and captures the movement of GI samples as both objects and the camera move.

Usually, TAA techniques use per-frame subpixel jittering, but because the samples here are generated with blue noise that changes every frame, it isn’t necessary.
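Back in the ray generation shader, the per-pixel random pair Xi can, for example, come from a tiled blue-noise texture decorrelated over frames with the golden-ratio sequence (a sketch, assuming a hypothetical 128x128 g_BlueNoise texture and a frame counter):

// fetch an animated blue-noise value for this pixel
vec2 GetAnimatedBlueNoise(ivec2 pixel, uint frameIndex)
{
    vec2 noise = texelFetch(g_BlueNoise, pixel % 128, 0).rg;
    // shift by the golden ratio conjugate every frame to decorrelate over time
    const float GoldenRatioConjugate = 0.61803398875;
    return fract(noise + float(frameIndex) * GoldenRatioConjugate);
}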

The position of the previous corresponding sample in the history texture is computed like this:

float2 Velocity = floor(g_velocity[pixel] / 2 * Params.Size + 0.5);
int2 PrevCoords = int2(pixel - Velocity);
int2 PrevCoordsClamped = clamp(PrevCoords, int2(0,0), Params.Size - 1);

Depth reprojection is then a series of transformations: reconstruct the world-space position from the screen UV and depth, then project that world-space position with the previous camera. That way, we get the depth of the current sample as if it were rendered from the previous camera.

const float2 uv = (pixel + 0.5f) / float2(Params.Size);
const float2 ndc = 2.0f * float2(uv.x, 1.0f - uv.y) - 1.0f;

float4 WorldSpace = mul(Params.ViewProjectionInv, float4(ndc, Depth, 1.0f));
WorldSpace /= WorldSpace.w;
float4 PrevNdc = mul(Params.PrevViewProjection, WorldSpace);
PrevNdc /= PrevNdc.w;

Rejection

To fight ghosting, some metric for rejecting invalid history samples must be added. For that, let’s use the previous depth and the current reprojected depth and compare them in the same space. Don’t just compare the previous depth to the current depth; reproject first, otherwise the comparison breaks as soon as the camera moves, which is exactly what we want to avoid.

float GetLinearDepth(int2 dtid, float depth)
{
    const float2 uv = (dtid + 0.5f) / float2(Params.Size);
    const float2 ndc = 2.0f * float2(uv.x, 1.0f - uv.y) - 1.0f;
    float4 Projected = mul(Params.ProjectionInv, float4(ndc, depth, 1));
    return abs(Projected.z / Projected.w);
}
...
float ReprojectedLinearDepth = GetLinearDepth(pixel, PrevNdc.z);
float PrevLinearDepth = GetLinearDepth(PrevCoords, g_depth_history[PrevCoords]);
const float DepthDifference = abs(PrevLinearDepth - ReprojectedLinearDepth) / ReprojectedLinearDepth;

if (DepthDifference >= 1e-2f) // reprojection failed! reject this sample!
...

For pixels where the history is rejected, we also want to boost the variance so that the spatial denoiser pass blurs them more aggressively, as those are the regions with just one valid sample in the current frame.
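A minimal way to express this (a sketch with illustrative names) is to clamp the stored variance up to some large value whenever the history is rejected, so that the subsequent spatial passes widen their effective filtering there:

// illustrative: boost variance for disoccluded pixels so the spatial filter blurs them harder
float BoostVarianceOnRejection(float variance, bool historyRejected)
{
    const float DisocclusionVarianceBoost = 10.0;
    return historyRejected ? max(variance, DisocclusionVarianceBoost) : variance;
}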

Accumulation

If the history is rejected, the new sample overrides whatever was there before; if not, we need some bookkeeping of the number of accumulated samples, and the history is blended with the current sample based on that count.

// g_sample_count - RW texture for bookkeeping
float SampleCount = g_sample_count[PrevCoordsClamped];
...
if (DepthDifference >= 1e-2f)
{
	SampleCount = 0;
} else
{
	SampleCount += 1;
}
...

const int MaxSamples = 100;
SampleCount = clamp(SampleCount, 0, MaxSamples); // limit temporal accumulation
float3 HistoryValue = g_history[PrevCoordsClamped]; // get previous sample

// blend sample with history
float HistoryWeight = 1 / (SampleCount + 1);
float3 Result = lerp(HistoryValue, g_input[dtid], HistoryWeight);

g_output[dtid] = float4(Result, 1);
g_sample_count[dtid] = SampleCount;

Here is the result of temporal denoising. It already looks much better compared to the noisy image. Note: the jagged white lines in the corners are probably due to my 3D modelling skills; they don’t seem to recur in other test levels.

Spatial denoising

The second part of SVGF iteratively blurs the image several times to get rid of the remaining noise. This image is the final result and is also used as the history for the next frame, meaning that the temporal accumulation of the next frame already receives a less noisy input. This process is called “recurrent denoising” in [4].

The problem here is blurring across geometry edges. In the sample scene that would mean lighting from the back wall and the spheres leaking onto each other, creating a blurry mess.

For that, the edge-avoiding À-Trous filter[7] can help. It defines a multi-pass filter with an increasingly larger kernel, but instead of taking more and more neighbourhood samples at every iteration, it skips some of them (the sample spacing, StepSize in the code below, typically grows as a power of two per iteration), while giving results similar to genuinely larger kernels (see [7], figure 3).

The base weight depends on the distance from the centre pixel to the sampled neighbour; in the original paper it comes from a B3 spline interpolation (1/16, 1/4, 3/8, 1/4, 1/16), but I’ve found a Gaussian kernel to work better.

To make the filter edge-avoiding, every sample is compared to the central one, and the resulting edge-stopping weights are multiplied into the kernel weight.

A good visualisation of the idea can be found by searching for bilateral filtering: that is essentially what the edge-stopping function does, it cuts off everything that is considered to be beyond the edge.

The edge-stopping function for depth (and luminance, which this example doesn’t use) looks like this[7]: $$ w_{rt}(p,q)=\exp\!\left(-\frac{\lVert I_p-I_q\rVert}{\sigma^2}\right) $$

Which translates to code as follows:

#define DEPTH_SIGMA 512.0

float GetEdgeStoppingDepthWeight(float center_depth, float neighbor_depth)
{
	return exp(-abs(center_depth - neighbor_depth) * center_depth * DEPTH_SIGMA);
}

But for normals, a modified function described in [4] will be used:

#define NORMAL_SIGMA 32.0

float GetEdgeStoppingNormalWeight(float3 normal_p, float3 normal_q)
{
	return pow(saturate(dot(normal_p, normal_q)), NORMAL_SIGMA);
}

Now combining it all with the kernel and accounting for sky pixels that should be ignored, we end up with something like this:

static const float GaussKernel5[5] = { 0.0614, 0.2448, 0.3877, 0.2448, 0.0614 };
int KernelRange = 2; // 5x5
...

float3 Result = float3(0,0,0);
float TotalWeight = 0;

float CenterDepth = g_depth_buffer[pixel];
float3 CenterNormal = g_normals[pixel];

if (CenterDepth <= 0.0001 || CenterDepth >= 0.9999)
{
	// sky
	g_output[pixel] = float4(0,0,0,1);
	return;
}

for (int i = -KernelRange; i <= KernelRange; i++)
{
    for (int j = -KernelRange; j <= KernelRange; j++)
    {
	    // will sample every StepSize pixel around the center
        int2 Step = int2(i, j) * Params.StepSize;
        int2 Coords = clamp(pixel + Step, int2(0,0), Params.Size - 1);

        float Weight = GaussKernel5[i+KernelRange] * GaussKernel5[j+KernelRange];

        float Depth = g_depth_buffer[Coords];
        Weight *= GetEdgeStoppingDepthWeight(CenterDepth, Depth);

        float3 Normal = g_normals[Coords];
        Weight *= GetEdgeStoppingNormalWeight(CenterNormal, Normal);

        float SkyWeight = (Depth >= 0.9999 || Depth <= 0.0001) ? 0 : 1;
        Weight *= SkyWeight;

        Result += g_input[Coords] * Weight;
        TotalWeight += Weight;
    }
}
Result /= TotalWeight;

The edge-avoiding filter with hand-tuned sigma values looks like this. Not ideal, but will do the job.

And here is the result of both spatial and temporal denoising combined.

Another important aspect of SVGF is variance guiding: applying more spatial denoising in areas of high variance, which are usually newly revealed or disoccluded ones.

The proper solution is to estimate the variance from the neighbouring pixels during the temporal reprojection pass, where additional heuristics can be added; this is also useful for denoising reflections and shadows.
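As a sketch of what that estimate looks like in SVGF[6] (illustrative names, written in the same style as the GLSL snippets earlier; the moments would live in the variance history buffer mentioned above):

// accumulate the first and second moments of luminance temporally,
// then derive variance as E[l^2] - E[l]^2
float Luminance(vec3 colour)
{
    return dot(colour, vec3(0.2126, 0.7152, 0.0722));
}

vec2 AccumulateMoments(vec2 historyMoments, vec3 currentColour, float historyWeight)
{
    float l = Luminance(currentColour);
    return mix(historyMoments, vec2(l, l * l), historyWeight);
}

float VarianceFromMoments(vec2 moments)
{
    return max(0.0, moments.y - moments.x * moments.x);
}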

Results and examples

A showcase of how it all looks.

For comparison, here is what it would look like without any global illumination or ambient term:

Future work and potential improvements

  1. Estimating the second (and further) bounces of light with a world-space irradiance cache [12]. Several cascades of DDGI around the camera could potentially be used. It may help with disoccluded areas as well.
  2. Using a radiance cache for guiding rays towards directions of interest.
  3. Better denoising: experiments with ReBLUR and neural denoisers.
  4. Deferred hit-point lighting, as described in Metro[4].
  5. Encoding the resulting irradiance as L1 spherical harmonics, as in Metro[4]; this will also help with metals. Instead of storing irradiance as a single colour, it would be encoded as an L1 SH that gives some directionality and light distribution over the hemisphere, making it possible to sample indirect specular (see the sketch after this list).
  6. Ray sorting [8, 11]. Essentially pre-generating rays from the GBuffer and binning them so that every bin contains rays starting from nearly the same origin and going in nearly the same direction, making BVH traversal more coherent and faster.
  7. ReSTIR for the first bounce; it should greatly reduce variance and performance cost while improving quality [13].
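As a rough illustration of item 5, projecting a single radiance sample onto L1 SH could look like this (a sketch using the standard real SH basis constants; the exact encoding used in Metro is not reproduced here):

// project a radiance sample arriving from direction dir onto L1 spherical harmonics
// (one vec4 of coefficients per colour channel)
void AddSampleToSH1(inout vec4 shR, inout vec4 shG, inout vec4 shB, vec3 radiance, vec3 dir)
{
    // basis: (Y_0^0, Y_1^-1, Y_1^0, Y_1^1) evaluated at dir
    vec4 basis = vec4(0.282095, 0.488603 * dir.y, 0.488603 * dir.z, 0.488603 * dir.x);
    shR += radiance.r * basis;
    shG += radiance.g * basis;
    shB += radiance.b * basis;
}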

Visualisations

The sampling visualisations were made with Python in JupyterLab; here is a notebook with all of the code:

Further reading

For a better understanding of BRDFs and importance sampling, I highly recommend these works:

TAA:

Projects implementing realtime GI:

References

Ray Tracing Gems, Sampling Transformations Zoo, PicaPica