Global Illumination: Exponential Variance Shadow Maps (EVSM)

A recent project needed to integrate high-quality shadows with a good balance of efficiency and visual quality. Because the project does not render in a continuous loop, CSM could not be used; at the same time, the scene is modeled dynamically, with objects frequently added, deleted, and modified, so the shadows must be regenerated quickly. The plain SM + PCF approach used before could not meet these requirements. While researching software such as RVT, I found that it uses a single-frame EVSM algorithm, so I integrated and implemented it.
To understand Exponential Variance Shadow Maps (EVSM), you first need to understand the Variance Shadow Map (VSM) algorithm, and to understand VSM you need to know the conventional shadow map (SM) algorithm. This article will not go into the details of SM, nor will it cover the intermediate refinement, Layered Variance Shadow Maps (LVSM).
First, let's look at the advantages of the EVSM algorithm; after all, it is those advantages that made it the first choice for integration into the project.

Let's first look at the quality comparison in the original EVSM write-up (measured on a GTX 1080). For details, see the reference link: https://www.martincap.io/project_detail.php?project_id=9

First, the advantages and disadvantages of VSM compared with SM:

  • Advantages: faster when producing soft shadows; no shadow acne
  • Disadvantages: requires an extra channel to store the squared depth; light leaks in regions of high depth variance

Second, the advantages and disadvantages of EVSM compared with VSM:

  • Advantages: greatly reduces light leaking
  • Disadvantages: artifacts can still appear in extreme cases; uses more bandwidth (EVSM4)

Now let's look at VSM itself. One note before we start: the demo mainly follows the write-up at http://mynameismjp.wordpress.com/, so readers already familiar with that material can skim ahead.

1. VSM

1.1 Algorithm principle

From the SM algorithm we know that to produce soft shadows we usually need PCF, which combines many depth-comparison results over a region into one soft visibility value.
Briefly, the PCF principle is: project the current screen-space pixel footprint into the shadow map and sample the resulting region, a process very similar to standard texture filtering. Each sample is then compared against a reference depth, yielding a binary result. All of these depth comparisons are combined to compute the percentage of texels in the filtered region that are closer than the reference depth, and this percentage is used to attenuate the light. In general, enlarging the filter region softens the shadow edges, as the sketch below illustrates.
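For reference, a minimal PCF loop might look like the following sketch. This is not the demo's code; the resource names (ShadowMapDepth, ShadowSamplerCmp) and the compile-time radius FilterRadius_ are assumptions for illustration.

// Assumed resources (not from the demo):
Texture2D<float> ShadowMapDepth;
SamplerComparisonState ShadowSamplerCmp;   // LESS_EQUAL comparison filter
static const int FilterRadius_ = 2;        // 5x5 PCF kernel

float SampleShadowPCF(float2 shadowUV, float refDepth, float2 texelSize)
{
    float lit = 0.0f;
    [unroll]
    for(int y = -FilterRadius_; y <= FilterRadius_; ++y)
    {
        [unroll]
        for(int x = -FilterRadius_; x <= FilterRadius_; ++x)
        {
            // Hardware performs the binary depth compare per tap
            lit += ShadowMapDepth.SampleCmpLevelZero(ShadowSamplerCmp,
                       shadowUV + float2(x, y) * texelSize, refDepth);
        }
    }
    // Percentage of taps closer to the light attenuates the lighting
    float taps = 2.0f * FilterRadius_ + 1.0f;
    return lit / (taps * taps);
}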

Although PCF quality is quite good, it requires many samples, and, as with standard texture filtering, surfaces at glancing angles to the light demand large anisotropic filter regions. In the worst case we would have to sample and compare every texel in the shadow map just to compute the light attenuation for a single framebuffer pixel. That is very slow, and PCF still suffers from shadow acne without careful biasing.

The VSM algorithm was developed in response to these problems. The main idea is to represent the depth data in a form that can be filtered linearly, so that we can reuse the algorithms and hardware designed for color and other linear data. The algorithm is very similar to standard shadow mapping, except that we write depth and squared depth into a two-component (e.g. R32G32_FLOAT) variance shadow map, instead of simply writing the depth alone. By filtering a region we recover the first and second moments M1 and M2 of that region's depth distribution, that is:

$$M_1 = E(x) = \int x\, p(x)\, dx, \qquad M_2 = E(x^2) = \int x^2\, p(x)\, dx$$

from which the mean and variance follow as $\mu = M_1$ and $\sigma^2 = M_2 - M_1^2$.

In essence, this is just running a filter (box filter, Gaussian filter, etc.) over the depth and squared-depth map.

Alternatively, you can render with MSAA and resolve the values directly with a linear sampler, then compute the mean and variance from the two moments using the classic statistical formulas above.

We can then apply Chebyshev's inequality to this mean and variance to compute an upper bound on the fraction of light reaching the surface currently being shaded (at depth t), i.e. the probability that it is unoccluded:

$$P(x \ge t) \;\le\; p_{max}(t) = \frac{\sigma^2}{\sigma^2 + (t - \mu)^2}$$

This one-sided (Cantelli) form of Chebyshev's inequality is only valid when t > μ. If t ≤ μ, we simply set p_max = 1, meaning the surface is treated as fully lit; this is not exact, but it works well enough in practice.
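As a quick sanity check with made-up numbers: suppose the filtered moments give μ = 0.4 and σ² = 0.01, and the receiver lies at depth t = 0.6. Then

$$p_{max} = \frac{0.01}{0.01 + (0.6 - 0.4)^2} = \frac{0.01}{0.05} = 0.2$$

so at most 20% of the filter region can lie beyond the receiver, and the surface receives at most 20% of the light.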

That's all there is to the theory. There are still a few pitfalls in the implementation, though, so let's look at the code.

1.2 Algorithm Implementation

1.2.1 Shadow generation

Shadow generation follows the conventional algorithm, so the details are not repeated; the generated shadow map is shown in the following figure:

1.2.2 Depth and depth squared expectation calculation

A single full-screen-triangle draw call converts and stores the depth map above. The HLSL vertex and pixel shader code is as follows:

struct VSOutput
{
    float4 Position : SV_Position;
    float2 TexCoord : TEXCOORD;
};
VSOutput FullScreenVS(in uint VertexID : SV_VertexID)
{
    VSOutput output;

    if(VertexID == 0)
    {
        output.Position = float4(-1.0f, 1.0f, 1.0f, 1.0f);
        output.TexCoord = float2(0.0f, 0.0f);
    }
    else if(VertexID == 1)
    {
        output.Position = float4(3.0f, 1.0f, 1.0f, 1.0f);
        output.TexCoord = float2(2.0f, 0.0f);
    }
    else
    {
        output.Position = float4(-1.0f, -3.0f, 1.0f, 1.0f);
        output.TexCoord = float2(0.0f, 2.0f);
    }

    return output;
}

float4 ConvertToVSM(in VSOutput input) : SV_Target0
{
    float sampleWeight = 1.0f / float(MSAASamples_);
    uint2 coords = uint2(input.Position.xy);

    float4 average = float4(0.0f, 0.0f, 0.0f, 0.0f);

    // Sample indices to Load() must be literal, so force unroll
    [unroll]
    for(uint i = 0; i < MSAASamples_; ++i)
    {
        // Fetch the raw depth sample
        #if MSAASamples_ > 1
            float depth = ShadowMap.Load(coords, i);
        #else
            float depth = ShadowMap[coords];
        #endif

        // Plain VSM: store depth in both components (no exponential warp yet)
        float2 vsmDepth = float2(depth, depth);
        average += sampleWeight * float4(vsmDepth.xy, vsmDepth.xy * vsmDepth.xy);
    }
    return average.xzxz;
}

The converted texture looks like this:

In addition to the approach above, MSAA and a linear sampler can be used directly to produce the means of depth and squared depth (that is, M1 and M2); explicit filtering also works.
For background, see https://graphics.stanford.edu/~mdfisher/Shadows.html; the flow diagram is as follows:

The separable blur pass can be implemented as follows:

struct VSOutput
{
    float4 Position : SV_Position;
    float2 TexCoord : TEXCOORD;
};
VSOutput FullScreenVS(in uint VertexID : SV_VertexID)
{
    VSOutput output;

    if(VertexID == 0)
    {
        output.Position = float4(-1.0f, 1.0f, 1.0f, 1.0f);
        output.TexCoord = float2(0.0f, 0.0f);
    }
    else if(VertexID == 1)
    {
        output.Position = float4(3.0f, 1.0f, 1.0f, 1.0f);
        output.TexCoord = float2(2.0f, 0.0f);
    }
    else
    {
        output.Position = float4(-1.0f, -3.0f, 1.0f, 1.0f);
        output.TexCoord = float2(0.0f, 2.0f);
    }

    return output;
}

float4 BlurSample(in float2 screenPos, in float offset, in float2 mapSize)
{
    // VSMMap is assumed to be a Texture2D here; for a Texture2DArray,
    // index with uint3(samplePos, arraySlice) instead.
    #if Vertical_
        float2 samplePos = screenPos;
        samplePos.y = clamp(screenPos.y + offset, 0, mapSize.y);
        return VSMMap[uint2(samplePos)];
    #else
        float2 samplePos = screenPos;
        samplePos.x = clamp(screenPos.x + offset, 0, mapSize.x);
        return VSMMap[uint2(samplePos)];
    #endif
}

float4 BlurVSM(in VSOutput input) : SV_Target0
{
    #if Vertical_
        float scale = abs(CascadeScale.y);
        float maxFilterSize = MaxKernelSize / abs(Cascade0Scale.y);
    #else
        float scale = abs(CascadeScale.x);
        float maxFilterSize = MaxKernelSize / abs(Cascade0Scale.x);
    #endif

    const float KernelSize = clamp(min(FilterSize, maxFilterSize) * scale, 1.0f, MaxKernelSize);
    const float Radius = KernelSize / 2.0f;

    #if GPUSceneSubmission_
        [branch]
        if(KernelSize > 1.0f)
        {
            const int SampleRadius = int(round(Radius));

            float4 sum = 0.0f;

            [loop]
            for(int i = -SampleRadius; i <= SampleRadius; ++i)
            {
                float4 sample = BlurSample(input.Position.xy, i, ShadowMapDimensions);

                sample *= saturate((Radius + 0.5f) - abs(i));

                sum += sample;
            }

            return sum / KernelSize;
        }
        else
        {
            return BlurSample(input.Position.xy, 0, ShadowMapDimensions);
        }
    #else
        float4 sum = 0.0f;

        [unroll]
        for(int i = -SampleRadius_; i <= SampleRadius_; ++i)
        {
            float4 sample = BlurSample(input.Position.xy, i, ShadowMapDimensions);

            sample *= saturate((Radius + 0.5f) - abs(i));

            sum += sample;
        }

        return sum / KernelSize;
    #endif
}

To summarize: since variance shadow maps are linearly filterable, many standard techniques and algorithms become available. Most obviously, we can simply use mipmapping, trilinear and anisotropic filtering, and even multisample anti-aliasing while rendering the shadow map. Compared with standard shadow maps and constant-filter percentage-closer filtering, this alone greatly improves shadow quality. Box or Gaussian filtering can of course be applied as well, and a summed-area table (SAT) can also be used; interested readers can try these themselves.

Of course, to compute the mean of the squared depth more accurately, a bias term derived from the depth derivatives can also be added to the second moment:

$$M_2 = z^2 + \frac{1}{4}\left[\left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2\right]$$

The idea is to treat each shadow-map texel as a locally planar distribution of depths (see the related optimization in GPU Gems 3). If you are interested, explore it in depth on your own; a sketch of the moment computation follows.
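A minimal sketch of that idea, following the GPU Gems 3 formulation (the function name is hypothetical; 'depth' is the light-space depth being written in the shadow-map pixel shader):

// Fold the local depth slope into the second moment (GPU Gems 3, ch. 8).
float2 ComputeMoments(float depth)
{
    float dx = ddx(depth);
    float dy = ddy(depth);
    float m1 = depth;
    // Treat the texel as a small plane: the slope term keeps the
    // variance estimate correct under minification.
    float m2 = depth * depth + 0.25f * (dx * dx + dy * dy);
    return float2(m1, m2);
}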

1.2.3 Calculating shadows

After the above processing, we can easily get the expected value of depth and depth square. The next step is to use Chebyshev’s inequality when calculating shadows:

// http://mynameismjp.wordpress.com/


// Reduce light leakage
float Linstep(float a, float b, float v)
{
    return saturate((v - a) / (b - a));
}

float ReduceLightBleeding(float pMax, float amount)
{
    // Map everything in [0, amount] to 0, then linearly rescale (amount, 1] to (0, 1].
    return Linstep(amount, 1.0f, pMax);
}

float ChebyshevUpperBound(float2 moments, float mean, float minVariance,
                          float lightBleedingReduction)
{
    // Calculate variance
    float variance = moments.y - (moments.x * moments.x);
    variance = max(variance, minVariance); // Clamp to avoid a zero denominator

    // Calculate the upper bound on visibility (Chebyshev's inequality)
    float d = mean - moments.x;
    float pMax = variance / (variance + (d * d));

    // Reduce light bleeding
    pMax = ReduceLightBleeding(pMax, lightBleedingReduction);

    // One-sided Chebyshev: fully lit when the receiver is not beyond the mean
    return (mean <= moments.x ? 1.0f : pMax);
}

float SampleShadowMapVSM(in float3 shadowPos, in float3 shadowPosDX,
                         in float3 shadowPosDY, uint cascadeIdx)
{
    float depth = shadowPos.z;

    float2 occluder = ShadowMap.SampleGrad(VSMSampler, float3(shadowPos.xy, cascadeIdx),
                                           shadowPosDX.xy, shadowPosDY.xy).xy;

    return ChebyshevUpperBound(occluder, depth, VSMBias * 0.01, LightBleedingReduction);
}

float3 ShadowVisibility(in float3 positionWS, in float depthVS, in float nDotL, in float3 normal,
                        in uint2 screenPos)
{
    float3 shadowVisibility = 1.0f;
    uint cascadeIdx = NumCascades - 1;
    float3 projectionPos = mul(float4(positionWS, 1.0f), ShadowMatrix).xyz;

    // Apply offset
    float3 offset = GetShadowPosOffset(nDotL, normal) / abs(CascadeScales[cascadeIdx].z);

    // Project into shadow space
    float3 samplePos = positionWS + offset;
    float3 shadowPosition = mul(float4(samplePos, 1.0f), ShadowMatrix).xyz;
    float3 shadowPosDX = ddx_fine(shadowPosition);
    float3 shadowPosDY = ddy_fine(shadowPosition);

    shadowVisibility = SampleShadowCascade(shadowPosition, shadowPosDX, shadowPosDY,
                                           cascadeIdx, screenPos);

    return shadowVisibility;
}

The result obtained is as follows:

There is not much code above. Besides applying the Chebyshev approximation, it also mitigates light leaking to some degree, which is VSM's biggest problem. So why does the algorithm leak light in the first place? Let's look at it in detail; the cause lies in the Chebyshev approximation itself, and it is easiest to see from the math.

Suppose three objects, labeled A, B, and C from top to bottom, have depths a, b, and c. Only A and B fall inside the filter region; C is a receiver occluded by both objects and should not see the light source at all. Assume the current shading point on C sits at the center of the filter region, with A and B each covering half of it. The two moments are then:

$$M_1 = \frac{a + b}{2}, \qquad M_2 = \frac{a^2 + b^2}{2}$$

From these we can compute the mean and variance:

$$\mu = \frac{a + b}{2}, \qquad \sigma^2 = M_2 - M_1^2 = \frac{(b - a)^2}{4} = \frac{\Delta x^2}{4}$$

With Δx = b − a and Δy = c − b as in the figure, Chebyshev's inequality gives the visibility in the middle of the region (t = c):

$$p_{max} = \frac{\sigma^2}{\sigma^2 + (t - \mu)^2} = \frac{\Delta x^2/4}{\Delta x^2/4 + (\Delta y + \Delta x/2)^2} = \frac{1}{1 + \left(2\,\Delta y/\Delta x + 1\right)^2}$$

Plotting this as a function of Δy/Δx shows that the smaller Δy/Δx is, the closer the overall visibility gets to 0.5, even though C should be fully shadowed. This is exactly the light leak.
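Plugging in numbers makes the leak concrete (values chosen for illustration): take a = 0.2, b = 0.4, c = 0.45, so Δx = 0.2 and Δy = 0.05, giving Δy/Δx = 0.25. Then

$$p_{max} = \frac{1}{1 + (2 \cdot 0.25 + 1)^2} = \frac{1}{3.25} \approx 0.31$$

so C, which should be fully shadowed, still receives about 31% of the light.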

Therefore, the algorithm to solve the light leakage problem is the one in the above code:

// Reduce light leakage
float Linstep(float a, float b, float v)
{
    return saturate((v - a) / (b - a));
}

float ReduceLightBleeding(float pMax, float amount)
{
    // Map everything in [0, amount] to 0, then linearly rescale (amount, 1] to (0, 1].
    return Linstep(amount, 1.0f, pMax);
}

The principle is simple. The key observation is that if a surface at depth t is completely occluded by some filtered region with mean depth μ, then t > μ, hence (t − μ)² > 0 and, by Chebyshev's inequality, p_max < 1. Simply put, a fully occluded surface with an incorrectly leaked penumbra never reaches full brightness. We can therefore remove these regions by modifying p_max: map every value below some minimum intensity to 0, then linearly rescale the remaining values back into the (0, 1] interval. That is exactly what the code above implements.
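Continuing the worked example above: with amount = 0.3, the leaked value p_max ≈ 0.31 becomes Linstep(0.3, 1, 0.31) = (0.31 − 0.3)/0.7 ≈ 0.014, i.e. essentially dark, while a genuine penumbra value such as p_max = 0.8 only drops to (0.8 − 0.3)/0.7 ≈ 0.71.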

To further improve accuracy and suppress light leaking even more, methods such as LVSM and EVSM were derived. Next we go straight to the best of them, EVSM.

2. EVSM

2.1 Algorithm Principle

As mentioned before, light leaking is most obvious when the ratio Δy/Δx is very small. We can therefore apply a warp to the depths to improve that ratio.
For example, we can use the warp $e^{cx}$, where c is a positive constant: compute the mean and variance of $e^{cx}$ and then apply Chebyshev's inequality to obtain p_max.
This way the original ratio Δy/Δx becomes a ratio of differences of exponentials, which grows explosively with c; a precise derivation with worked numbers follows after the next two paragraphs.

After the $e^{cx}$ warp, light leaking is effectively suppressed, but as c increases, problems appear in distant parts of the scene, so a second warp in the opposite direction is needed, namely $-e^{-cx}$.

When both warps are used together the scheme is called EVSM4, since a four-channel texture is required; using only the positive warp is called EVSM2. Since $e^{cx}$ and $-e^{-cx}$ are both monotonically increasing, Chebyshev's inequality applies to each warp, and the final result takes the minimum of the two upper-bound probabilities. With this in place, light-leaking artifacts diminish greatly as c increases.
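To make the effect concrete (a back-of-the-envelope calculation, not from the original article): after warping, the depths a, b, c from the earlier example become $e^{ka}$, $e^{kb}$, $e^{kc}$ (writing the exponent as k to avoid clashing with the depth c), and the ratio that controls the leak becomes

$$\frac{\Delta y'}{\Delta x'} = \frac{e^{kc} - e^{kb}}{e^{kb} - e^{ka}} = e^{k\Delta x}\cdot\frac{e^{k\Delta y} - 1}{e^{k\Delta x} - 1} \approx e^{k\Delta y} \quad (k\,\Delta y \gg 1)$$

With the demo's positive exponent k = 40 and the earlier Δy = 0.05 (which doubles to 0.1 once depth is rescaled into [−1, 1]), $e^{k\Delta y} = e^{4} \approx 55$: the effective ratio jumps from 0.25 to roughly 55, and p_max collapses toward 0.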

2.2 Algorithm Implementation

2.2.1 EVSM4 shadow conversion

The code is as follows:

static const uint SMFormat16Bit = 0;
static const uint SMFormat32Bit = 1;

float2 GetEVSMExponents(in float positiveExponent, in float negativeExponent, in uint vsmFormat)
{
    const float maxExponent = vsmFormat == SMFormat16Bit ? 5.54f : 42.0f;

    float2 lightSpaceExponents = float2(positiveExponent, negativeExponent);

    // Clamp to the maximum exponent the fp16/fp32 range can hold without overflow
    return min(lightSpaceExponents, maxExponent);
}

// Apply exponential warp to shadow map depth; input depth should be in [0, 1]
float2 WarpDepth(float depth, float2 exponents)
{
    // Rescale depth into [-1, 1]
    depth = 2.0f * depth - 1.0f;
    float pos = exp(exponents.x * depth);
    float neg = -exp(-exponents.y * depth);
    return float2(pos, neg);
}

float4 ConvertToVSM(in VSOutput input) : SV_Target0
{
    float sampleWeight = 1.0f / float(MSAASamples_);
    uint2 coords = uint2(input.Position.xy);

    // Defaults in the demo: PositiveExponent = 40.0f, NegativeExponent = 5.0f
    float2 exponents = GetEVSMExponents(PositiveExponent, NegativeExponent, SMFormat);

    float4 average = float4(0.0f, 0.0f, 0.0f, 0.0f);

    [unroll]
    for(uint i = 0; i < MSAASamples_; ++i)
    {
        // Convert to EVSM representation
        float depth = ShadowMap[coords];
        float2 vsmDepth = WarpDepth(depth, exponents);
        average += sampleWeight * float4(vsmDepth.xy, vsmDepth.xy * vsmDepth.xy);
    }
    return average;
}

Original shadow map:

Converted EVSM4 shadow map:

2.2.2 Shadow calculation

static const uint SMFormat16Bit = 0;
static const uint SMFormat32Bit = 1;

float2 GetEVSMExponents(in float positiveExponent, in float negativeExponent, in uint vsmFormat)
{
    const float maxExponent = vsmFormat == SMFormat16Bit ? 5.54f : 42.0f;

    float2 lightSpaceExponents = float2(positiveExponent, negativeExponent);

    // Clamp to the maximum exponent the fp16/fp32 range can hold without overflow
    return min(lightSpaceExponents, maxExponent);
}

// Apply exponential warp to shadow map depth; input depth should be in [0, 1]
float2 WarpDepth(float depth, float2 exponents)
{
    // Rescale depth into [-1, 1]
    depth = 2.0f * depth - 1.0f;
    float pos = exp(exponents.x * depth);
    float neg = -exp(-exponents.y * depth);
    return float2(pos, neg);
}

float SampleShadowMapEVSM(in float3 shadowPos, in float3 shadowPosDX,
                          in float3 shadowPosDY)
{
    float2 exponents = GetEVSMExponents(PositiveExponent, NegativeExponent, SMFormat);
    float2 warpedDepth = WarpDepth(shadowPos.z, exponents);

    float4 occluder = ShadowMap.SampleGrad(VSMSampler, float3(shadowPos.xy, 0.0f),
                                           shadowPosDX.xy, shadowPosDY.xy);

    // Scale the bias by the derivative of the warp at the sampled depth
    float2 depthScale = VSMBias * 0.01f * exponents * warpedDepth;
    float2 minVariance = depthScale * depthScale;

    #if ShadowMode_ == ShadowModeEVSM4_
        float posContrib = ChebyshevUpperBound(occluder.xz, warpedDepth.x, minVariance.x, LightBleedingReduction);
        float negContrib = ChebyshevUpperBound(occluder.yw, warpedDepth.y, minVariance.y, LightBleedingReduction);
        return min(posContrib, negContrib);
    #else
        // Positive warp only (EVSM2)
        return ChebyshevUpperBound(occluder.xy, warpedDepth.x, minVariance.x, LightBleedingReduction);
    #endif
}

float ShadowVisibility(in float3 positionWS, in float depthVS, in float nDotL, in float3 normal,
                       in uint2 screenPos)
{
    float shadowVisibility = 1.0f;

    // Project into shadow space
    float3 samplePos = positionWS;
    float3 shadowPosition = mul(float4(samplePos, 1.0f), ShadowMatrix).xyz;
    float3 shadowPosDX = ddx_fine(shadowPosition);
    float3 shadowPosDY = ddy_fine(shadowPosition);

    shadowVisibility = SampleShadowMapEVSM(shadowPosition, shadowPosDX, shadowPosDY);

    return shadowVisibility;
}

The generated map and the resulting effect are as follows:

2.2.3 Gaussian filter (soft shadow generation)

Finally, whether it is VSM, LVSM, ESM, or EVSM, none of them directly produces soft shadows; still, compared with the hard 0-or-1 result of single-sample SM, the edges are already much better, as shown in the following figures:

Single SampleCmpLevelZero sampling of shadow edges:


EVSM single sample shadow edge:

Therefore, a Gaussian filter is generally required to blur the edge area and achieve soft shadows.

A simple 5×5 Gaussian blur is given below:

// Kernel from: https://computergraphics.stackexchange.com/questions/39/how-is-gaussian-blur-implemented
// Presumably an approximation built from Pascal's triangle
//static const float blurKernel[25] =
//{
//    1.0 / 256.0,  4.0 / 256.0,  6.0 / 256.0,  4.0 / 256.0, 1.0 / 256.0,
//    4.0 / 256.0, 16.0 / 256.0, 24.0 / 256.0, 16.0 / 256.0, 4.0 / 256.0,
//    6.0 / 256.0, 24.0 / 256.0, 36.0 / 256.0, 24.0 / 256.0, 6.0 / 256.0,
//    4.0 / 256.0, 16.0 / 256.0, 24.0 / 256.0, 16.0 / 256.0, 4.0 / 256.0,
//    1.0 / 256.0,  4.0 / 256.0,  6.0 / 256.0,  4.0 / 256.0, 1.0 / 256.0
//};

// Kernel generated at: http://dev.theomader.com/gaussian-kernel-calculator/
static const float blurKernel[25] =
{
    0.023528, 0.033969, 0.038393, 0.033969, 0.023528,
    0.033969, 0.049045, 0.055432, 0.049045, 0.033969,
    0.038393, 0.055432, 0.062651, 0.055432, 0.038393,
    0.033969, 0.049045, 0.055432, 0.049045, 0.033969,
    0.023528, 0.033969, 0.038393, 0.033969, 0.023528
};

float4 main(in VSOutput input) : SV_Target0
{
    float3 finalColor = float3(0.0f, 0.0f, 0.0f);
    // TexelSize (1.0 / texture dimensions) and LinearSampler are assumed
    // constant-buffer / sampler resources
    float2 texelSize = TexelSize;

    for(int x = -2; x <= 2; ++x)
    {
        for(int y = -2; y <= 2; ++y)
        {
            finalColor += InputTexture.Sample(LinearSampler,
                input.TexCoord + texelSize * float2(x, y)).rgb
                * blurKernel[(x + 2) + (y + 2) * 5];
        }
    }

    return float4(finalColor, 1.0f);
}

So far so good. As for how to apply this to CSM (here meaning cascaded shadow maps; note that Convolution Shadow Maps is also commonly abbreviated CSM, so different people mean different things by it), it is very simple: just execute the EVSM steps for each cascade, as the sketch below shows.
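A minimal sketch of the per-cascade dispatch (assumptions: CascadeSplits[], ShadowMatrices[], and NumCascades come from a constant buffer, the EVSM maps are slices of a Texture2DArray, and SampleShadowMapEVSM is extended with a hypothetical cascade-index parameter that selects the slice):

float ShadowVisibilityCSM(float3 positionWS, float depthVS)
{
    // Pick the cascade by view-space depth
    uint cascadeIdx = 0;
    [unroll]
    for(uint i = 0; i < NumCascades - 1; ++i)
        if(depthVS > CascadeSplits[i])
            cascadeIdx = i + 1;

    // Project into the selected cascade's shadow space
    float3 shadowPos = mul(float4(positionWS, 1.0f), ShadowMatrices[cascadeIdx]).xyz;
    float3 shadowPosDX = ddx_fine(shadowPos);
    float3 shadowPosDY = ddy_fine(shadowPos);

    // Same EVSM evaluation as before, against this cascade's slice
    return SampleShadowMapEVSM(shadowPos, shadowPosDX, shadowPosDY, cascadeIdx);
}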

Reference article:
http://developer.download.nvidia.com/presentations/2008/GDC/GDC08_SoftShadowMapping.pdf
https://learn.microsoft.com/en-us/windows/win32/dxtecharts/cascaded-shadow-maps?redirectedfrom=MSDN
https://graphics.stanford.edu/~mdfisher/Shadows.html
https://www.martincap.io/project_detail.php?project_id=9