Shadow Maps Don’t (Always) Need Mips

In my last post about pixel quad utilization, I stated that shadow maps don’t need mips. There was some disagreement on this point, with some folks saying that mipmaps should be used, with a max filter for generation. I was a bit puzzled by this, as it’s something that isn’t widely done (as far as I know). It means spending performance to generate the mips, and memory to hold them, and it would cause artifacts since shadow boundaries would move as the mip level drops.

Since I finally own a DX11-capable GPU, I decided to test this out. I’d been thinking about putting together a DX11 sandbox codebase for a while, and this seemed like as good a motivation as any. While all of this was done on my own machine and on my own time, it is so similar to what I do at my day job that I am not going to post the code (at least not publicly).

Quality

Here is a 4K shadow map with bilinear comparison filtering and no mips:

4k_hard_linear_crop

And the same with trilinear:

4k_hard_trilinear_crop

As I suspected, the shadow boundaries move as one goes down the mip chain. It may not look particularly jarring in the screenshot, but let me assure you that it’s extremely nasty when the light is moving, much more jarring than the pixel-level aliasing that we get without mips. In fairness, it is only this bad because I purposefully chose the worst angle I could find, but that’s the one that rendering engineers worry about.

I was about to write the idea off, but then the voices in my head suggested I try 16x aniso:

4k_hard_aniso_crop

Much better. Aniso minimizes the artifacts, and it helps clear up some nasty aliasing that occurs elsewhere in the image. This is not your run-of-the-mill projection “aliasing”. It’s the more insidious frequency variety.

Bilinear:

4k_hard_linear_crop2

Aniso:

4k_hard_aniso_crop2

It’s clear then, that mip-mapping brings a quality benefit for simple hard shadow mapping, but only if high quality filtering is used. Skimp on the filtering, and you risk artifacts that can make it look worse than bilinear.

Performance

Here are the timings for a test shader that does lighting, texturing, and one shadowmap tap. Measurements taken on my Haswell i3-4010U at 1366×755 pixels, by staring at a 32-frame moving average and eyeballing its approximate value. They’re approximate, so don’t go scrutinizing them too much.

Shadowmap is a 16-bit depth texture generated with depth-only rendering. Mip generation for the shadowmap was done by rendering quads using the following shader:

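Texture2D tZ;          // parent mip level of the shadow map
SamplerState sPoint;   // sampler passed to Gather

// Writes the max of the four parent texels as the depth of the child texel.
float main( float2 uv : TEXCOORD0 ) : SV_Depth
{
    // Gather returns the red channel of the 2x2 texel footprint around uv.
    float4 v = tZ.Gather( sPoint, uv );
    return max( max(v.x,v.z), max(v.y,v.w) );
}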

Only the upper mips are generated since the bottom ones don’t really matter and are trickier to generate efficiently.
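To make the max filter concrete, and to show why shadow boundaries shift as the mip level drops, here is a minimal CPU-side reference sketch in Python. It is not the code used for the measurements; the function names and the tiny 4×4 test map are made up for illustration.

```python
def downsample_max(depth):
    """Build the next mip level by keeping the max of each 2x2 block."""
    h, w = len(depth), len(depth[0])
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even dimensions"
    return [
        [
            max(depth[y][x], depth[y][x + 1],
                depth[y + 1][x], depth[y + 1][x + 1])
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]


def build_max_mips(depth, levels):
    """Return [mip0, mip1, ...] with `levels` extra levels below mip0."""
    chain = [depth]
    for _ in range(levels):
        chain.append(downsample_max(chain[-1]))
    return chain


# Hypothetical 4x4 map: an occluder (depth 0.2) covers the left three columns,
# the far plane (depth 1.0) fills the last one.
base = [[0.2, 0.2, 0.2, 1.0] for _ in range(4)]
mips = build_max_mips(base, 2)
for level, mip in enumerate(mips):
    print(level, mip)
```

In this toy map the occluder covers three quarters of the width at mip 0, half of it at mip 1, and nothing at mip 2. That shrinking is the boundary movement visible in the trilinear screenshot.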

4096 shadow map, 4 mips

filter    | frame time (ms) | frame time w/o mipgen (ms)
bilinear  | 7               | n/a
trilinear | 10              | 6.6
aniso     | 10.7            | 7.4

If mipgen were free, we’d get a slight performance boost from better cache efficiency, but in the common case of re-rendering the shadow map all the time, performance suffers.
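The two effects can be separated with a little arithmetic on the 4096 table above. This is only a sketch that subtracts the eyeballed numbers from each other, so treat the results as rough; the dictionary and function names are mine.

```python
# Frame times in ms, copied from the 4096 table above.
BILINEAR = 7.0
WITH_MIPGEN = {"trilinear": 10.0, "aniso": 10.7}
WITHOUT_MIPGEN = {"trilinear": 6.6, "aniso": 7.4}


def breakdown(name):
    """Return (mipgen cost, sampling delta vs. bilinear), both in ms."""
    mipgen = WITH_MIPGEN[name] - WITHOUT_MIPGEN[name]
    sampling = WITHOUT_MIPGEN[name] - BILINEAR
    return round(mipgen, 2), round(sampling, 2)


for name in WITH_MIPGEN:
    mipgen, sampling = breakdown(name)
    print(f"{name}: mipgen {mipgen:+.1f} ms, sampling {sampling:+.1f} ms")
```

Mip generation costs a bit over 3 ms per frame in both cases, while the sampling side moves by less than half a millisecond in either direction.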

If we drop the resolution, the performance hit is less jarring.

1024 shadow map, 1 mip

filter    | frame time (ms) | frame time w/o mipgen (ms)
bilinear  | 2               | n/a
trilinear | 2.1             | 2
aniso     | 2.4             | 2.1

The mip generation is now much cheaper, of course, but the cache efficiency benefits no longer apply, and we take a slight hit because aniso filtering takes longer.

PCF In Practice

Our discussion to this point has ignored the important fact that almost nobody really wants crisp, hard shadows. If we did, then I reckon stencil shadows probably would not have fallen out of favor as quickly as they did. What we really want is some approximation to soft shadows, because real shadows are always somewhat soft. As I wrote in an earlier post, point lights do not exist.

“PCF” (percentage-closer filtering), as used in the realtime rendering community, tends to refer not to simple hardware comparison filtering, but rather to the large dynamic blurs in which comparison filtering is applied.

There are two common approaches to “PCF”:

Random rotation:

A disk shaped filter kernel is placed around the sample location and randomly rotated, either per screen pixel, or per light pixel, depending on what sorts of artifacts you’re more tolerant of. One advantage of this one is that the kernel can be dynamically scaled to mimic a physical penumbra.
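As a rough illustration of the idea, here is a CPU-side Python sketch of a rotated disk kernel. The 8-tap kernel, the point-sampled `fetch` helper, and the texel-space coordinates are all assumptions of mine, not anything taken from a particular engine.

```python
import math
import random

# Hypothetical 8-tap kernel: offsets inside the unit disk, in texels.
DISK = [(math.cos(2 * math.pi * i / 8) * r, math.sin(2 * math.pi * i / 8) * r)
        for i, r in enumerate([0.35, 0.9, 0.6, 1.0, 0.45, 0.8, 0.7, 0.95])]


def fetch(depth, x, y):
    """Point-sample the map with clamp-to-edge addressing."""
    h, w = len(depth), len(depth[0])
    xi = min(max(int(math.floor(x)), 0), w - 1)
    yi = min(max(int(math.floor(y)), 0), h - 1)
    return depth[yi][xi]


def rotated_disk_pcf(depth, x, y, receiver_z, radius, angle):
    """Fraction of kernel taps that are lit (1.0 = fully lit)."""
    c, s = math.cos(angle), math.sin(angle)
    lit = 0
    for ox, oy in DISK:
        # Rotate the kernel offset, then scale it by the penumbra radius.
        rx = (ox * c - oy * s) * radius
        ry = (ox * s + oy * c) * radius
        if receiver_z <= fetch(depth, x + rx, y + ry):
            lit += 1
    return lit / len(DISK)


# Occluder (0.2) on the left half of an 8x8 map, far plane (1.0) on the right.
shadow_map = [[0.2] * 4 + [1.0] * 4 for _ in range(8)]
angle = random.Random(1234).uniform(0.0, 2.0 * math.pi)  # per-pixel in practice
print(rotated_disk_pcf(shadow_map, 4.0, 4.0, 0.5, 2.0, angle))
```

Here `radius` is the knob that would be driven by a penumbra estimate, and the angle would come from a per-screen-pixel or per-light-pixel source rather than a fixed seed.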

Bilinear PCF:

A regular grid of bilinear PCF samples is taken at fixed offsets from the sample location, and the results are combined using a box filter. A naive implementation simply uses numerous bilinear taps, but this can be optimized considerably by using Gather instructions, and combining samples within the shader. The separability of the box filter allows this to be done very cheaply. Here is a diagram I ripped from Holger Gruen’s 2009 GDC presentation, showing 3×3 bilinear PCF on a 4×4 texel block. The left one shows where to place the taps for an optimized gather4 implementation.

holger
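The reason the samples can be combined in the shader is that every tap in the grid shares the same bilinear fraction, so the whole N×N kernel collapses into one set of separable weights over an (N+1)×(N+1) texel block. The following CPU-side Python sketch checks that equivalence numerically. It assumes texel centers at integer + 0.5, clamp-to-edge addressing, and a "lit if receiver depth <= stored depth" comparison; all function names are mine.

```python
import math


def cmp_texel(depth, xi, yi, z):
    """Comparison result for one texel: 1.0 if lit, 0.0 if shadowed."""
    h, w = len(depth), len(depth[0])
    xi = min(max(xi, 0), w - 1)
    yi = min(max(yi, 0), h - 1)
    return 1.0 if z <= depth[yi][xi] else 0.0


def bilinear_pcf_tap(depth, x, y, z):
    """One comparison tap: compare 4 texels, then blend the results."""
    x0, y0 = math.floor(x - 0.5), math.floor(y - 0.5)
    fx, fy = (x - 0.5) - x0, (y - 0.5) - y0
    top = (1 - fx) * cmp_texel(depth, x0, y0, z) + fx * cmp_texel(depth, x0 + 1, y0, z)
    bot = (1 - fx) * cmp_texel(depth, x0, y0 + 1, z) + fx * cmp_texel(depth, x0 + 1, y0 + 1, z)
    return (1 - fy) * top + fy * bot


def naive_pcf(depth, x, y, z, n):
    """n x n bilinear taps at one-texel spacing, box-filtered."""
    half = (n - 1) / 2
    total = sum(bilinear_pcf_tap(depth, x + i - half, y + j - half, z)
                for j in range(n) for i in range(n))
    return total / (n * n)


def separable_pcf(depth, x, y, z, n):
    """Same result from (n+1) x (n+1) texel compares with separable weights."""
    half = (n - 1) / 2
    x0, y0 = math.floor(x - half - 0.5), math.floor(y - half - 0.5)
    fx, fy = (x - half - 0.5) - x0, (y - half - 0.5) - y0
    wx = [1 - fx] + [1.0] * (n - 1) + [fx]   # edge texels get partial weight
    wy = [1 - fy] + [1.0] * (n - 1) + [fy]
    total = sum(wy[j] * wx[i] * cmp_texel(depth, x0 + i, y0 + j, z)
                for j in range(n + 1) for i in range(n + 1))
    return total / (n * n)


test_map = [[0.2 if (x + y) % 3 == 0 else 1.0 for x in range(8)] for y in range(8)]
print(naive_pcf(test_map, 2.7, 3.25, 0.5, 3), separable_pcf(test_map, 2.7, 3.25, 0.5, 3))
```

For the 3×3 case the separable version touches a 4×4 texel block, which matches the block in the diagram and can be fetched as four 2×2 gathers.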

There is another, very important reason for softening shadows. Even with hardware PCF, hard shadow maps require a very high resolution in order to not look terrible. The extra performance and memory footprint of a larger shadow map is a cost we’re not always willing to pay, and our feeble attempts to approximate soft shadows also allow us to get away with a much lower resolution by blurring away the jaggies.

Here is a 1024 hard shadow map WITH a bilinear comparison tap (as you can see, it doesn’t help much):

1k_hard_crop

Here is a 1024 map with 5×5 bilinear PCF:

1k_5x5_crop

Our shadows are blurrier, the jaggies are hidden, and while we take a performance hit, it isn’t too severe (see below for numbers).

By the way, if you’ve never seen John Isidoro’s 2006 shadow mapping presentation, I’d suggest perusing it. In my opinion it’s a very important piece of history. Both of the modern techniques I’ve mentioned here are anticipated in John’s work, which took place at the end of the DX9 era. John’s edge tap smoothing is basically the same thing as optimized bilinear PCF (though he didn’t call it that).

Soft Shadow Performance

Here are the numbers for an optimized 5×5 bilinear kernel (16 gather4 taps). There is a modest cost compared to a single tap:

map size | frame time (ms)
4096     | 8.5
1024     | 3

If we want mips, we can’t use the gather4 tricks, so we have to fall back to a ‘naive’ kernel (25 taps). The timings on that are:

map size | filter    | frame time (ms) | frame time w/o mipgen (ms)
4096     | bilinear  | 11.2            | n/a
4096     | trilinear | 14.6            | 11.2
4096     | 4x aniso  | 20.85           | 17.75
4096     | 16x aniso | 28.3            | 25.2
1024     | bilinear  | 5.2             | n/a
1024     | trilinear | 5.6             | 5.3
1024     | 4x aniso  | 8.1             | 7.9
1024     | 16x aniso | 11.3            | 11.1

The performance delta here is a lot higher, and unlike before, mip generation is no longer the only problem. Simply having mips on our texture tosses our performance off a cliff. The trouble is that we’re now doing lots of taps, and we’ve taken our cheap bilinear taps and turned them into expensive anisotropic taps. We could go back to trilinear and break even, but this buys us nothing, because the artifacts are back, and we’re slower than we could be with the gather4 option.

The quality benefit of mips still applies, however. Eventually a wide PCF kernel will start to alias, and using mip-mapped taps with aniso eliminates the aliasing, if we’re willing to pay the performance cost. That said, this aliasing is much less likely to show up with a wide PCF kernel than with a single tap. At 1K, in my test scene, it is not visible.

Ironically, we can also choose to remove the aliasing by simply cutting the shadow map resolution, which we might prefer, because it gives us fuzzier shadow edges.

Conclusions

There are good reasons not to use mip-maps with shadow maps:

  • Performance
  • Memory Footprint
  • Risk of artifacts

Acceptable quality can be achieved without mips.
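To put a rough number on the memory footprint, here is a small Python sketch for a 16-bit, 4096×4096 map. It assumes that "4 mips" means four extra levels below the full-resolution one, which is my reading rather than something stated above, and it ignores any padding or alignment a driver might add.

```python
def level_bytes(size, bytes_per_texel, level):
    """Bytes used by one square mip level (level 0 is the full map)."""
    side = max(size >> level, 1)
    return side * side * bytes_per_texel


def extra_mip_bytes(size, bytes_per_texel, extra_levels):
    """Bytes added by `extra_levels` mips below the base level."""
    return sum(level_bytes(size, bytes_per_texel, k)
               for k in range(1, extra_levels + 1))


base = level_bytes(4096, 2, 0)        # 16-bit depth, 4096x4096
four = extra_mip_bytes(4096, 2, 4)    # assumed: 4 extra levels
full = extra_mip_bytes(4096, 2, 12)   # full chain down to 1x1
print(base / 2**20, four / 2**20, full / base)
```

The base level is 32 MiB, the four extra levels add a little over 10 MiB, and even a full chain adds only about a third on top of the base, because each level is a quarter the size of the one above it.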

There are also good reasons to use mip-maps with shadow maps:

  • Alias prevention
  • Filterable shadowmap variants (VSM/CSM/ESM)
  • As an optimization aid (e.g. min-max mips for penumbra detection)
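The min-max idea from the last bullet can be sketched on the CPU as follows. This is a simplified Python illustration with made-up names: it classifies a receiver against a single coarse texel, whereas a real implementation would have to make sure the coarse region covers the entire filter footprint.

```python
def reduce_2x2(level, op):
    """Halve a level by applying `op` (min or max) to each 2x2 block."""
    h, w = len(level), len(level[0])
    return [[op(level[y][x], level[y][x + 1],
                level[y + 1][x], level[y + 1][x + 1])
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def build_min_max(depth, levels):
    """Return the (min, max) maps `levels` steps below the full map."""
    lo, hi = depth, depth
    for _ in range(levels):
        lo, hi = reduce_2x2(lo, min), reduce_2x2(hi, max)
    return lo, hi


def classify(lo, hi, tx, ty, receiver_z):
    """Decide whether the expensive PCF kernel is needed for this region."""
    if receiver_z <= lo[ty][tx]:
        return "lit"        # in front of every texel in the region
    if receiver_z > hi[ty][tx]:
        return "shadowed"   # behind every texel in the region
    return "penumbra"       # mixed region: run the full filter


# Occluder (0.2) on the left three columns, far plane (1.0) on the right.
shadow_map = [[0.2, 0.2, 0.2, 1.0] for _ in range(4)]
lo, hi = build_min_max(shadow_map, 1)
print(classify(lo, hi, 0, 0, 0.5), classify(lo, hi, 1, 0, 0.5))
```

Only regions classified as "penumbra" need the wide kernel; everything else can return 0 or 1 after a single lookup into the coarse maps.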

Full Images

1024 hard bilinear:

1k_hard

1024 hard trilinear:

1k_hard_trilinear

1024 hard 16x aniso:

1k_hard_16xaniso

4096 hard bilinear:

4k_hard_linear

4096 hard trilinear:

4k_hard_trilinear

4096 hard 16x aniso:

4k_hard_aniso

1024 5×5 bilinear:

1k_5x5

1024 5×5 trilinear:

1k_5x5_trilinear

1024 5×5 16x aniso:

1k_5x5_aniso

4096 5×5 bilinear:

4k_5x5_bilinear

4096 5×5 trilinear:

4k_5x5_trilinear

4096 5×5 16x aniso:

4k_5x5_aniso

One Comment

  1. Santokes

    I was about to do this mips experiment! Thanks for writing this and posting your performance numbers.

    Great stuff Josh
