In my last post about pixel quad utilization, I stated that shadow maps don’t need mips. There was some disagreement on this point, with some folks saying that mipmaps should be used, with a max filter for generation. I was a bit puzzled by this, as it’s something that isn’t widely done (as far as I know). It means spending performance to generate the mips, and memory to hold them, and it would cause artifacts since shadow boundaries would move as the mip level drops.
Since I finally own a DX11-capable GPU, I decided to test this out. I’d been thinking about putting together a DX11 sandbox codebase for a while, and this seemed like as good a motivation as any. While all of this was done on my machine and time, since this is so similar to what I do during my day job, I am not going to post code (at least not publicly).
Quality
Here is a 4K shadow map with bilinear comparison filtering and no mips:
And the same with trilinear:
As I suspected, the shadow boundaries move as one goes down the mip chain. It may not look particularly jarring in the screenshot, but let me assure you that it’s extremely nasty when the light is moving, much more jarring than the pixel-level aliasing that we get without mips. In fairness, it is only this bad because I purposefully chose the worst angle I could find, but that’s the one that rendering engineers worry about.
I was about to write the idea off, but then the voices in my head suggested I try 16x aniso:
Much better. Aniso minimizes the artifacts, and it helps clear up some nasty aliasing that occurs elsewhere in the image. This is not your run-of-the-mill projection “aliasing”. It’s the more insidious frequency variety.
Bilinear:
Aniso:
It’s clear, then, that mip-mapping brings a quality benefit for simple hard shadow mapping, but only if high-quality filtering is used. Skimp on the filtering, and you risk artifacts that can make it look worse than bilinear.
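One way to reason about why aniso behaves so much better than trilinear here is to look at which mip level each filter ends up reading. The sketch below is a simplified model, not what any particular GPU does (real hardware LOD selection is implementation-specific). The `Footprint` struct and both function names are made up for illustration. It assumes trilinear picks its level from the long axis of the pixel footprint, while aniso is allowed to divide that axis by up to `maxAniso` before picking a level.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical description of one screen pixel's footprint in the shadow map,
// measured in mip-0 texels along its long and short axes.
struct Footprint {
    float majorTexels;
    float minorTexels;
};

// Simplified isotropic (trilinear) model: the level is driven by the long axis.
float isotropicLod(Footprint f) {
    return std::max(0.0f, std::log2(f.majorTexels));
}

// Simplified anisotropic model: the long axis is covered by up to maxAniso
// probes, so the level is driven by the long axis divided by the probe count.
float anisotropicLod(Footprint f, float maxAniso) {
    const float ratio =
        std::min(f.majorTexels / std::max(f.minorTexels, 1e-6f), maxAniso);
    return std::max(0.0f, std::log2(f.majorTexels / ratio));
}
```

Under this model, a grazing-angle footprint of 16×1 texels lands on mip 4 with trilinear, mip 2 with 4x aniso, and mip 0 with 16x aniso. The further down the chain we read, the further the max-filtered boundaries have drifted, which matches what the screenshots show.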
Performance
Here are the timings for a test shader that does lighting, texturing, and one shadowmap tap. Measurements taken on my Haswell i3-4010U at 1366×755 pixels, by staring at a 32-frame moving average and eyeballing its approximate value. They’re approximate, so don’t go scrutinizing them too much.
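For anyone wanting to reproduce this sort of measurement, a 32-frame moving average can be kept with a small ring buffer. This is only a sketch of the idea, not the code from my sandbox; the class name `FrameTimeAverage` and its methods are invented for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: running average over the most recent 32 frame times.
class FrameTimeAverage {
public:
    void addFrame(float milliseconds) {
        // Drop the oldest sample (zero until the buffer has filled once).
        sum_ -= samples_[next_];
        samples_[next_] = milliseconds;
        sum_ += milliseconds;
        next_ = (next_ + 1) % samples_.size();
        if (count_ < samples_.size()) ++count_;
    }

    // Average of the frames seen so far, capped at the last 32.
    float averageMs() const {
        return count_ ? sum_ / static_cast<float>(count_) : 0.0f;
    }

private:
    std::array<float, 32> samples_{};
    float sum_ = 0.0f;
    std::size_t next_ = 0;
    std::size_t count_ = 0;
};
```

The running sum accumulates a little floating-point drift over a long session, which is harmless for eyeballed numbers like these.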
Shadowmap is a 16-bit depth texture generated with depth-only rendering. Mip generation for the shadowmap was done by rendering quads using the following shader:
    Texture2D tZ;
    sampler sPoint;

    float main( float2 uv : TEXCOORD0 ) : SV_Depth
    {
        // Fetch the 2x2 texel footprint around uv and keep the largest depth.
        float4 v = tZ.Gather( sPoint, uv );
        return max( max(v.x,v.z), max(v.y,v.w) );
    }
Only the upper mips are generated since the bottom ones don’t really matter and are trickier to generate efficiently.
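To make the max-filter rule concrete, here is a CPU-side reference of one downsampling step. It is a sketch for illustration only (the `DepthImage` struct and `downsampleMax` name are assumptions, and it assumes the usual convention that larger depth means farther from the light), not the GPU path used for the timings.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical CPU stand-in for one mip level of a depth texture.
struct DepthImage {
    int width;
    int height;
    std::vector<float> texels;  // row-major, width * height entries
};

// Build the next mip level: each output texel is the max of a 2x2 block.
DepthImage downsampleMax(const DepthImage& src) {
    DepthImage dst;
    dst.width = std::max(1, src.width / 2);
    dst.height = std::max(1, src.height / 2);
    dst.texels.resize(static_cast<std::size_t>(dst.width) * dst.height);

    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            // Clamp so odd-sized or 1-texel-wide sources stay in bounds.
            const int x0 = 2 * x, x1 = std::min(x0 + 1, src.width - 1);
            const int y0 = 2 * y, y1 = std::min(y0 + 1, src.height - 1);
            const float a = src.texels[y0 * src.width + x0];
            const float b = src.texels[y0 * src.width + x1];
            const float c = src.texels[y1 * src.width + x0];
            const float d = src.texels[y1 * src.width + x1];
            dst.texels[y * dst.width + x] = std::max(std::max(a, b), std::max(c, d));
        }
    }
    return dst;
}
```

Running this on a row of depths `0.2, 0.2, 0.2, 1.0` gives `0.2, 1.0`. A receiver at depth 0.5 under the third texel is shadowed at mip 0 but lit at mip 1, because the occluder texel was merged with the far plane. That one-texel shift is the boundary movement discussed above.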
4096 shadow map, 4 mips
| filter | frame time (ms) | frame time (ms) w/o mipgen |
|---|---|---|
| bilinear | 7 | n/a |
| trilinear | 10 | 6.6 |
| aniso | 10.7 | 7.4 |
If mipgen were free, we’d get a slight performance boost from better cache efficiency, but in the common case of re-rendering the shadow map all the time, performance suffers.
If we drop the resolution, the hit is less severe.
1024 shadow map, 1 mip
| filter | frame time (ms) | frame time (ms) w/o mipgen |
|---|---|---|
| bilinear | 2 | n/a |
| trilinear | 2.1 | 2 |
| aniso | 2.4 | 2.1 |
The mip generation is now much cheaper, of course, but the cache efficiency benefits no longer apply, and we take a slight hit because aniso filtering takes longer.
PCF In Practice
Our discussion to this point has ignored the important fact that almost nobody really wants crisp, hard shadows. If we did, then I reckon stencil shadows probably would not have fallen out of favor as quickly as they did. What we really want is some approximation to soft shadows, because real shadows are always somewhat soft. As I wrote in an earlier post, point lights do not exist.
“PCF”, as used in the realtime rendering community, tends to refer not to simple hardware comparison filtering, but rather to the large dynamic blurs in which comparison filtering is applied.
There are two common approaches to “PCF”:
Random rotation:
A disk-shaped filter kernel is placed around the sample location and randomly rotated, either per screen pixel, or per light pixel, depending on what sorts of artifacts you’re more tolerant of. One advantage of this one is that the kernel can be dynamically scaled to mimic a physical penumbra.
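A rough CPU-side sketch of the rotated-disk idea follows. Everything here is an illustrative assumption: the spiral point layout, the integer hash used for the per-pixel angle, and the names `makeDiskKernel`, `pixelAngle` and `rotatedDiskPcf` are not taken from any particular engine. The depth lookup is passed in as a callable so the sketch stays independent of how the shadow map is stored.

```cpp
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };

// Hypothetical fixed kernel: points spread over a unit disk on a spiral.
std::vector<Vec2> makeDiskKernel(int count) {
    const float goldenAngle = 2.39996323f;  // radians
    std::vector<Vec2> points;
    points.reserve(count);
    for (int i = 0; i < count; ++i) {
        const float r = std::sqrt((i + 0.5f) / count);
        const float a = i * goldenAngle;
        points.push_back({ r * std::cos(a), r * std::sin(a) });
    }
    return points;
}

// Hypothetical per-pixel pseudo-random angle in [0, 2*pi).
float pixelAngle(unsigned px, unsigned py) {
    unsigned h = px * 73856093u ^ py * 19349663u;
    h ^= h >> 13; h *= 0x5bd1e995u; h ^= h >> 15;
    return (h & 0xFFFFu) / 65536.0f * 6.2831853f;
}

// Fraction of kernel taps that pass the depth comparison (1 = fully lit).
// depthAt(x, y) returns the stored depth; radius scales the kernel and can
// vary per pixel to mimic a penumbra. The kernel must not be empty.
template <class DepthLookup>
float rotatedDiskPcf(DepthLookup depthAt, const std::vector<Vec2>& kernel,
                     Vec2 center, float radius, float angle, float receiverZ) {
    const float c = std::cos(angle), s = std::sin(angle);
    float lit = 0.0f;
    for (const Vec2& k : kernel) {
        const float ox = (k.x * c - k.y * s) * radius;
        const float oy = (k.x * s + k.y * c) * radius;
        lit += receiverZ <= depthAt(center.x + ox, center.y + oy) ? 1.0f : 0.0f;
    }
    return lit / static_cast<float>(kernel.size());
}
```

Seeding `pixelAngle` with screen coordinates gives the per-screen-pixel variant, and seeding it with shadow map texel coordinates gives the per-light-pixel variant.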
Bilinear PCF:
A regular grid of bilinear PCF samples is taken at fixed offsets from the sample location, and the results are combined using a box filter. A naive implementation simply uses numerous bilinear taps, but this can be optimized considerably by using Gather instructions, and combining samples within the shader. The separability of the box filter allows this to be done very cheaply. Here is a diagram I ripped from Holger Gruen’s 2009 GDC presentation, showing 3×3 bilinear PCF on a 4×4 texel block. The left one shows where to place the taps for an optimized gather4 implementation.
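The separability argument can be checked on the CPU. The sketch below is a reference model only (it is neither Holger’s code nor mine, and `DepthMap`, `pcfNaive` and `pcfSeparable` are names invented for the example). It computes an n×n grid of bilinear comparison taps the naive way, then computes the same value from individual texel comparisons over an (n+1)×(n+1) block using separable weights. On the GPU, those per-texel comparison results are what the gather4 fetches would supply, four at a time.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical CPU stand-in for a depth texture with clamp addressing.
struct DepthMap {
    int width, height;
    std::vector<float> depth;
    float at(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return depth[y * width + x];
    }
};

// Comparison result for one texel: 1 = lit, 0 = shadowed.
float compareTexel(const DepthMap& m, int x, int y, float z) {
    return z <= m.at(x, y) ? 1.0f : 0.0f;
}

// One bilinear comparison tap; (u, v) in texel units, centers at integer + 0.5.
float bilinearTap(const DepthMap& m, float u, float v, float z) {
    const float qx = u - 0.5f, qy = v - 0.5f;
    const int ix = static_cast<int>(std::floor(qx));
    const int iy = static_cast<int>(std::floor(qy));
    const float fx = qx - ix, fy = qy - iy;
    const float top = (1 - fx) * compareTexel(m, ix, iy, z) + fx * compareTexel(m, ix + 1, iy, z);
    const float bot = (1 - fx) * compareTexel(m, ix, iy + 1, z) + fx * compareTexel(m, ix + 1, iy + 1, z);
    return (1 - fy) * top + fy * bot;
}

// Naive kernel (n odd): n * n bilinear taps at one-texel spacing, box filtered.
float pcfNaive(const DepthMap& m, float u, float v, float z, int n) {
    const int r = n / 2;
    float sum = 0.0f;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx)
            sum += bilinearTap(m, u + dx, v + dy, z);
    return sum / (n * n);
}

// 1D weight of texel t in the (n + 1)-texel span: edges get (1 - f) and f,
// interior texels get 1 because two neighbouring taps sum to full weight.
float spanWeight(int t, int n, float f) {
    return t == 0 ? 1.0f - f : (t == n ? f : 1.0f);
}

// Same filter from (n + 1) * (n + 1) texel comparisons and separable weights.
float pcfSeparable(const DepthMap& m, float u, float v, float z, int n) {
    const float qx = u - 0.5f, qy = v - 0.5f;
    const int ix = static_cast<int>(std::floor(qx));
    const int iy = static_cast<int>(std::floor(qy));
    const float fx = qx - ix, fy = qy - iy;
    const int r = n / 2;
    float sum = 0.0f;
    for (int ty = 0; ty <= n; ++ty)
        for (int tx = 0; tx <= n; ++tx)
            sum += spanWeight(ty, n, fy) * spanWeight(tx, n, fx) *
                   compareTexel(m, ix - r + tx, iy - r + ty, z);
    return sum / (n * n);
}
```

For n = 3 the separable version touches exactly the 4×4 texel block in the diagram, and the two functions agree up to floating-point rounding.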
There is another, very important reason for softening shadows. Even with hardware PCF, hard shadow maps require a very high resolution in order to not look terrible. The performance and memory cost of a larger shadow map is one we’re not always willing to pay, and our feeble attempts to approximate soft shadows also allow us to get away with a much lower resolution by blurring away the jaggies.
Here is a 1024 hard shadow map WITH a bilinear comparison tap (as you can see, it doesn’t help much):
Here is a 1024 map with 5×5 bilinear PCF:
Our shadows are blurrier, the jaggies are hidden, and while we take a performance hit, it isn’t too severe (see below for numbers).
By the way, if you’ve never seen John Isidoro’s 2006 shadow mapping presentation, I’d suggest perusing it. In my opinion it’s a very important piece of history. Both of the modern techniques I’ve mentioned here are anticipated in John’s work, which took place at the end of the DX9 era. John’s edge tap smoothing is basically the same thing as optimized bilinear PCF (though he didn’t call it that).
Soft Shadow Performance
Here are the numbers for an optimized 5×5 bilinear kernel (16 gather4 taps). There is a modest cost compared to a single tap:
| map size | frame time (ms) |
|---|---|
| 4096 | 8.5 |
| 1024 | 3 |
If we want mips, we can’t use the gather4 tricks, so we have to fall back to a ‘naive’ kernel (25 taps). The timings on that are:
| map size | filter | frame time (ms) | frame time (ms) w/o mipgen |
|---|---|---|---|
| 4096 | bilinear | 11.2 | n/a |
| 4096 | trilinear | 14.6 | 11.2 |
| 4096 | 4x aniso | 20.85 | 17.75 |
| 4096 | 16x aniso | 28.3 | 25.2 |
| 1024 | bilinear | 5.2 | n/a |
| 1024 | trilinear | 5.6 | 5.3 |
| 1024 | 4x aniso | 8.1 | 7.9 |
| 1024 | 16x aniso | 11.3 | 11.1 |
The performance delta here is a lot higher, and unlike before, mip generation is no longer the only problem. Simply having mips on our texture tosses our performance off a cliff. The trouble is that we’re now doing lots of taps, and we’ve taken our cheap bilinear taps and turned them into expensive anisotropic taps. We could go back to trilinear and break even, but this buys us nothing, because the artifacts are back, and we’re slower than we could be with the gather4 option.
The quality benefit of mips still applies, however. Eventually a wide PCF kernel will start to alias, and using mip-mapped taps with aniso eliminates the aliasing, if we’re willing to pay the performance cost. However, this aliasing is less likely to happen with a wide PCF kernel. At 1K, in my test scene, it is not visible.
Ironically, we can also choose to remove the aliasing by simply cutting the shadow map resolution, which we might prefer, because it gives us fuzzier shadow edges.
Conclusions
There are good reasons not to use mip-maps with shadow maps:
- Performance
- Memory Footprint
- Risk of artifacts
Acceptable quality can be achieved without mips.
There are also good reasons to use mip-maps with shadow maps:
- Alias prevention
- Filterable shadowmap variants (VSM/CSM/ESM)
- As an optimization aid (e.g. min-max mips for penumbra detection)
Full Images
1024 hard bilinear:
1024 hard trilinear:
1024 hard 16x aniso:
4096 hard bilinear:
4096 hard trilinear:
4096 hard 16x aniso:
1024 5×5 bilinear:
1024 5×5 trilinear:
1024 5×5 16x aniso:
4096 5×5 bilinear:
4096 5×5 trilinear:
4096 5×5 16x aniso:

I was about to do this mips experiment! Thanks for writing this and posting your performance numbers.
Great stuff Josh