Monday, April 25, 2011

Going to California (with an Aching in My Heart)

Periodically people will try to sum up relative latencies for hardware, but I really like this article. In particular, putting memory distance in human terms helps give you a sense of the metaphorical groan your CPU must make every time it misses a cache.
  • L1 cache: it's on your desk, pick it up.
  • L2 cache: it's on the bookshelf in your office, get up out of the chair.
  • Main memory: it's on the shelf in your garage downstairs, might as well get a snack while you're down there.
  • Disk: it's in, um, California. Walk there. Walk back. Really.*
I had a pretty good idea that L2 misses were bad - when we profile X-Plane, some of the bottlenecks show a tight correlation between L2 cache misses and total time spent. And I knew disks were slow, but...not that slow.

If anything, that's a testament to how good operating systems are at hiding the disk drive from us most of the time.

The moral of the story: the disk can look a lot faster than it is, but only if you let it. Unfortunately, there is one aspect of X-Plane that fails miserably at this: the use of a gajillion tiny text files for scenery packages. The solution is simple: pack the files into one bigger file. This lets the OS pick up the (hopefully consecutive) single larger file and dump significant amounts of it into the page cache in one swoop, without doing a million seeks. California is far away.
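The pack-the-files idea can be sketched in a few lines. This is a toy format invented for illustration - not X-Plane's actual scenery format - but it shows the shape of the fix: concatenate the small files into one blob and keep a little index of offsets, so the OS reads one large consecutive file instead of seeking a million times.

```python
# Toy "pack many small files into one" sketch; the format here
# (a name -> (offset, length) index plus one blob) is made up.
def pack(files):
    """files: dict of name -> bytes. Returns (index, blob)."""
    index, chunks, offset = {}, [], 0
    for name, data in files.items():
        index[name] = (offset, len(data))
        chunks.append(data)
        offset += len(data)
    return index, b"".join(chunks)

def read_packed(index, blob, name):
    # One slice out of memory instead of one open + seek per file.
    offset, length = index[name]
    return blob[offset:offset + length]

index, blob = pack({"a.txt": b"hello", "b.txt": b"world!"})
print(read_packed(index, blob, "b.txt"))  # b'world!'
```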

* The author's metaphor maps roughly one nanosecond to one human second. At that scale, a 41 ms wait on a disk seek is the equivalent of about 474 days. You'd have to cover better than 12 miles a day to make it to California and back from the East Coast. If you actually live out west, um, pretend you're an SSD.
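The footnote's arithmetic is easy to check; here is the back-of-the-envelope version, using the one-nanosecond-to-one-human-second mapping:

```python
# Back-of-the-envelope check: map one nanosecond of machine time
# to one human second, then see how long a 41 ms disk seek "feels".
SEEK_MS = 41
seek_ns = SEEK_MS * 1_000_000      # 41 ms = 41,000,000 ns
human_seconds = seek_ns            # 1 ns -> 1 human second
human_days = human_seconds / 86_400
print(round(human_days, 1))        # 474.5 days of waiting

# Round trip East Coast <-> California, roughly 5,600 miles:
miles_per_day = 5_600 / human_days
print(round(miles_per_day, 1))     # 11.8 -- just under 12 miles a day
```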

Friday, April 22, 2011

So Many AA Techniques, So Little Time

This is a short summary of FSAA techniques, both for the art team, and so I don't forget what I've read when I come back to this in 9 months. (No promise on accuracy here, these are short summaries, often with a bit of hand-waving, and some of the newer post-processing techniques are only out in paper form now.)

Where does aliasing come from? It comes from decisions that are made "per-pixel", in particular (1) whether a pixel is inside or outside a triangle and (2) whether a pixel meets or fails the alpha test.

Texture filtering will not alias if the texture is mip-mapped; since the texel is pulled out by going "back" from a screen pixel to the texture, as long as we have mip-mapping, we get smooth linear interpolation. (See Texture AA below.)
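That "going back from a screen pixel to the texture" step picks a mip level from how many texels one screen pixel covers. A sketch of the standard level-of-detail calculation (the clamping behavior here is a simplifying assumption, not any particular API's rule):

```python
import math

# Mip level selection: if one screen pixel covers N texels in its
# longer direction, sample mip level log2(N); clamp to the mip chain.
def mip_level(texels_per_pixel, num_levels):
    lod = math.log2(max(texels_per_pixel, 1.0))
    return min(max(lod, 0.0), num_levels - 1)

print(mip_level(1.0, 10))   # 0.0 -- texture at full size, base level
print(mip_level(4.0, 10))   # 2.0 -- minified 4x, use the 1/4-size mip
```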

Universal Techniques

Super-Sampled Anti-Aliasing (SSAA). The oldest trick in the book - I list it as universal because you can use it pretty much anywhere: forward or deferred rendering, it also anti-aliases alpha cutouts, and it gives you better texture sampling at high anisotropy too. Basically, you render the image at a higher resolution and down-sample with a filter when done. Sharp edges become anti-aliased as they are down-sized.

Of course, there's a reason why people don't use SSAA: it costs a fortune. Whatever your fill rate bill, it's 4x for even minimal SSAA.
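The resolve step of SSAA is just a box filter over the high-resolution render. A minimal grayscale sketch at 2x supersampling - note how the hard edge picks up an intermediate shade on the way down:

```python
# Down-sample a 2x super-sampled grayscale image with a box filter:
# each final pixel is the average of a 2x2 block of samples.
def ssaa_resolve(hi, factor=2):
    h, w = len(hi), len(hi[0])
    out = []
    for y in range(0, h, factor):
        row = []
        for x in range(0, w, factor):
            block = [hi[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# A hard edge rendered at 2x: one black column, then white.
hi = [[0, 1, 1, 1]] * 4
print(ssaa_resolve(hi))  # [[0.5, 1.0], [0.5, 1.0]] -- softened edge
```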

Hardware FSAA Techniques

These techniques cover the entire frame-buffer and are implemented in hardware. You just ask the driver for them and go home happy - easy!

Multi-Sampled Anti-Aliasing (MSAA). This is what you typically have in hardware on a modern graphics card. The graphics card renders to a surface that is larger than the final image, but in shading each "cluster" of samples (that will end up in a single pixel on the final screen) the pixel shader is run only once. We save a ton of fill rate, but we still burn memory bandwidth.

This technique does not anti-alias any effects coming out of the shader, because the shader runs at 1x, so alpha cutouts are jagged. This is the most common way to run a forward-rendering game. MSAA does not work for a deferred renderer because lighting decisions are made after the MSAA is "resolved" (down-sized) to its final image size.
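The key MSAA saving can be sketched per-pixel: the (expensive) shader runs once, coverage is tested per sample, and the resolve blends by coverage fraction. This is a conceptual model of the hardware, not real driver code; the sample positions and edge function are made up.

```python
# Conceptual MSAA for one pixel: 4 coverage samples, 1 shader run.
def msaa_pixel(sample_offsets, inside_triangle, shade_pixel, background):
    color = shade_pixel()              # pixel shader runs ONCE (the saving)
    covered = sum(1 for s in sample_offsets if inside_triangle(s))
    k = covered / len(sample_offsets)  # fraction of samples covered
    return k * color + (1 - k) * background  # resolve = weighted blend

# A triangle edge crossing the pixel, covering 3 of its 4 samples:
offsets = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
inside = lambda s: s[0] + s[1] < 1.4   # toy half-plane edge
print(msaa_pixel(offsets, inside, lambda: 1.0, 0.0))  # 0.75
```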

Coverage Sample Anti-Aliasing (CSAA). A further optimization on MSAA from NVidia. Besides running the shader at 1x and storing color and depth at 4x, the GPU's rasterizer samples triangle coverage at 16x. The extra coverage information lets the resolve blend the stored samples with finer precision, producing intermediate edge shades closer to true 16x anti-aliasing without the 16x memory cost.

2-d Techniques

The above techniques can be thought of as "3-d" because (1) they all play nicely with the depth buffer, allowing hidden surface removal and (2) they all run during rasterization, so the smoothing is correctly done between different parts of a 3-d model. But if we don't need the depth buffer to work, we have other options.

Antialiased Primitives. You can ask OpenGL to anti-alias your primitives as you draw them; the only problem is that it doesn't work. Real anti-aliased primitives aren't required by the spec, and modern hardware doesn't support them.

Texture Anti-Aliasing. You can create the appearance of an anti-aliased edge by using a textured quad and buffering your texture with at least one pixel of transparent alpha. The sampling back into your texture from the screen is done at sub-pixel resolution and is blended bilinearly; the result will be that the 'apparent' edge of your rendering (e.g. where inside your quad the opaque -> alpha edge appears) will look anti-aliased. Note that you must be alpha blending, not alpha testing.

If you're working in 2-d I strongly recommend this technique; this is how a lot of X-Plane's instruments work. It's cheap, it's fast, the anti-aliasing is the highest quality you'll see, and it works on all hardware. Of course, the limit is that this isn't compatible with the Z buffer. If you haven't designed for this solution, a retro-fit could be expensive.
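The reason the technique works: sampling the texture at sub-texel positions with bilinear filtering yields fractional alpha right at the opaque-to-transparent border, and alpha blending turns that fraction into a soft edge. A toy 1-D sketch of the filtering:

```python
# 1-D "bilinear" (linear) sampling of an alpha channel whose last
# texel is transparent padding; sub-texel positions land between
# texels and return fractional alpha -- the anti-aliased edge.
def sample_alpha(alpha_texels, u):
    """u in [0, len-1]; linear interpolation between adjacent texels."""
    i = min(int(u), len(alpha_texels) - 2)
    f = u - i
    return alpha_texels[i] * (1 - f) + alpha_texels[i + 1] * f

alpha = [1.0, 1.0, 1.0, 0.0]       # one texel of transparent padding
print(sample_alpha(alpha, 2.25))   # 0.75 -- partially opaque edge pixel
print(sample_alpha(alpha, 2.75))   # 0.25 -- mostly faded out
```

This is also why the technique needs alpha blending: an alpha test would snap those fractional values back to a hard 0-or-1 edge.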

Post-Processing Techniques

There are a few techniques that attempt to fix aliasing as a post-processing step. These techniques don't depend on what was drawn - they just "work". The disadvantages are the processing time to run the filter itself (the filters can be quite complex and expensive) and, because they don't use any of the real primitive rendering information, the anti-aliasing can be a bit of a loose cannon.

Morphological Anti-Aliasing (MLAA) and Fast Approximate Anti-Aliasing (FXAA). These techniques analyze the image after rendering and attempt to identify and blur out stair-stepped patterns. ATI is providing an MLAA post-process as a driver option, which is interesting because it moves us back to the traditional game ecosystem where full screen anti-aliasing just works without developer input.

Edit: See also Directionally Localized Anti-Aliasing (DLAA).

(From a hardware standpoint, full screen anti-aliasing burns GPU cycles and sells more expensive cards, so ATI and NVidia want gamers to have the option of FSAA. But most new games are deferred now, making MSAA useless. By putting MLAA in the driver, ATI gets back to burning GPU cycles to improve quality, even if individual game developers don't write their own post-processing shaders.)

It is not clear to me what the difference is between MLAA and FXAA - I haven't taken the time to look at both algorithms in detail. They appear to be similar in general approach at least.

Temporal Anti-Aliasing (TAA). This is a post process filter that blends the frame with the previous frame. Rather than have more samples on the screen (e.g. a 2x bigger screen in each dimension for SSAA), we use the past frame as a second set of samples. The camera is jittered by less than one pixel between frames to ensure that we get different samples from frame to frame. When blending pixels, we look for major movement and avoid blending with a sample that wasn't based on the same object. (In other words, if the camera moves quickly, we don't want ghosting.)
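A minimal per-pixel sketch of the blend-with-history idea, including the rejection test. The blend weight and rejection threshold here are made-up tuning constants, and real implementations reproject history using motion vectors rather than comparing raw colors:

```python
# Temporal AA for one grayscale pixel: blend with last frame's pixel
# unless it looks like a different object (big jump -> reject history).
def taa_pixel(current, history, blend=0.5, reject_threshold=0.3):
    if abs(current - history) > reject_threshold:
        return current                        # reject: avoids ghosting
    return blend * history + (1 - blend) * current

print(taa_pixel(0.5, 0.25))  # 0.375 -- history accepted, edge softens
print(taa_pixel(1.0, 0.0))   # 1.0   -- history rejected, no ghosting
```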

Deferred-Rendering Techniques

These techniques are post-processing filters that specifically use the 3-d information saved in the G-Buffer of a deferred renderer. The idea is that with a G-Buffer we can do a better job of deciding when to resample/blur.

Edge Detection and Blur. These techniques locate the edge of polygons by looking for discontinuities in the depth or normal vector of a scene, and then blur those pixels a bit to soften jaggies. This is one of the older techniques for anti-aliasing a deferred renderer - I first read about it in GPU Gems 2. The main advantage is that this technique is dirt cheap.
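A toy 1-D version of the idea (one grayscale scanline, one depth value per pixel; the discontinuity threshold is a made-up constant): find where the depth jumps, and blur the color only there.

```python
# Find depth discontinuities on a scanline, then blur color there.
def edge_blur(colors, depths, threshold=0.5):
    out = list(colors)
    for i in range(1, len(colors) - 1):
        # a depth jump on either side marks a polygon edge
        if (abs(depths[i] - depths[i - 1]) > threshold or
                abs(depths[i] - depths[i + 1]) > threshold):
            out[i] = (colors[i - 1] + colors[i] + colors[i + 1]) / 3
    return out

colors = [0.0, 0.0, 1.0, 1.0]   # hard edge between pixels 1 and 2
depths = [1.0, 1.0, 5.0, 5.0]   # matching depth discontinuity
print(edge_blur(colors, depths))  # only the two edge pixels soften
```

The cheapness is visible in the sketch: it is one extra read of the depth buffer and a tiny blur kernel, with no extra geometry or shading work.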

Sub-pixel Reconstruction Anti-Aliasing (SRAA). This new technique (published by NVidia) uses an MSAA G-Buffer to reconstruct coverage information. The G-Buffer is rendered with MSAA; you resolve it and do the deferred lighting pass at 1x (saving lighting cost), then go back to the original 4x MSAA G-Buffer to detect edges and reconstruct sub-pixel coverage.

I Love Surprises

Awesome quote from the Java Language Spec:
In the absence of explicit synchronization, an implementation is free to update the main memory in an order that may be surprising. Therefore the programmer who prefers to avoid surprises should use explicit synchronization.
As you know, the premier loves surprises.