An introduction to Temporal Anti-Aliasing with Babylon.js

Babylon.js
Feb 1, 2024

Aliasing is one of the most annoying problems when working with computer-generated images. It can spoil your experience, even when using the latest 3D rendering techniques. Aliasing appears in the form of jagged lines (geometric aliasing) or bright spots (specular/shading aliasing) and gets worse with movement.

Geometric aliasing (look at all the mostly horizontal lines)
Shading aliasing (look at the bright spots around the buttons)

Aliasing exists because of the discrete nature of the screen (composed of pixels) and because each pixel can only be one color. Each pixel is effectively a small window on a part of the scene and should be able to display all the colors visible from that window to accurately describe the scene, but as it can only be one color, we have to make a decision about which color to display. There are many algorithms that deal with this problem, and Temporal Anti-Aliasing (TAA) is one of them. We’re going to implement a basic version of the algorithm (only for static scenes), because, as you’ll see, a full implementation can be quite complex and is outside the scope of this blog post.

Temporal Anti-Aliasing general principles

The main idea is quite simple: render the scene several times, each time with a different (small) offset for the projection matrix so that the rendered scene is slightly offset within the one-pixel limit. By averaging all the renderings, you’ll obtain a pretty good anti-aliasing effect.

Jittering the camera — from Unigine documentation

However, rendering the scene several times per frame would seriously affect performance. This is where the “temporal” part of TAA comes in: we render the scene once per frame (with a different projection offset each time), and average the result over several consecutive frames. In this way, the cost of calculation is spread over several frames. It will take some time to obtain a smoothed image (8 frames, for example, if we choose 8 different offsets), but if we work at 60 frames per second, it won’t be too noticeable. Let’s see how this works in practice.

Camera offset

The camera needs to be shifted a little each frame so that the rendered scene differs slightly from one frame to the next, but still stays within the dimensions of one pixel.

So, after projection onto the 2D screen, we want to stay within the boundaries of one pixel. The projection matrix for a perspective camera is as follows:

Projection matrix for a perspective camera:

$$P = \begin{pmatrix} a & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & c & d \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

Let’s project a 3D point (in view coordinate space):

Projected point (result is in clip space):

$$P \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} ax \\ by \\ cz + d \\ z \end{pmatrix}$$

What we want to do is apply an offset (dx, dy) to the 2D coordinates (ax, by), i.e. something like (ax+dx, by+dy). Let’s add dx and dy to the projection matrix:

Offset added to the projection matrix:

$$P' = \begin{pmatrix} a & 0 & d_x & 0 \\ 0 & b & d_y & 0 \\ 0 & 0 & c & d \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad P' \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} ax + d_x z \\ by + d_y z \\ cz + d \\ z \end{pmatrix}$$

We have an extra z factor on dx and dy, but don’t forget that the GPU will perform a division by the w coordinate as part of the normal rendering pipeline, to convert clip space to NDC (Normalized Device Coordinate) space! As you can see above, w=z (w is the 4th coordinate of the projected vector), so the z factor will cancel out when we divide by w and we’ll get what we want, a (dx,dy) shift of the 2D coordinates:

Conversion from clip to NDC space performed by the GPU:

$$\left(\frac{ax + d_x z}{z},\; \frac{by + d_y z}{z},\; \frac{cz + d}{z}\right) = \left(\frac{ax}{z} + d_x,\; \frac{by}{z} + d_y,\; \frac{cz + d}{z}\right)$$

Note that the coordinates to which (dx,dy) apply are in NDC space, which is in the interval [-1,1] for x/y. So, if dx and dy are values in the interval [0,1], they must be multiplied by (2/width, 2/height), where width and height are the dimensions of the output (usually the screen), if the offsets are not to exceed the one-pixel limit.

On the code side, it’s easy enough: we just define the offsets within the projection matrix, as in the sketch below.


In the real code, we’ll apply a translation of (-0.5, -0.5) to (dx, dy), because the 2D pixel coordinate is the coordinate of the pixel’s center: the jitter must stay centered around it.
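As a minimal sketch (assuming a playground-style setup with `engine`, `scene` and `camera` in scope; `jitterX`/`jitterY` stand for the current sample of the offset sequence, in [0, 1[, and the actual playground code may differ):

```js
// Sketch: jitter the camera's projection matrix once per frame.
// jitterX / jitterY are the current sample of the offset sequence, in [0, 1[.
scene.onBeforeRenderObservable.add(() => {
    // Center the sample around 0 (see the note above about pixel centers)
    // and convert it to NDC units so it never exceeds one pixel.
    const dx = ((jitterX - 0.5) * 2) / engine.getRenderWidth();
    const dy = ((jitterY - 0.5) * 2) / engine.getRenderHeight();

    // The third row of the projection matrix holds the terms multiplied by z:
    // after the GPU divides by w = z, dx*z and dy*z become plain dx/dy offsets.
    const projection = camera.getProjectionMatrix();
    projection.setRowFromFloats(2, dx, dy, projection.m[10], projection.m[11]);
});
```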

Generating the offsets

We could use random values for the offsets, but we can get better results by using a low-discrepancy sequence. I won’t go into the details; you’ll find plenty of documentation on the subject on the Internet if you’re interested. The generally accepted method for TAA is to use a Halton sequence. The code to generate this sequence is also easy to find, and the playground below includes a small class that generates suitable numbers for our TAA implementation.

As an illustration, here are the locations of 8 points generated with a Halton sequence created from the starting numbers (2,3):

From High Quality Temporal Supersampling by Brian Karis (Unreal Engine)

Note that when we’ve used all the points from the Halton sequence for our projection offsets, we restart from the first point.
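For reference, here’s a small sketch of such a generator in plain JavaScript (the class in the playground may differ in its details):

```js
// Sketch of a (2, 3) Halton sequence generator producing 2D samples in [0, 1[.
class HaltonSequence {
    constructor(count) {
        this.count = count; // length of the sequence before it wraps around
        this.index = 0;
    }

    // Radical inverse of `i` in the given base: the core of the Halton sequence.
    static radicalInverse(base, i) {
        let result = 0;
        let fraction = 1 / base;
        while (i > 0) {
            result += (i % base) * fraction;
            i = Math.floor(i / base);
            fraction /= base;
        }
        return result;
    }

    // Returns the next [x, y] sample, restarting when the sequence is exhausted.
    next() {
        const i = (this.index % this.count) + 1; // the sequence is usually 1-based
        this.index++;
        return [
            HaltonSequence.radicalInverse(2, i),
            HaltonSequence.radicalInverse(3, i),
        ];
    }
}

// Usage: a sequence of 16 samples, as in the playground
const halton = new HaltonSequence(16);
const [x, y] = halton.next(); // first sample: (0.5, 1/3)
```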

Here’s a playground that uses a Halton sequence of length 16 (you can change it at line 30) to jitter the camera each frame. As we don’t average the results yet, you will see flickering (most visible around the buttons and the antenna):

https://playground.babylonjs.com/#539X0P#19

Averaging the frames

What we could do is accumulate (“additive blending”) each frame into a render target, then divide by the number of frames to get the final average. To avoid the division, we could apply a weight to the color before writing it to the output: for example, with a sequence of 8 Halton samples, we’d use a weight of 1/8 for each frame. However, this method is a bit cumbersome: we’d have to clear the render target before restarting the Halton sequence, and it’s not well suited to handling motion (something we won’t be doing in this simple blog post anyway). In addition, blending is more demanding in terms of performance, as it requires the destination target to be read during the process.

Instead, we perform a moving average using this formula (it can be shown that this formula converges to the mean of the x(t) values when α is small — see slides 15 and 16 of High Quality Temporal Supersampling):

$$s(t) = \alpha\, x(t) + (1 - \alpha)\, s(t-1)$$

s(t-1) is the current average for a given pixel, x(t) is the pixel’s new color in the current frame, and s(t) is the new average. α is a small number chosen by the user: a larger value makes the formula converge faster, but toward a result that deviates more from the true mean; a smaller value stays closer to the mean but converges more slowly… Generally, a value in the [0.05, 0.1] range will give good results.
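If you want to see this tradeoff for yourself, here’s a toy JavaScript simulation (not part of the playground) that feeds the formula alternating samples whose true mean is 0.5:

```js
// Toy simulation of s(t) = alpha * x(t) + (1 - alpha) * s(t-1).
// The samples alternate between 0.25 and 0.75, so their true mean is 0.5.
function simulate(alpha, frames) {
    let s = 0; // the running average, starting from black
    for (let t = 0; t < frames; t++) {
        const x = (t % 2 === 0) ? 0.25 : 0.75;
        s = alpha * x + (1 - alpha) * s;
    }
    return s;
}

console.log(simulate(0.05, 500).toFixed(3)); // 0.506: stays close to the true mean
console.log(simulate(0.5, 500).toFixed(3));  // 0.583: converges fast, deviates more
```

With α = 0.05 the average settles within about 1% of the true mean; with α = 0.5 it converges in a few frames but keeps oscillating much further from it.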

We’ll use a post-process to implement this averaging, together with a ping-pong of two render targets: one render target is used as the source (s(t-1)) and the other stores the result of the calculation for the current frame (s(t)). The render targets are swapped at the end of the frame so that their roles are exchanged for the next one.
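Here’s a sketch of what the heart of this post-process can look like. It’s illustrative, not the playground’s exact code: `historyTargets` stands for the ping-pong pair of render targets, and redirecting the post-process output into the write target is part of the full wiring shown in the playground below.

```js
// Sketch of the averaging post-process.
BABYLON.Effect.ShadersStore["ptaaFragmentShader"] = `
    precision highp float;
    varying vec2 vUV;
    uniform sampler2D textureSampler; // x(t): the new, jittered frame
    uniform sampler2D historySampler; // s(t-1): the running average
    uniform float factor;             // the alpha of the moving average

    void main(void) {
        vec4 current = texture2D(textureSampler, vUV);
        vec4 history = texture2D(historySampler, vUV);
        // s(t) = alpha * x(t) + (1 - alpha) * s(t-1)
        gl_FragColor = mix(history, current, factor);
    }
`;

const ptaa = new BABYLON.PostProcess(
    "ptaa",             // name
    "ptaa",             // fragment shader key in ShadersStore
    ["factor"],         // uniforms
    ["historySampler"], // additional samplers
    1.0,                // output at full size
    camera
);

let readIndex = 0; // which of the two render targets currently holds s(t-1)

ptaa.onApply = (effect) => {
    effect.setFloat("factor", 0.05);
    effect.setTexture("historySampler", historyTargets[readIndex]);
};

scene.onAfterRenderObservable.add(() => {
    readIndex = 1 - readIndex; // swap the roles of the two targets
});
```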

Here’s a straightforward implementation of the algorithm: https://playground.babylonjs.com/#539X0P#20

The “Enable MSAA” checkbox toggles MSAA. You can see that MSAA doesn’t help with shading aliasing: look around the buttons and the antenna, and you’ll see that the aliasing remains even with MSAA enabled. MSAA only helps with edges (look at the handle). Now, if you activate PTAA, you’ll see a marked improvement! You’ll also find that MSAA makes no difference once PTAA is enabled, so it’s best to disable it to improve performance.

Note that I have used an additional “pass” post-process, which simply copies the render target s(t) to the screen. I’ve also used the name “PTAA” instead of “TAA” (P for “Partial”), as this isn’t a full implementation of TAA, just the easy-to-implement part!

Shortcomings

Why is this only the easy part? Try moving/rotating the camera (with PTAA activated) and see what happens… You’ll get some heavy ghosting.

This is because, in the formula

$$s(t) = \alpha\, x(t) + (1 - \alpha)\, s(t-1)$$

s(t-1) should not be the accumulation for the pixel we’re currently processing; instead, we should find out where this pixel was located in the previous frame and fetch the accumulation from there! And therein lies the complexity. If you want more details, Brian Karis’s High Quality Temporal Supersampling presentation is a good starting point, among many other references on the subject.

As you can see if you dig into such references, correcting these artifacts is well beyond the scope of this blog post!

What you can do to reduce ghosting is to increase the value of the “PTAA factor” (the α of the moving average). But at some point you’ll start to see flickering, so you can’t increase it too much.

We could also simply disable PTAA when the camera is in motion. We won’t have anti-aliasing during that time, but at least the ghosting will be suppressed:

https://playground.babylonjs.com/#539X0P#22
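One possible way to detect motion, as a sketch (the playground may do it differently): it assumes a `ptaaEnabled` flag that the averaging post-process reads, for example to force the factor to 1.0, which takes the new frame as-is and restarts the history.

```js
// Sketch: suspend the history blend while the camera moves.
// `ptaaEnabled` is read by the averaging post-process, e.g. to force
// factor = 1.0 (take the new frame as-is), which also resets the history.
let ptaaEnabled = true;

camera.onViewMatrixChangedObservable.add(() => {
    ptaaEnabled = false; // the camera moved: the accumulated history is stale
});

scene.onAfterRenderObservable.add(() => {
    ptaaEnabled = true; // accumulate again if the camera stays still next frame
});
```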

Conclusion

Temporal Anti-Aliasing may no longer be the state of the art in anti-aliasing, but it’s still a very good technique.

Even though we’ve only scratched the surface, it may be interesting to know how it works and how it can be improved.

Even in the simple form presented here, you could use it to enhance the rendering of still images (for screenshots, for example).


Popov — Babylon.js Team

