# World Space Coordinates of Screen Fragments 2: Manchester Prep

A very nice person commented on my previous post about getting the world-space coordinates of the near plane of the view frustum, which has inspired me to revisit the topic; it’s been bugging me that my previous technique required a matrix multiplication, and that feels like it might be more expensive than strictly necessary. As I discussed before, you might want to know exactly where in your world the screen is so that if it intersects with something, you can treat that part of the screen differently – for example, if your camera is partially submerged in water, you might want to apply a fogging and distortion effect to those fragments below the surface, but not above it.

The first thing to understand is how your camera is oriented. For historical reasons, and because of the way that my world axes are set up (+x is east, +y is north, +z is down; the camera’s neutral direction is looking north along the y axis), the camera orients itself in world space by rotating around the z axis to look left-right, and around the x axis to look up-down. Just to make things more confusing, because the world moves around the camera in OpenGL, remember that in your shaders the camera’s coordinates are negated (i.e. your shaders think your camera is at (-cameraX, -cameraY, -cameraZ)). You can cut through a lot of confusion by using a system like gluLookAt() to orient your camera, which confers a huge bonus in that it has you specify both the direction in which the camera is facing and also the camera’s direction for “up”, both of which will be very handy.

The first step is to work out where the camera is and which direction it’s looking. In my case, I keep track of the camera’s position as (cameraX, cameraY, cameraZ), and its rotation around Z and X in radians (i.e. pi radians is 180 degrees). My camera matrix rotates the camera around the Z axis and then around its own X axis, and then translates to its location in world space. Using this system, the camera’s forward unit vector is worked out like this:
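A minimal sketch of that calculation in Python, assuming the yaw-then-pitch order above and the +x east / +y north / +z down convention (the signs on the sin terms depend on your rotation handedness, so treat them as an assumption and flip as needed):

```python
import math

# Hypothetical sketch, not necessarily the exact code: the forward unit
# vector for a camera that yaws about the z axis and then pitches about
# its own x axis, in a world where +x = east, +y = north, +z = down and
# the neutral view direction is north along +y.
def camera_forward(rot_z, rot_x):
    return (-math.sin(rot_z) * math.cos(rot_x),   # east component
             math.cos(rot_z) * math.cos(rot_x),   # north component
             math.sin(rot_x))                     # down component
```

From here, a cross product with the world’s down axis gives the camera’s right vector, and a second cross product recovers its true up – the same pieces gluLookAt() works with.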

# More videos – DoF, sun shafts, water caustics, pixel-accurate water surfaces.

Here’s a bunch of new videos. Take a look at:

2) More of the same, but at sunrise:

3) Probably the most technically interesting clip, and a rumination on whether the trees were a mistake and we should never have left the oceans. This one demonstrates per-pixel water effects (i.e. they only affect parts of the screen below the surface of the water) and also shadow mapping the water surface to give underwater light shafts and caustics:

(Note also: the clouds are rendered as billboarded impostors, ray traced in the fragment shader to give perfect spheres. As you’ve probably noticed they are a bit rubbish at the moment.)

# Dynamic depth of field on the GPU – part 3 of n

As of the end of part 2 (and using information from part 1, which you should totally read), it’s time to implement the actual blurring effect. This uses code that I acquired somewhere on the internet, but I can’t remember the exact attribution; so, if I got it off you, please let me know!

What you will need is:

• a framebuffer object containing your final render as a texture
• the texture we made in part 2, which contains each fragment’s difference in linear depth from the calculated focal length
• two framebuffer objects each of which is the same size as your main framebuffer
• (you can cut this down to one with a bit of clever fiddling)
• a shader which does a Gaussian blur, which I’m going to explain

What we’re going to do is blur the image according to the values in the depth texture. Because fragment shaders generally like to do everything backwards, the way to do this is to generate the blurred image and then blend it with the original image, so that bigger differences in the depth texture give more of the blurred image.

Okay, here are the implementation details:
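As a warm-up, here’s a minimal sketch (illustrative names and values, not necessarily the final implementation) of the two pieces of arithmetic involved: normalised 1-D Gaussian weights for the separable blur passes, and the depth-driven blend between the sharp and blurred images:

```python
import math

# Hypothetical sketch: normalised 1-D Gaussian weights for a separable
# blur (horizontal pass into one FBO, vertical pass into the other).
# kernel_radius and sigma are illustrative values.
def gaussian_weights(kernel_radius=4, sigma=2.0):
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-kernel_radius, kernel_radius + 1)]
    total = sum(w)
    return [x / total for x in w]   # normalise so the weights sum to 1

# Final blend, per fragment: a bigger difference in the depth texture
# gives more of the blurred image, exactly like GLSL's mix().
def blend(sharp, blurred, depth_difference):
    t = min(max(depth_difference, 0.0), 1.0)
    return sharp * (1.0 - t) + blurred * t
```

Normalising the weights matters: if they don’t sum to one, the blurred image ends up brighter or darker than the sharp one and the blend becomes visible as a brightness shift.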

# Dynamic depth of field on the GPU – Part 2 of n

Thus far, we’ve discussed the reasons for wanting to do depth of field (DoF) on the GPU. We’ve figured out that we need to get some idea of what our focal length should be, and to figure out by how much the depth of each fragment in our scene render differs from that focal length. All of this information is available in the depth buffer we generated when we rendered the scene’s geometry; if we rendered into a framebuffer object with a depth attachment, this means we have the information available on the GPU in the form of a texture.

Getting data back off the GPU is a pain. It can be accomplished effectively with pixel buffer objects, but the available bandwidth between CPU and GPU is comparatively tiny and we don’t really want to take any up if we can help it. Additionally, we don’t want either the CPU or the GPU stalled while waiting for each other’s contribution, because that’s inefficient. It’s therefore more logical to analyse the depth buffer on the GPU using a shader. As a bonus, you can do this at the same time as you’re linearising the depth buffer for other post-processing operations – for example, you might need a linear depth buffer to add fog to your scene or for SSAO.

What we’re going to do is generate a representation of how each fragment’s depth differs from the focal length, the results of which will look something like this:

Here’s how to do it:
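A minimal sketch of the arithmetic involved – converting a depth-buffer sample back to linear eye-space depth and measuring its distance from the focal distance – assuming a standard perspective projection, with near/far/focal values and names of my own choosing:

```python
# Hypothetical sketch (my own names and values): recover linear
# eye-space depth from a standard perspective depth-buffer sample in
# [0, 1], then measure how far each fragment sits from the focal point.
def linear_depth(depth_sample, near, far):
    z_ndc = depth_sample * 2.0 - 1.0        # window [0, 1] -> NDC [-1, 1]
    return (2.0 * near * far) / (far + near - z_ndc * (far - near))

def focus_difference(depth_sample, near, far, focal_distance, focal_range):
    # 0.0 = perfectly in focus, 1.0 = maximally blurred.
    d = abs(linear_depth(depth_sample, near, far) - focal_distance)
    return min(d / focal_range, 1.0)

near, far = 0.1, 1000.0
# Depth precision is heavily front-loaded: the middle of the depth
# buffer is only about 0.2 world units from the camera here.
middle = linear_depth(0.5, near, far)
```

This is exactly why the linearisation step exists: raw depth-buffer values cluster almost all of their precision right in front of the camera, so comparing them directly against a focal distance would be useless.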

# Dynamic depth of field on the GPU – Part 1 of n

Modern 3D games use a bunch of tricks to convince our brains that we are viewing their world through some bizarre hybrid sense organ which consists of about 30% human eye and 70% movie camera. Hence we get lens flares, aperture changes and other movie staples which aren’t exactly true to life; we accept these effects probably because a) we are all so highly mediated these days that we expect the things which appear on our TVs/monitors to look like that and b) because they make shiny lights dance around the screen, and us primates love that stuff.

(An aside; anybody who wears glasses is totally used to lens flares, bloom lighting and film grain effects in everyday life, which is probably another reason why us nerds are so accepting of seeing the world as a movie. These settings can be temporarily toggled off with the use of a small amount of detergent and a soft cloth, but tend to return to the defaults over time).

What the human eye does have in spades, though, is dynamic depth of field. Anything outside of the centre of the field of view is out of focus and therefore appears blurred (and also in black and white, but let’s pretend we don’t know that). Humans generally focus on the thing in the centre of their visual field, even when the thing they are actually attending to isn’t in the centre (hence when you watch something out of the corner of your eye, it’s still blurry). Because depth of field effects weren’t at all viable on early graphics hardware, a lot of people have got used to everything in a scene having the same sharpness and dislike the addition of depth of field. However, used tastefully, it can work nicely as a framing effect; in addition, it’s pretty handy for hiding lower-resolution assets in the background.

The technique I am going to explain here has a major advantage for my purposes: the whole thing can be done as a post-process on the GPU, meaning that you don’t have to fiddle around with scene graphs or reading your depth buffer back for calculations on the CPU.

# On GLSL subroutines in Mac OS X and disappointment

I probably don’t need to tell anybody that the state of 3D graphics is somewhat sad under OS X when compared with Windows. This isn’t really due to differences between the OpenGL API used by MacOS and DirectX, as used by Windows; even OpenGL-only Windows applications typically run significantly better than their MacOS counterparts. It’s more easily attributable to two other factors:

1. There are far more computers running Windows than MacOS, and those computers are more likely to be running intensive 3D applications (i.e. video games).
2. Apple is notoriously uninterested in games, and notoriously tardy in keeping up with new OpenGL specifications.

This means that a) it takes a while for any new OpenGL features to make it to the Mac platform and b) they suck when they finally arrive.

As of Mavericks, the OpenGL Core profile has been upgraded to 4.0, and GLSL version 400 is supported. This means that shader subroutines have become available. Theoretically, shader subroutines are a really neat feature. Because of the way that graphics cards work, conditionals and branching in shader code incur a large performance penalty. Similarly, dynamic loops are much less efficient than loops of a fixed length, even though more recent graphics APIs claim to have fixed that. What this means is that if you have a shader which chooses whether to do a cheap operation or a more expensive operation, it will perform worse than either alone (usually because it ends up doing both). If that shader then chooses how many times to do the expensive operation, the performance gets even worse, despite the fact that it should theoretically be avoiding unnecessary iterations through the loop. This means that the best option has always been to write two different shaders for the simple and the complex operation, and not to bother dynamically limiting the number of iterations in a loop, but just to hard-code the smallest number you think you can get away with.

Shader subroutines were supposed to fix the first of these problems: it was supposed to be possible to write “modular” shaders, where a uniform allows you to change which operations a shader uses. In my renderer, which is admittedly poorly optimised, I would like to choose between a parallax-mapped, self-shadowing shader (expensive) and a simpler version which uses vertex normals – in this specific case, the simpler version is for drawing reflections, which don’t require the same level of detail. Here are the results (Nvidia GTX680, MacOS X 10.9.4, similar results using both default Apple drivers and Nvidia web drivers):

• No subroutines – both main render and reflection render use expensive shader: frame time approx. 0.028s
• Subroutines coded in the shader, and uniforms set, but subroutines never actually called: frame time approx. 0.031s
• Subroutines in use, cheaper subroutine used for drawing reflections: frame time approx. 0.035s

Vertex normals should be really cheap, and save a lot of performance when compared with parallax mapping everything. In addition, I’m not mixing and matching different subroutines – one subroutine is used for each pass, so switching only occurs once per frame. The problem is, the mere existence of code indicating the presence of a subroutine incurs a significant performance hit, which is actually more expensive than just giving up and using the much more complicated and expensive shader for everything.

So, yeah; disappointing.

# Sunlight volumes and scattering

(NB. as with much of my stuff, every part of this that I wrote myself is a dirty hack.)

Light scattering (“god rays”) is a beautiful effect; in fact it’s so beautiful, it’s one of the rare bits of eye-candy that everybody bitches about (OMG so much GRAPHICS) but everybody also secretly loves. Good examples of the technique performed as a post-process can be found here or here.

Here’s what it looks like in nature:

And here’s Crytek’s approach:

The implementation above is performed as a sort of radial blur outward from the screen-space position of the light source, masked by a representation of any objects occluding the light – trees, landscape, buildings, character models, etc. Apart from the difficult concept that this process is all backwards in shader language (because you can only influence the fragment you’re currently drawing, you’re marching from the point on screen towards the light source, not the other way round), this is pretty easy to implement. There are a couple of downsides, one of which is very minor and the other of which starts to get on the nerves:

1) This isn’t even slightly true-to-life in terms of physical parameters – it’s an “art” effect, and you tweak it until it looks good. The results won’t be affected by atmospheric conditions, such as fog. This is the very minor downside.

2) This effect only works when the light source is on the screen. You have to fade it out whenever the light source isn’t within the camera’s field of view, or you get “opposite” light shafts such that, for example, the sun is suddenly now setting in the east. In addition, you can’t have any light shafts entering the frame from the side – so, if you look down at your feet, the shafts are suddenly gone. This is the major bugbear.
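For reference, the screen-space radial blur described above boils down to a per-fragment loop along the line towards the light. Here’s a sketch in Python standing in for the fragment shader, where `sample_occlusion` is a hypothetical stand-in for the occlusion-mask texture lookup and the density/decay/exposure knobs are the usual “tweak until it looks good” parameters:

```python
# Hypothetical sketch of the screen-space radial blur, with Python
# standing in for a fragment shader. sample_occlusion() stands in for
# the occlusion-mask texture lookup; all parameters are art-directed
# tweakables, not physical values.
def god_rays(uv, light_uv, sample_occlusion, num_samples=32,
             density=1.0, decay=0.95, exposure=0.25):
    dx = (uv[0] - light_uv[0]) * density / num_samples
    dy = (uv[1] - light_uv[1]) * density / num_samples
    x, y = uv
    illumination = 1.0
    total = 0.0
    # March from the fragment towards the light source, accumulating
    # unoccluded samples with an exponential falloff.
    for _ in range(num_samples):
        x -= dx
        y -= dy
        total += sample_occlusion(x, y) * illumination
        illumination *= decay
    return total * exposure
```

Note the backwards-ness discussed above: the loop runs from the fragment towards the light, because the shader can only ever write the fragment it is currently drawing.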

In order to do a proper “light shafts” effect, then, we need to know where in our scene light can get to, and how much of that light can make it to the camera. Fortunately, the first question can be answered easily if we’re set up to cast shadows from the main light – the shadowmap contains the information needed. Unfortunately, the answer to the second question is much more complicated than it sounds. To get round this problem, we’re going to need to find a way to integrate all of the light being scattered in along a ray from the camera to each visible point.

Yes, folks, we’re going to need to write a ray tracer. It’s OK though, we don’t actually need to write a good one.
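The basic shape of that (bad but adequate) ray tracer is a fixed-step march from the camera to each fragment’s world position, consulting the shadow map at every step. A sketch, with `in_light` as a hypothetical stand-in for the shadow-map test and an illustrative, non-physical scattering coefficient:

```python
import math

# Hypothetical sketch of the volumetric version: march from the camera
# to the fragment's world position and, at each step, ask the shadow map
# whether sunlight reaches that point. in_light() stands in for the
# shadow-map lookup; scatter_coeff is a tweakable, not a physical value.
def scattered_light(camera_pos, frag_pos, in_light, steps=64,
                    scatter_coeff=0.02):
    step = [(f - c) / steps for c, f in zip(camera_pos, frag_pos)]
    step_len = math.sqrt(sum(s * s for s in step))
    pos = list(camera_pos)
    light = 0.0
    transmittance = 1.0
    for _ in range(steps):
        pos = [p + s for p, s in zip(pos, step)]
        if in_light(pos):
            # Light scattered towards the camera at this sample,
            # attenuated by the medium between the sample and the camera.
            light += scatter_coeff * step_len * transmittance
        transmittance *= math.exp(-scatter_coeff * step_len)
    return light
```

For a fully lit ray this converges on the familiar 1 − e^(−kd) fog curve, which is a handy sanity check; shadowed samples simply punch holes in the integral, which is what produces the shafts.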

# Orientation Matrices for Cube Mapping

I may well be totally wrong, but I don’t think I’ve ever successfully Googled a useful set of orientation matrices for cube mapping. As you already know, a cube map is a set of six images which, when projected on to a cube, provide a decent simulation of a spherical texture map. Two of the most common modes of usage are to provide real-time reflections on the outside of an object (by repeatedly making a map of the environment surrounding that object and then projecting that map back on to the object, as in the reflections you see on e.g. the cars in racing games), and to provide a skybox.

Skyboxes are usually either pre-rendered (pretty but boring), or done through rendering atmospheric scattering for your scene and then projecting some celestial bodies like the moon, stars etc (pretty but computationally intensive). An additional bonus of drawing your own skyboxes is that you can then use them for doing environment/ambient lighting for objects in your scene, either by working out the spherical harmonics (neat but I’m far too dumb to have ever wrapped my head around it) or by techniques which involve downsampling the cube map. This gives you ambient light which changes colour depending on the angle of the sun basically for free.

Therefore cube maps have multiple advantages for rendering your skybox and lighting:
1) render once and reuse (you can make this once per frame, or less often depending on how dynamic the sun is.)
2) you can do atmospheric scattering at a surprisingly low resolution and still have a decent looking result. You basically have to do your atmospheric scattering in the fragment shader if you want to use a “spotlight” effect to render the sun, which gets very expensive in terms of fragment power. I actually do both the sun and the moon, which is even more expensive, so lowering resolution is a major speed-up here.
3) basically free specular and ambient environment mapping of the sky on to everything in your scene. You can either go the very expensive route for downsampling, or just mipmap the thing and get 90% of the quality for 10% of the effort, and hardware acceleration.
4) if you’re blending your scene into the sky for a distance fogging effect – well, you just got the source for that as well!

This is where you usually run into a brick wall because figuring out the correct orientation matrices for rendering the cube map is a pain in the backside. What you’re going to be doing in the end is rendering a box around your camera and texture mapping the cube map on to the inside of the box, which will then act as the skybox. You can simplify this by not applying any rotation to the skybox, so that it’s aligned with the x, y and z axes. Therefore what you need to do is figure out how to make the camera look in six directions: +x, -x, +y, -y, +z and -z. You could do this with gluLookAt, but that’s a whole heck of a lot of lines of code just to look in the direction of an axis. Better to just know what matrices to use: see below. (I’m weird and use +x = east, +y = north, -z = up i.e. inverted right-handed axes.)
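A sketch of those six orientations, using the standard OpenGL cube-map face conventions (the stock ones, note, not my inverted axes – adjust to taste). Each face is defined by a look direction and an up vector, and two cross products then give the rows of a lookAt-style view rotation:

```python
# Hypothetical sketch using the standard OpenGL cube-map face
# conventions. Each face is a (look direction, up vector) pair; cross
# products then build the 3x3 rotation for that face's view matrix.
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

CUBE_FACES = {
    "+x": (( 1.0, 0.0, 0.0), (0.0, -1.0,  0.0)),
    "-x": ((-1.0, 0.0, 0.0), (0.0, -1.0,  0.0)),
    "+y": (( 0.0, 1.0, 0.0), (0.0,  0.0,  1.0)),
    "-y": (( 0.0,-1.0, 0.0), (0.0,  0.0, -1.0)),
    "+z": (( 0.0, 0.0, 1.0), (0.0, -1.0,  0.0)),
    "-z": (( 0.0, 0.0,-1.0), (0.0, -1.0,  0.0)),
}

def face_view_rotation(face):
    f, up = CUBE_FACES[face]
    s = cross(f, up)                    # camera "right"
    u = cross(s, f)                     # re-orthogonalised "up"
    # Rows of a lookAt-style rotation: right, up, -forward.
    return (s, u, tuple(-c for c in f))
```

The upside-down up vectors on most faces are a quirk of the cube-map specification (usually attributed to its RenderMan heritage); if your faces come out flipped or rotated, those vectors are the first thing to fiddle with.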

# Getting world-space coordinates of screen fragments in glsl

So, you’re probably asking yourself why on earth you would even want to do that. Well, it’s useful information if you’ve got a camera which can intersect with things in your scene. The most obvious example here is water – say, for example, you wanted to have a distortion effect and a reduced fogging distance to make the underwater part of your environment visually distinct from the part above water:

You may have noticed that in e.g. Elder Scrolls games you can trick the camera into behaving as if it’s not underwater when you’re near to the air-water interface. This is presumably because they’ve just set a camera height which denotes “underwater” – but what if the player has positioned the camera so that half of the screen is underwater and half is above? Hence you need an approach which works per-fragment.

How do you work out which part of your screen is actually underwater? You generally can’t do it when you’re rendering your water’s surface, because you’re not going to be shading any pixels which aren’t directly at the air-water interface – if you’re under the surface of the water and looking down, that approach immediately fails. What you want to do instead is find a way of masking off the bits of your screen that are underwater, and use that mask to apply your “underwater” effects.
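Once you do have each fragment’s world-space position, the mask itself is almost a one-liner. A sketch using my +z = down convention (so “below the surface” means z greater than the water plane’s z), with a narrow smoothstep band – my own illustrative touch – to avoid a hard aliased edge at the interface:

```python
# Hypothetical sketch of the per-fragment mask: given a fragment's
# reconstructed world-space position, decide how "underwater" it is.
# Uses the +z = down convention; water_z and band are illustrative.
def underwater_mask(world_pos, water_z=0.0, band=0.05):
    # 0.0 -> leave the fragment alone, 1.0 -> full fog/distortion,
    # with a smooth ramp across a narrow band at the interface.
    t = (world_pos[2] - (water_z - band)) / (2.0 * band)
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)   # same curve as GLSL smoothstep()
```

In a shader this would just be `smoothstep()` against the water height, with the result used to blend in the fog and distortion effects.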

# Fast Enumeration, Massive Inefficiency

Cocoa makes use of a system called Fast Enumeration, which allows you to write array loops in a compact fashion and is also supposed to speed things up by getting [myArray nextObject] rather than getting [myArray objectAtIndex:nextIndex] on each pass through the loop. Here’s an example:

```
for (NSObject *anObject in myArrayOfObjects) {
    // do stuff
}
```

Because I’m an idiot, I had been using Fast Enumeration in situations where I needed to know the index of each object because I couldn’t be bothered to write out the code for incrementing an index. This resulted in incredibly readable stuff like the excrescence below:

```
for (NSObject *anObject in myArrayOfObjects) {
    if ([myArrayOfObjects indexOfObject:anObject] < ([myArrayOfObjects count] - 1)) {
        NSObject *theNextObject =
            [myArrayOfObjects objectAtIndex:([myArrayOfObjects indexOfObject:anObject] + 1)];
        [self doSomethingWithObject:anObject andNextObject:theNextObject];
    }
}
```

In case your eyes are bleeding too much for you to be able to see how hard I’ve made things for myself, here’s a rundown of what the above code actually does: