World Space Coordinates of Screen Fragments 2: Manchester Prep

A very nice person commented on my previous post about getting the world-space coordinates of the near plane of the view frustum, which has inspired me to revisit the topic; it’s been bugging me that my previous technique required a matrix multiplication, which feels like it might be more expensive than strictly necessary. As I discussed before, you might want to know exactly where in your world the screen sits so that, if it intersects with something, you can treat that part of the screen differently – for example, if your camera is partially submerged in water, you might want to apply a fogging and distortion effect to the fragments below the surface, but not to those above it.

The first thing to understand is how your camera is oriented. For historical reasons, and because of the way my world axes are set up (+x is east, +y is north, +z is down; the camera’s neutral direction is looking north along the y axis), the camera orients itself in world space by rotating around the z axis to look left-right, and around the x axis to look up-down. Just to make things more confusing, because the world moves around the camera in OpenGL, remember that in your shaders the camera’s coordinates are negated (i.e. your shaders think your camera is at (-cameraX, -cameraY, -cameraZ)). You can cut through a lot of confusion by using a system like gluLookAt() to orient your camera, which confers a huge bonus in that you always have to hand both the direction in which the camera is facing and the camera’s “up” direction, which will be very handy.

The first step is to work out where the camera is and which direction it’s looking in. In my case, I keep track of the camera’s position as (cameraX, cameraY, cameraZ), along with its rotations around the Z and X axes in radians (i.e. pi radians is 180 degrees). My camera matrix rotates the camera around the Z axis, then around its own X axis, and then translates it to its location in world space. Using this system, the camera’s forward unit vector is worked out like this:
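
In outline, it’s a spherical-coordinate decomposition of the two rotation angles. The exact signs depend on which way your rotations run, so the following is a sketch with made-up variable names rather than the finished code:

//sketch only: pitch (rotationX) scales the horizontal components and supplies the
//vertical one; yaw (rotationZ) swings the view away from north (+y)
float cosPitch = cosf(rotationX);
float lookX = sinf(rotationZ) * cosPitch;   //east-west component
float lookY = cosf(rotationZ) * cosPitch;   //north-south component
float lookZ = -sinf(rotationX);             //vertical component (remember +z is down here)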

Continue reading World Space Coordinates of Screen Fragments 2: Manchester Prep

How I Learned to Stop Blitting and Love the Framebuffer

A deferred rendering pipeline provides great opportunities to run post-processing shaders on the results from your G-buffer (a G-buffer is the render target for the first pass in a deferred renderer, usually consisting of colour, normal and depth ± material textures, and it stores the information needed for the subsequent lighting and effects passes). There are a couple of issues, however:

1) OpenGL can’t both read from and write to a texture at the same time; attempting to do so will give an undefined result (i.e. per OpenGL tradition it will look like the result you wanted, except when it doesn’t)
2) What, therefore, do you do about transparency in a deferred renderer? You need the information about what’s behind the transparent object, and how far away it is. That’s in your G-buffer, which you’re already writing to (I’m assuming here that you’re rendering transparent materials last, which is really the only sane way to do it).
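
For reference, the sort of G-buffer described above is just one framebuffer object with several texture attachments. A minimal sketch, with illustrative formats and names rather than my actual layout:

//one FBO with colour, normal and depth attachments (formats/names are illustrative)
GLuint gBuffer, colourTex, normalTex, depthTex;
glGenFramebuffers(1, &gBuffer);
glBindFramebuffer(GL_FRAMEBUFFER, gBuffer);

glGenTextures(1, &colourTex);
glBindTexture(GL_TEXTURE_2D, colourTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colourTex, 0);

glGenTextures(1, &normalTex);
glBindTexture(GL_TEXTURE_2D, normalTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0, GL_RGBA, GL_FLOAT, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, normalTex, 0);

glGenTextures(1, &depthTex);
glBindTexture(GL_TEXTURE_2D, depthTex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, width, height, 0, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, depthTex, 0);

//tell the GL which colour attachments the first pass writes to
GLenum buffers[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1 };
glDrawBuffers(2, buffers);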

What you’re going to need is a copy of a subset of your G-buffer to provide the information you need to draw the stuff behind your transparent material. There’s an expensive way and a cheap way to do this. The expensive way is to do all of your deferred lighting before you render any transparent materials (which does make life easier in some ways, but means you need two lighting passes, one for opaque and one for transparent materials); the cheap way is to decide that you’re not going to bother lighting the stuff behind the transparency, because you’re already going to be throwing a bunch of refraction effects on top anyway and all you really need is a bit of detail to sell the effect.

Here’s where I tripped myself up, in the usual manner for a novice learning OpenGL from 10-year-old tutorials on the internet; I naïvely thought that the most logical thing to do at this point would be to blit (i.e. copy the pixels directly) from my G-buffer to another set of textures which I would then use as the source for rendering transparency. Duplicating a chunk of memory seemed like it was going to be a much faster operation than actually drawing anything. Because I’m not a total idiot, I did at least avoid using glCopyTexImage2D and went straight for the faster glCopyTexSubImage2D operation instead. The typical way you would use this is:

//bind the framebuffer you’re going to read from
glBindFramebuffer(GL_FRAMEBUFFER, myGBuffer);
glViewport(0, 0, my_gBuffer_width, my_gBuffer_height);
//specify which framebuffer attachment you’re going to read
glReadBuffer(GL_COLOR_ATTACHMENT0); //or whichever
//bind the texture you’re going to copy to
glBindTexture(GL_TEXTURE_2D, myDuplicateTexture);
//copy from the bound read buffer into mipmap level 0 of the bound texture
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, my_gBuffer_width, my_gBuffer_height);

The problem with this is that the performance is terrible, especially if you’re copying across a PCI bus. I was copying a colour and a depth texture for a water effect, and a quick root around in the driver monitor revealed that the GL was spending at least 70% of its time on those two copy operations alone. I’d imagine this is a combination of copying into system memory for some reason and stalling the pipeline. To make matters worse, in this sort of situation you can only really copy the textures immediately before you need to use them, so unless you’re going to rig up some complicated double-buffered solution that copies the last frame, asynchronous pixel buffer transfers aren’t going to save you. Using driver hints to keep the texture data in GPU memory might work, or it might not.

Here’s one solution:
Continue reading How I Learned to Stop Blitting and Love the Framebuffer

More videos – DoF, sun shafts, water caustics, pixel-accurate water surfaces.

Here’s a bunch of new videos. Take a look at:

1) Dynamic depth of field on the GPU and sun shafts:

2) More of the same, but at sunrise:

3) Probably the most technically interesting clip, and a rumination on whether the trees were a mistake and we should never have left the oceans. This one demonstrates per-pixel water effects (i.e. they only affect the parts of the screen below the surface of the water) and also shadow mapping of the water surface to give underwater light shafts and caustics:

(Note also: the clouds are rendered as billboarded impostors, ray traced in the fragment shader to give perfect spheres. As you’ve probably noticed, they are a bit rubbish at the moment.)

Dynamic depth of field on the GPU – part 3 of n

As of the end of part 2 (and using information from part 1, which you should totally read), it’s time to implement the actual blurring effect. This uses code that I acquired somewhere on the internet, but I can’t remember the exact attribution; so, if I got it off you, please let me know!

What you will need is:

  • a framebuffer object containing your final render as a texture
  • the texture we made in part 2, which contains each fragment’s difference in linear depth from the calculated focal length
  • two framebuffer objects, each of which is the same size as your main framebuffer (you can cut this down to one with a bit of clever fiddling)
  • a shader which does a Gaussian blur, which I’m going to explain

What we’re going to do is blur the image according to the values in the depth texture. Because fragment shaders generally like to do everything backwards, the way to do this is to generate the blurred image and then blend it with the original image, so that bigger differences in the depth texture give you more of the blurred image.
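
That final blend, at least, is simple. This isn’t the exact shader from the implementation below, just a minimal sketch with illustrative names, assuming the sharp frame, the blurred copy and the depth-difference texture are all bound as textures:

#version 330
uniform sampler2D sceneTexture;   //the sharp render
uniform sampler2D blurTexture;    //the blurred copy
uniform sampler2D dofTexture;     //per-fragment distance from the focal plane, scaled to [0,1]
in vec2 texCoord;
out vec4 fragColour;

void main()
{
    vec4 sharp = texture(sceneTexture, texCoord);
    vec4 blurred = texture(blurTexture, texCoord);
    float amount = clamp(texture(dofTexture, texCoord).r, 0.0, 1.0);
    //the further a fragment is from the focal plane, the more of the blur we take
    fragColour = mix(sharp, blurred, amount);
}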

Okay, here are the implementation details:

Continue reading Dynamic depth of field on the GPU – part 3 of n

Dynamic depth of field on the GPU – Part 2 of n

Thus far, we’ve discussed the reasons for wanting to do depth of field (DoF) on the GPU. We’ve established that we need to get some idea of what our focal length should be, and to work out by how much the depth of each fragment in our scene render differs from that focal length. All of this information is available in the depth buffer we generated when we rendered the scene’s geometry; if we rendered into a framebuffer object with a depth attachment, this means we have the information available on the GPU in the form of a texture.

Getting data back off the GPU is a pain. It can be accomplished effectively with pixel buffer objects, but the available bandwidth between CPU and GPU is comparatively tiny and we don’t really want to take any up if we can help it. Additionally, we don’t want either the CPU or the GPU stalled while waiting for each other’s contribution, because that’s inefficient. It’s therefore more logical to analyse the depth buffer on the GPU using a shader. As a bonus, you can do this at the same time as you’re linearising the depth buffer for other post-processing operations – for example, you might need a linear depth buffer to add fog to your scene or for SSAO.
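
As a flavour of that analysis step, here’s a minimal sketch (not the exact shader; names are illustrative, the projection is assumed to be a standard perspective one, and for simplicity the focal distance arrives as a uniform):

#version 330
uniform sampler2D depthTexture;
uniform float near;        //near plane distance
uniform float far;         //far plane distance
uniform float focalDepth;  //linear distance we want to be in focus
in vec2 texCoord;
out vec4 fragColour;

//convert a [0,1] depth-buffer value back into a linear eye-space distance
float linearise(float d)
{
    float ndc = 2.0 * d - 1.0;
    return (2.0 * near * far) / (far + near - ndc * (far - near));
}

void main()
{
    float depth = linearise(texture(depthTexture, texCoord).r);
    //store how far this fragment is from the focal plane, scaled into a usable range
    fragColour = vec4(abs(depth - focalDepth) / far);
}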

What we’re going to do is generate a representation of how each fragment’s depth differs from the focal length, the results of which will look something like this:

Here’s how to do it:
Continue reading Dynamic depth of field on the GPU – Part 2 of n

Dynamic depth of field on the GPU – Part 1 of n

Modern 3D games use a bunch of tricks to convince our brains that we are viewing their world through some bizarre hybrid sense organ which consists of about 30% human eye and 70% movie camera. Hence we get lens flares, aperture changes and other movie staples which aren’t exactly true to life; we accept these effects probably because a) we are all so highly mediated these days that we expect the things which appear on our TVs/monitors to look like that and b) because they make shiny lights dance around the screen, and us primates love that stuff.

(An aside; anybody who wears glasses is totally used to lens flares, bloom lighting and film grain effects in everyday life, which is probably another reason why us nerds are so accepting of seeing the world as a movie. These settings can be temporarily toggled off with the use of a small amount of detergent and a soft cloth, but tend to return to the defaults over time).

What the human eye does have in spades, though, is dynamic depth of field. Anything outside the centre of the field of view is out of focus and therefore appears blurred (and also in black and white, but let’s pretend we don’t know that). Humans generally focus on the thing in the centre of their visual field, even when the thing they’re actually attending to is somewhere else (hence, when you watch something out of the corner of your eye, it’s still blurry). Because depth of field effects weren’t at all viable on early graphics hardware, a lot of people have got used to everything in a scene having the same sharpness and dislike the addition of depth of field. Used tastefully, however, it can work nicely as a framing effect; it’s also pretty handy for hiding lower-resolution assets in the background.

The technique I am going to explain here has a major advantage for my purposes; the whole thing can be done as a post-process on the GPU, meaning that you don’t have to fiddle around with scene graphs or reading your depth buffer back for calculations on the CPU.

Continue reading Dynamic depth of field on the GPU – Part 1 of n

On GLSL subroutines in Mac OS X and disappointment

I probably don’t need to tell anybody that the state of 3D graphics is somewhat sad under OS X when compared with Windows. This isn’t really due to differences between OpenGL, the API used by MacOS, and DirectX, as used by Windows; even OpenGL-only Windows applications typically run significantly better than their MacOS counterparts. It’s more easily attributable to two other factors:

  1. There are far more computers running Windows than MacOS, and those computers are more likely to be running intensive 3D applications (i.e. video games).
  2. Apple is notoriously uninterested in games, and notoriously tardy in keeping up with new OpenGL specifications.

This means that a) it takes a while for any new OpenGL features to make it to the Mac platform and b) they suck when they finally arrive.

As of Mavericks, the OpenGL Core profile has been upgraded to 4.0, and GLSL version 400 is supported. This means that shader subroutines have become available. Theoretically, shader subroutines are a really neat feature. Because of the way that graphics cards work, conditionals and branching in shader code incur a large performance penalty. Similarly, dynamic loops are much less efficient than loops of a fixed length, even though more recent graphics APIs claim to have fixed that. What this means is that if you have a shader which chooses between a cheap operation and a more expensive operation, it will perform worse than either (usually because it ends up doing both). If that shader then chooses how many times to do the expensive operation, performance gets even worse, despite the fact that it should theoretically be avoiding unnecessary iterations of the loop. The upshot is that the best option has always been to write two different shaders, one for the simple and one for the complex operation, and not to bother dynamically limiting the number of iterations in a loop, but just to hard-code the smallest number you think you can get away with.

Shader subroutines were supposed to fix the first of these problems; it was supposed to be possible to write “modular” shaders, where a uniform lets you change which operations a shader uses. In my renderer, which is admittedly poorly optimised, I would like to choose between a parallax-mapped, self-shadowing shader (expensive) and a simpler version which uses vertex normals – in this specific case, the simpler version is for drawing reflections, which don’t require the same level of detail. Here are the results (Nvidia GTX680, MacOS X 10.9.4, similar results using both the default Apple drivers and Nvidia’s web drivers):

  • No subroutines – both main render and reflection render use expensive shader: frame time approx. 0.028s
  • Subroutines coded in the shader, and uniforms set, but subroutines never actually called: frame time approx. 0.031s
  • Subroutines in use, cheaper subroutine used for drawing reflections: frame time approx. 0.035s

Vertex normals should be really cheap, and should save a lot of performance compared with parallax mapping everything. In addition, I’m not mixing and matching different subroutines: one subroutine is used for each pass, so switching only occurs once per frame. The problem is that the mere existence of code indicating the presence of a subroutine incurs a significant performance hit, which works out more expensive than just giving up and using the much more complicated and expensive shader for everything.
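
For anyone who hasn’t met them, the sort of code in question looks something like this sketch (illustrative names and trivial bodies, not my actual shader):

#version 400
subroutine vec4 ShadeModel(vec3 normal, vec2 uv);  //declare the subroutine "type"
subroutine uniform ShadeModel shadeModel;          //the uniform that selects an implementation

subroutine(ShadeModel) vec4 shadeParallax(vec3 normal, vec2 uv)
{
    //expensive parallax-mapped, self-shadowing version goes here
    return vec4(1.0);
}

subroutine(ShadeModel) vec4 shadeSimple(vec3 normal, vec2 uv)
{
    //cheap vertex-normal version goes here
    return vec4(1.0);
}

in vec3 vNormal;
in vec2 vUV;
out vec4 fragColour;

void main()
{
    fragColour = shadeModel(vNormal, vUV);  //calls whichever implementation is selected
}

On the C side, you then pick an implementation for each pass with something like:

GLuint index = glGetSubroutineIndex(program, GL_FRAGMENT_SHADER, "shadeSimple");
glUniformSubroutinesuiv(GL_FRAGMENT_SHADER, 1, &index);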

So, yeah; disappointing.

Volumetric sun shafts, in motion

A quick video of the lighting buffer produced by my volumetric lighting shader; see the post below for more details.

And here’s a slightly longer one, which is a bit shaky-cam but emphasises the fact that you don’t need to have the light source in frame for this to work.

Sunlight volumes and scattering

(NB. as with much of my stuff, every part of this that I wrote myself is a dirty hack.)

Light scattering (“god rays”) is a beautiful effect; in fact it’s so beautiful, it’s one of the rare bits of eye-candy that everybody bitches about (OMG so much GRAPHICS) but everybody also secretly loves. Good examples of the technique performed as a post-process can be found here or here.

Here’s what it looks like in nature:

Actual real sunset. Note the “rays” visible below the sun.

And here’s Crytek’s approach:


The implementation above is performed as a sort of radial blur outward from the screen-space position of the light source, masked by a representation of any objects occluding the light – trees, landscape, buildings, character models, etc. Apart from the difficult concept that this process is all backwards in shader language (because you can only influence the fragment you’re currently drawing, you march from the point on screen towards the light source, not the other way round), it’s pretty easy to implement; there’s a sketch of the sampling loop after the list below. There are a couple of downsides, one of which is very minor and the other of which starts to get on your nerves:

1) This isn’t even slightly true-to-life in terms of physical parameters – it’s an “art” effect, and you tweak it until it looks good. The results won’t be affected by atmospheric conditions, such as fog. This is the very minor downside.

2) This effect only works when the light source is on the screen. You have to fade it out whenever the light source isn’t within the camera’s field of view, or you get “opposite” light shafts such that, for example, the sun is suddenly now setting in the east. In addition, you can’t have any light shafts entering the frame from the side – so, if you look down at your feet, the shafts are suddenly gone. This is the major bugbear.
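
As promised, here’s roughly what the screen-space sampling loop looks like – a sketch of the standard post-process approach with illustrative uniform names, not code from either of the linked implementations:

#version 330
uniform sampler2D occlusionTexture;  //bright where the light/sky is visible, black where it’s blocked
uniform vec2 lightScreenPos;         //the light’s position in texture coordinates
uniform float density;               //how far towards the light each fragment marches
uniform float decay;                 //per-sample falloff
uniform float exposure;              //overall strength of the effect
in vec2 texCoord;
out vec4 fragColour;

const int NUM_SAMPLES = 64;

void main()
{
    vec2 delta = (texCoord - lightScreenPos) * (density / float(NUM_SAMPLES));
    vec2 coord = texCoord;
    float light = 0.0;
    float weight = 1.0;
    for (int i = 0; i < NUM_SAMPLES; ++i)
    {
        coord -= delta;  //step towards the light source
        light += texture(occlusionTexture, coord).r * weight;
        weight *= decay;
    }
    //this buffer then gets blended over the scene
    fragColour = vec4(vec3(light * exposure / float(NUM_SAMPLES)), 1.0);
}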

In order to do a proper “light shafts” effect, then, we need to know where in our scene light can get to, and how much of that light can make it to the camera. Fortunately, the first question can be answered easily if we’re already set up to cast shadows from the main light – the shadowmap contains the information we need. Unfortunately, the answer to the second question is much more complicated than it sounds. To get around this problem, we’re going to need to find a way to integrate all of the light being scattered in along a ray from the camera to each visible point.

Yes, folks, we’re going to need to write a ray tracer. It’s OK though, we don’t actually need to write a good one.
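
The core idea is compact enough to sketch, though: march along the ray from the camera towards each visible point, and at every step ask the sun’s shadowmap whether that point in space can see the light. A very rough sketch, with illustrative names:

//how much sunlight is scattered in along the ray from the camera to this
//fragment’s world-space position
uniform sampler2DShadow shadowMap;  //the sun’s shadowmap
uniform mat4 shadowMatrix;          //world space -> shadowmap space
uniform vec3 cameraPosition;
const int STEPS = 32;

float inScatteredLight(vec3 worldPos)
{
    vec3 stride = (worldPos - cameraPosition) / float(STEPS);
    vec3 samplePos = cameraPosition;
    float light = 0.0;
    for (int i = 0; i < STEPS; ++i)
    {
        samplePos += stride;
        //1.0 if this point along the ray can see the sun, 0.0 if it’s in shadow
        light += textureProj(shadowMap, shadowMatrix * vec4(samplePos, 1.0));
    }
    return light / float(STEPS);
}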

Continue reading Sunlight volumes and scattering

Orientation Matrices for Cube Mapping

I may well be totally wrong, but I don’t think I’ve ever successfully Googled a useful set of orientation matrices for cube mapping. As you already know, a cube map is a set of six images which, when projected on to a cube, provide a decent simulation of a spherical texture map. Two of the most common modes of usage are to provide real-time reflections on the outside of an object (by repeatedly making a map of the environment surrounding that object and then projecting that map back on to the object, as in the reflections you see on e.g. the cars in racing games), and to provide a skybox.

Skyboxes are usually either pre-rendered (pretty but boring), or done by rendering atmospheric scattering for your scene and then projecting some celestial bodies like the moon, stars etc. on top (pretty but computationally intensive). An additional bonus of drawing your own skyboxes is that you can then use them for environment/ambient lighting of the objects in your scene, either by working out the spherical harmonics (neat, but I’m far too dumb to have ever wrapped my head around it) or by techniques which involve downsampling the cube map. This gives you ambient light which changes colour depending on the angle of the sun, basically for free.

Therefore cube maps have multiple advantages for rendering your skybox and lighting:
1) render once and reuse (you can make this once per frame, or less often depending on how dynamic the sun is)
2) you can do atmospheric scattering at a surprisingly low resolution and still have a decent-looking result. You basically have to do your atmospheric scattering in the fragment shader if you want to use a “spotlight” effect to render the sun, which gets very expensive in terms of fragment power. I actually do both the sun and the moon, which is even more expensive, so lowering the resolution is a major speed-up here.
3) basically free specular and ambient environment mapping of the sky on to everything in your scene. You can either go the very expensive route for downsampling, or just mipmap the thing and get 90% of the quality for 10% of the effort, and hardware acceleration.
4) if you’re blending your scene into the sky for a distance fogging effect – well, you just got the source for that as well!

This is where you usually run into a brick wall because figuring out the correct orientation matrices for rendering the cube map is a pain in the backside. What you’re going to be doing in the end is rendering a box around your camera and texture mapping the cube map on to the inside of the box, which will then act as the skybox. You can simplify this by not applying any rotation to the skybox, so that it’s aligned with the x, y and z axes. Therefore what you need to do is figure out how to make the camera look in six directions: +x, -x, +y, -y, +z and -z. You could do this with gluLookAt, but that’s a whole heck of a lot of lines of code just to look in the direction of an axis. Better to just know what matrices to use: see below. (I’m weird and use +x = east, +y = north, -z = up i.e. inverted right-handed axes.)
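
For reference, here are the standard direction and up vectors for each cube-map face, written as a gluLookAt-style loop with the camera at the origin. This is a sketch with illustrative names, and if your axes are as odd as mine the up vectors will need tweaking:

//render the sky into each face in turn; directions/ups follow the conventional
//GL cube-map ordering (+x, -x, +y, -y, +z, -z), hence the mostly negative-y "up"s
const GLfloat faceDir[6][3] = {
    { 1, 0, 0}, {-1, 0, 0},   //GL_TEXTURE_CUBE_MAP_POSITIVE_X, _NEGATIVE_X
    { 0, 1, 0}, { 0,-1, 0},   //GL_TEXTURE_CUBE_MAP_POSITIVE_Y, _NEGATIVE_Y
    { 0, 0, 1}, { 0, 0,-1},   //GL_TEXTURE_CUBE_MAP_POSITIVE_Z, _NEGATIVE_Z
};
const GLfloat faceUp[6][3] = {
    { 0,-1, 0}, { 0,-1, 0},
    { 0, 0, 1}, { 0, 0,-1},
    { 0,-1, 0}, { 0,-1, 0},
};

for (int face = 0; face < 6; ++face)
{
    //attach this face of the cube map to the FBO we’re rendering the sky into
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, skyCubeMap, 0);
    glLoadIdentity();
    gluLookAt(0, 0, 0,
              faceDir[face][0], faceDir[face][1], faceDir[face][2],
              faceUp[face][0], faceUp[face][1], faceUp[face][2]);
    //...draw the sky into this face...
}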

Continue reading Orientation Matrices for Cube Mapping