Tag Archives: glsl

World Space Coordinates of Screen Fragments 2: Manchester Prep

A very nice person commented on my previous post about getting the world-space coordinates of the near plane of the view frustum, which has inspired me to revisit the topic; it’s been bugging me that my previous technique required a matrix multiplication, and that feels like it might be more expensive than strictly needed. As I discussed before, you might want to know where exactly in your world the screen is so that, if it intersects with something, you can treat that part of the screen differently – for example, if your camera is partially submerged in water, you might want to apply a fogging and distortion effect to those fragments below the surface, but not to those above it.

The first thing to understand is how your camera is oriented. For historical reasons, and because of the way that my world axes are set up (+x is east, +y is north, +z is down; the camera’s neutral direction is looking north along the y axis), the camera orients itself in world space by rotating around the z axis to look left-right, and around the x axis to look up-down. Just to make things more confusing, because the world moves around the camera in OpenGL, remember that in your shaders the camera’s coordinates are negated (i.e. your shaders think your camera is at (-cameraX, -cameraY, -cameraZ)). You can cut through a lot of confusion by using a system like gluLookAt() to orient your camera, which confers a huge bonus in that it deals explicitly with both the direction in which the camera is facing and the camera’s “up” direction, both of which will be very handy.

The first step is to work out where the camera is and which direction it’s looking. In my case, I keep track of the camera’s position as (cameraX, cameraY, cameraZ), and rotation around Z and X in radians (i.e. pi radians is 180 degrees). My camera matrix rotates the camera around the Z axis and then around its own X axis, and then translates to its location in world space. Using this system, the camera’s unit vector is worked out like this:
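The full working is in the rest of the post, but to give a flavour of the sort of expression involved, here’s a hedged sketch (hypothetical code, not the post’s; the exact signs depend on which rotation directions you treat as positive) of a forward vector built from those two angles under the conventions above:

// Hypothetical GLSL sketch, not the post's actual code: forward (view-direction)
// vector for a camera that yaws around world Z and pitches around its own X,
// with +x east, +y north, +z down and the neutral view looking along +y.
// The exact signs depend on which rotation directions you treat as positive.
vec3 cameraForward(float rotZ, float rotX)
{
    return normalize(vec3(sin(rotZ) * cos(rotX),
                          cos(rotZ) * cos(rotX),
                          sin(rotX)));
}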

Continue reading World Space Coordinates of Screen Fragments 2: Manchester Prep

How I Learned to Stop Blitting and Love the Framebuffer

A deferred rendering pipeline provides great opportunities to run post processing shaders on the results from your G-buffer (A G-buffer is the target for the first pass in a deferred renderer, usually consisting of colour, normal, depth ± material textures, which stores the information needed for subsequent lighting and effects passes). There are a couple of issues, however:

1) OpenGL can’t both read from and write to a texture at the same time; attempting to do so will give an undefined result (i.e. per OpenGL tradition it will look like the result you wanted, except when it doesn’t)
2) What, therefore, do you do about transparency in a deferred renderer? You need the information about what’s behind the transparent object, and how far away it is. That’s in your G-buffer, which you’re already writing to (I’m assuming here that you’re rendering transparent materials last, which is really the only sane way to do it).

What you’re going to need is a copy of a subset of your G-buffer to provide the information you need to draw the stuff behind your transparent material. There’s an expensive way and a cheap way to do this. The expensive way is to do all of your deferred lighting before you render any transparent materials (which does make life easier in some ways, but means you need to do two lighting passes, one for opaque and one for transparent materials); the cheap way is to decide that you’re not going to bother to light the stuff behind the transparency, because you’re already going to be throwing a bunch of refraction effects on top anyway and all you really need is a bit of detail to sell the effect.

Here’s where I tripped myself up, in the usual manner for a novice learning OpenGL from 10-year-old tutorials on the internet; I naïvely thought that the most logical thing to do at this point would be to blit (i.e. copy the pixels directly) from my G-buffer to another set of textures which I would then use as the source for rendering transparency. Duplicating a chunk of memory seemed like it was going to be a much faster operation than actually drawing anything. Because I’m not a total idiot, I did at least avoid using glCopyTexImage2D and went straight for the faster glCopyTexSubImage2D operation instead. The typical way you would use this is:

//bind the framebuffer you're going to read from
glBindFramebuffer(GL_FRAMEBUFFER, myGBuffer);
glViewport(0, 0, my_gBuffer_width, my_gBuffer_height);

//specify which framebuffer attachment you're going to read
glReadBuffer(GL_COLOR_ATTACHMENT0); //or whichever

//bind the texture you're going to copy to
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, myDuplicateTexture);

//do the copy operation (reads from the currently bound read framebuffer
//into the currently bound texture)
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, my_gBuffer_width, my_gBuffer_height);

The problem with this is, the performance is terrible, especially if you’re copying across a PCI bus. I was copying a colour and a depth texture for a water effect, and a quick root around in the driver monitor led to the discovery that the GL was spending at least 70% of its time on those two copy operations alone. I’d imagine that this is likely a combination of copying into system memory for some reason and stalling the pipeline; to make matters worse, in this sort of situation you can only really copy the textures immediately before you need to use them, so unless you’re going to rig up some complicated double-buffer solution copying the last frame, asynchronous pixel buffer transfers aren’t going to save you. Using driver hints to keep the texture data in GPU memory might work, or it might not.

Here’s one solution:
Continue reading How I Learned to Stop Blitting and Love the Framebuffer

More videos – DoF, sun shafts, water caustics, pixel-accurate water surfaces.

Here’s a bunch of new videos. Take a look at:

1) Dynamic depth of field on the GPU and sun shafts:

2) More of the same, but at sunrise:

3) Probably the most technically interesting clip, and a rumination on whether the trees were a mistake and we should never have left the oceans. This one demonstrates per-pixel water effects (i.e. they only affect the parts of the screen below the surface of the water) and also shadow mapping the water surface to give underwater light shafts and caustics:

(Note also: the clouds are rendered as billboarded impostors, ray traced in the fragment shader to give perfect spheres. As you’ve probably noticed they are a bit rubbish at the moment.)
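For the curious, that impostor trick boils down to a per-fragment ray–sphere intersection across each billboard quad; here’s a minimal, hedged sketch of the test (hypothetical names, not the actual cloud shader):

// Hypothetical sketch: analytic ray-sphere intersection in a fragment shader.
// ro = ray origin, rd = normalised ray direction (both in the sphere's space).
// Returns the distance to the nearest hit, or -1.0 if the ray misses.
float intersectSphere(vec3 ro, vec3 rd, vec3 centre, float radius)
{
    vec3 oc = ro - centre;
    float b = dot(oc, rd);
    float c = dot(oc, oc) - radius * radius;
    float h = b * b - c;
    if (h < 0.0) return -1.0; // miss: discard the fragment, or fall back to the billboard
    return -b - sqrt(h);      // nearest intersection along the ray
}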

Dynamic depth of field on the GPU – part 3 of n

As of the end of part 2 (and using information from part 1, which you should totally read), it’s time to implement the actual blurring effect. This uses code that I acquired somewhere on the internet, but I can’t remember the exact attribution; so, if I got it off you, please let me know!

What you will need is:

  • a framebuffer object containing your final render as a texture
  • the texture we made in part 2, which contains each fragment’s difference in linear depth from the calculated focal length
  • two framebuffer objects, each of which is the same size as your main framebuffer (you can cut this down to one with a bit of clever fiddling)
  • a shader which does a Gaussian blur, which I’m going to explain

What we’re going to do is blur the image according to the values in the depth texture. Because fragment shaders generally like to do everything backwards, the way to do this is to generate the blurred image and then blend it with the original image, so that bigger differences in the depth texture give more of the blurred image.
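The real implementation is behind the link below, but purely as a hedged sketch of that final blend (uSharp, uBlurred and uBlurAmount are hypothetical names, with the blur amount assumed to be a 0–1 value derived from the depth-difference texture from part 2), it boils down to a single mix():

#version 330 core
// Hypothetical sketch, not the series' actual code: blend the sharp and blurred
// renders per fragment, weighted by how far that fragment is from the focal plane.
uniform sampler2D uSharp;       // the original scene render
uniform sampler2D uBlurred;     // the Gaussian-blurred copy
uniform sampler2D uBlurAmount;  // 0-1 factor from the depth-difference texture (part 2)

in vec2 texCoord;
out vec4 fragColour;

void main()
{
    float amount = clamp(texture(uBlurAmount, texCoord).r, 0.0, 1.0);
    fragColour = mix(texture(uSharp, texCoord),
                     texture(uBlurred, texCoord),
                     amount);
}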

Okay, here are the implementation details:

Continue reading Dynamic depth of field on the GPU – part 3 of n

Dynamic depth of field on the GPU – Part 2 of n

Thus far, we’ve discussed the reasons for wanting to do depth of field (DoF) on the GPU. We’ve figured out that we need to get some idea of what our focal length should be, and to figure out by how much the depth of each fragment in our scene render differs from that focal length. All of this information is available in the depth buffer we generated when we rendered the scene’s geometry; if we rendered into a framebuffer object with a depth attachment, this means we have the information available on the GPU in the form of a texture.

Getting data back off the GPU is a pain. It can be accomplished effectively with pixel buffer objects, but the available bandwidth between CPU and GPU is comparatively tiny and we don’t really want to take any up if we can help it. Additionally, we don’t want either the CPU or the GPU stalled while waiting for each other’s contribution, because that’s inefficient. It’s therefore more logical to analyse the depth buffer on the GPU using a shader. As a bonus, you can do this at the same time as you’re linearising the depth buffer for other post-processing operations – for example, you might need a linear depth buffer to add fog to your scene or for SSAO.
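As a hedged aside on that linearisation step (a generic sketch with hypothetical uniform names, assuming a standard perspective projection and the default depth range – not necessarily the exact code used here), the usual formula looks like this:

// Hypothetical sketch: recover linear eye-space depth from a standard
// perspective-projection depth buffer. uNear/uFar are assumed to be the
// clip planes used when the scene was rendered.
uniform sampler2D uDepthTexture;
uniform float uNear;
uniform float uFar;

float linearDepth(vec2 uv)
{
    float d = texture(uDepthTexture, uv).r;  // non-linear depth in [0, 1]
    float zNdc = d * 2.0 - 1.0;              // back to NDC in [-1, 1]
    return (2.0 * uNear * uFar) / (uFar + uNear - zNdc * (uFar - uNear));
}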

What we’re going to do is generate a representation of how each fragment’s depth differs from the focal length, the results of which will look something like this:

Here’s how to do it:
Continue reading Dynamic depth of field on the GPU – Part 2 of n

Dynamic depth of field on the GPU – Part 1 of n

Modern 3D games use a bunch of tricks to convince our brains that we are viewing their world through some bizarre hybrid sense organ which consists of about 30% human eye and 70% movie camera. Hence we get lens flares, aperture changes and other movie staples which aren’t exactly true to life; we accept these effects probably because a) we are all so highly mediated these days that we expect the things which appear on our TVs/monitors to look like that and b) because they make shiny lights dance around the screen, and us primates love that stuff.

(An aside; anybody who wears glasses is totally used to lens flares, bloom lighting and film grain effects in everyday life, which is probably another reason why us nerds are so accepting of seeing the world as a movie. These settings can be temporarily toggled off with the use of a small amount of detergent and a soft cloth, but tend to return to the defaults over time).

What the human eye does have in spades, though, is dynamic depth of field. Anything outside of the centre of the field of view is out of focus and therefore appears blurred (and also in black and white, but let’s pretend we don’t know that). Humans generally focus on the thing in the centre of their visual field, even when the thing they are actually attending to isn’t in the centre (hence when you watch something out of the corner of your eye, it’s still blurry). Because depth of field effects weren’t at all viable on early graphics hardware, a lot of people have got used to everything in a scene having the same sharpness and dislike the addition of depth of field. However, used tastefully, it can work nicely as a framing effect; in addition, it’s pretty handy for hiding lower-resolution assets in the background.

The technique I am going to explain here has a major advantage for my purposes; the whole thing can be done as a post-process on the GPU, meaning that you don’t have to fiddle around with scene graphs or reading your depth buffer back for calculations on the CPU.

Continue reading Dynamic depth of field on the GPU – Part 1 of n

On GLSL subroutines in Mac OS X and disappointment

I probably don’t need to tell anybody that the state of 3D graphics is somewhat sad under OS X when compared with Windows. This isn’t really due to differences between the OpenGL API used by MacOS and the DirectX API used by Windows; even OpenGL-only Windows applications typically run significantly better than their MacOS counterparts. It’s more easily attributable to two other factors:

  1. There are far more computers running Windows than MacOS; these computers are more likely to be running intensive 3D applications (i.e. video games)
  2. Apple is notoriously uninterested in games, and notoriously tardy in keeping up with new OpenGL specifications.

This means that a) it takes a while for any new OpenGL features to make it to the Mac platform and b) they suck when they finally arrive.

As of Mavericks, the OpenGL Core profile has been upgraded to 4.0, and GLSL version 400 is supported. This means that shader subroutines have become available. Theoretically, shader subroutines are a really neat feature. Because of the way that graphics cards work, conditionals and branching in shader code incur a large performance penalty. Similarly, dynamic loops are much less efficient than loops of a fixed length, even though more recent graphics APIs claim to have fixed that. What this means is that if you have a shader which chooses whether to do a cheap operation or a more expensive operation, it will perform worse than either (usually because it’s doing both). If that shader then chooses how many times to do the expensive operation, the performance gets even worse, despite the fact that it should theoretically be avoiding unnecessary iterations through the loop. This means that the best option has always been to write two different shaders for the simple and the complex operation, and not to bother dynamically limiting the number of iterations in a loop, but just hard-code the smallest number you think you can get away with.
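For anyone who hasn’t met them, the shader-side syntax is pleasantly simple; here’s a hedged, minimal sketch (hypothetical names, not my renderer’s actual code) of a shader with two subroutines, where the active one is selected per pass from the CPU via glUniformSubroutinesuiv():

#version 400
// Hypothetical sketch of GLSL subroutines: one subroutine uniform switches
// between an expensive and a cheap shading path without swapping shaders.

subroutine vec4 ShadingModel(vec2 uv);      // the subroutine's "signature"

subroutine(ShadingModel)
vec4 shadeParallax(vec2 uv)
{
    // expensive path: parallax mapping, self-shadowing, etc. would go here
    return vec4(uv, 0.0, 1.0);
}

subroutine(ShadingModel)
vec4 shadeSimple(vec2 uv)
{
    // cheap path: plain vertex-normal lighting would go here
    return vec4(uv, 1.0, 1.0);
}

subroutine uniform ShadingModel currentShading;  // chosen from the CPU side

in vec2 texCoord;
out vec4 fragColour;

void main()
{
    fragColour = currentShading(texCoord);
}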

Shader subroutines were supposed to fix the first of these problems; it was supposed to be possible to write “modular” shaders, where a uniform allows you to change which operations a shader uses. In my renderer, which is admittedly poorly optimised, I would like to choose between a parallax-mapped, self-shadowing shader (expensive) and a simpler version which uses vertex normals – in this specific case, the simpler version is for drawing reflections, which don’t require the same level of detail. Here are the results (Nvidia GTX680, MacOS X 10.9.4, similar results using both default Apple drivers and Nvidia web drivers):

  • No subroutines – both main render and reflection render use expensive shader: frame time approx. 0.028s
  • Subroutines coded in the shader, and uniforms set, but subroutines never actually called: frame time approx. 0.031s
  • Subroutines in use, cheaper subroutine used for drawing reflections: frame time approx. 0.035s

Vertex normals should be really cheap, and save a lot of performance when compared with parallax mapping everything. In addition, I’m not mixing and matching different subroutines – one subroutine is used for each pass, so switching only occurs once per frame. The problem is, the mere existence of code indicating the presence of a subroutine incurs a significant performance hit, which is actually more expensive than just giving up and using the much more complicated and expensive shader for everything.

So, yeah; disappointing.

Sunlight volumes and scattering

(NB. as with much of my stuff, every part of this that I wrote myself is a dirty hack.)

Light scattering (“god rays”) is a beautiful effect; in fact it’s so beautiful, it’s one of the rare bits of eye-candy that everybody bitches about (OMG so much GRAPHICS) but everybody also secretly loves. Good examples of the technique performed as a post-process can be found here or here.

Here’s what it looks like in nature:

Actual real sunset. Note the “rays” visible below the sun.

And here’s Crytek’s approach:

MAXIMUM SHINY

The implementation above is performed as a sort of radial blur outward from the screen-space position of the light source, masked by a representation of any objects occluding the light – trees, landscape, buildings, character models, etc. Apart from the slightly mind-bending fact that the process runs backwards in shader language (because you can only influence the fragment you’re currently drawing, you march from the point on screen towards the light source, not the other way round), this is pretty easy to implement – there’s a rough sketch of it after the two downsides below. There are a couple of downsides, one of which is very minor and the other of which starts to get on the nerves:

1) This isn’t even slightly true-to-life in terms of physical parameters – it’s an “art” effect, and you tweak it until it looks good. The results won’t be affected by atmospheric conditions, such as fog. This is the very minor downside.

2) This effect only works when the light source is on the screen. You have to fade it out whenever the light source isn’t within the camera’s field of view, or you get “opposite” light shafts such that, for example, the sun is suddenly now setting in the east. In addition, you can’t have any light shafts entering the frame from the side – so, if you look down at your feet, the shafts are suddenly gone. This is the major bugbear.
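Here’s the rough sketch promised above – a hedged, stripped-down version of the screen-space radial blur, with hypothetical uniform names; uOcclusion is assumed to be a texture that’s bright where the sun or sky is visible and dark where occluders block it:

#version 330 core
// Hypothetical sketch of the post-process light-scattering effect: march from
// each fragment towards the light's screen position, accumulating unoccluded
// samples with a decaying weight.
uniform sampler2D uOcclusion;   // bright = unoccluded light, dark = blocked
uniform vec2 uLightScreenPos;   // light position in [0, 1] screen space
uniform float uDensity;         // how far towards the light the march reaches
uniform float uDecay;           // per-sample falloff
uniform float uExposure;        // overall strength

in vec2 texCoord;
out vec4 fragColour;

const int NUM_SAMPLES = 64;

void main()
{
    vec2 uv = texCoord;
    vec2 deltaUV = (uv - uLightScreenPos) * (uDensity / float(NUM_SAMPLES));
    float light = 0.0;
    float weight = 1.0;
    for (int i = 0; i < NUM_SAMPLES; ++i) {
        uv -= deltaUV;
        light += texture(uOcclusion, uv).r * weight;
        weight *= uDecay;
    }
    fragColour = vec4(vec3(light * uExposure), 1.0);
}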

In order to do a proper “light shafts” effect, then, we need to know where in our scene light can get to, and how much of that light can make it to the camera. Fortunately, the first question can be answered easily if we’re set up to cast shadows from the main light – the shadowmap contains the information needed. Unfortunately, the answer to the second question is much more complicated than it sounds. To get round this problem, we’re going to need to find a way to integrate all of the light being scattered in along a ray from the camera to each visible point.

Yes, folks, we’re going to need to write a ray tracer. It’s OK though, we don’t actually need to write a good one.
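As a taste of just how un-good it can afford to be, here’s a hedged sketch (hypothetical names, not the code from the full post; uShadowMatrix is assumed to already include the usual 0.5/0.5 bias so coordinates land in [0, 1]) of a fixed-step march through the sun’s shadow map along each view ray:

// Hypothetical sketch: march along the view ray from the camera to a fragment's
// world-space position, asking the sun's shadow map at each step whether that
// bit of air is lit, and average the results.
uniform sampler2D uShadowMap;
uniform mat4 uShadowMatrix;   // world space -> biased shadow-map space
uniform vec3 uCameraPos;

const int STEPS = 32;

float inScatteredLight(vec3 worldPos)
{
    vec3 stride = (worldPos - uCameraPos) / float(STEPS);
    vec3 samplePos = uCameraPos;
    float lit = 0.0;
    for (int i = 0; i < STEPS; ++i) {
        samplePos += stride;
        vec4 sc = uShadowMatrix * vec4(samplePos, 1.0);
        sc.xyz /= sc.w;
        // this step counts as lit if it's nearer the light than the occluder
        // recorded in the shadow map
        if (texture(uShadowMap, sc.xy).r > sc.z) {
            lit += 1.0;
        }
    }
    return lit / float(STEPS);
}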

Continue reading Sunlight volumes and scattering

Getting world-space coordinates of screen fragments in glsl

So, you’re probably asking yourself why on earth you would even want to do that? Well, it’s useful information if you’ve got a camera which can intersect with things in your scene. The most obvious example here is water – say, for example, you wanted to be able to have a distort effect and reduced fogging distance to make the underwater part of your environment visually distinct from the part above water:

Camera Submerging

You may have noticed that in e.g. Elder Scrolls games you can trick the camera into behaving as if it’s not underwater when you’re near to the air-water interface. This is presumably because they’ve just set a camera height which denotes “underwater”, but what if the player has positioned the camera so that half of the screen is underwater and half is above? Hence you need an approach which works per-fragment.

How do you work out which part of your screen is actually underwater? You generally can’t do it when you’re rendering your water’s surface, because you’re not going to be shading any pixels which aren’t directly at the air-water interface; if you’re under the surface of the water and looking down, that approach fails immediately. What you’re going to want to do is find a way of masking off the bits of your screen that are underwater and use that mask to do your “underwater” effects.
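The post’s own method is behind the link, but as a hedged sketch of the general idea (the matrix-multiplication route mentioned at the top of this page, with hypothetical names, and an underwater test written for a z-up world – flip the comparison for the z-down convention used elsewhere here):

// Hypothetical sketch: world-space position of a point on the near plane
// for a given screen coordinate, via the inverse view-projection matrix.
uniform mat4 uInverseViewProjection;
uniform float uWaterHeight;

vec3 nearPlaneWorldPos(vec2 uv)   // uv in [0, 1] screen space
{
    vec4 ndc = vec4(uv * 2.0 - 1.0, -1.0, 1.0);  // z = -1: a point on the near plane
    vec4 world = uInverseViewProjection * ndc;
    return world.xyz / world.w;
}

bool fragmentIsUnderwater(vec2 uv)
{
    return nearPlaneWorldPos(uv).z < uWaterHeight;  // assumes +z is up
}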

Continue reading Getting world-space coordinates of screen fragments in glsl

A tale of two vectors (normal reconstruction and driver differences)

If you’re playing around with deferred rendering or post-process techniques, you’ve probably come across the concept that you can recover camera-space surface normals from camera space position like so:

vec3 reconstructCameraSpaceFaceNormal(vec3 CameraSpacePosition) {
    vec3 res = normalize(cross(dFdy(CameraSpacePosition), dFdx(CameraSpacePosition)));
    return res;
}

where CameraSpacePosition is the fragment’s interpolated camera-space position.

What you might not realise is that you’re accidentally setting yourself up for confusion depending on your graphics driver. For the longest time, I was using this technique to try to implement SSAO without having to bother with storing screen-space normals. After fiddling about a bit I noticed that on my desktop with an NVIDIA GTX680 everything looked OK, while on my laptop with Intel HD integrated graphics everything looked inverted. I then tried reversing the normal I was getting out of this function. Success! The laptop is now displaying correctly. Failure! The desktop is now screwed up.

Continue reading A tale of two vectors (normal reconstruction and driver differences)