# World Space Coordinates of Screen Fragments 2: Manchester Prep

A very nice person commented on my previous post about getting the world-space coordinates of the near plane of the view frustum, which has inspired me to revisit the topic. It’s been bugging me that my previous technique required a matrix multiplication, which feels like it might be more expensive than strictly needed. As I discussed before, you might want to know where exactly in your world the screen is so that if it intersects with something, you can treat that part of the screen differently. For example, if your camera is partially submerged in water, you might want to apply a fogging and distortion effect to those fragments below the surface, but not above it.

The first thing to understand is how your camera is oriented. For historical reasons, and because of the way that my world axes are set up (+x is east, +y is north, +z is down; the camera’s neutral direction is looking north along the y axis), the camera orients itself in world space by rotating around the z axis to look left-right, and around the x axis to look up-down. Just to make things more confusing, because the world moves around the camera in OpenGL, remember that in your shaders the camera’s coordinates are negated (i.e. your shaders think your camera is at (-cameraX, -cameraY, -cameraZ)). You can cut through a lot of confusion by using a system like gluLookAt() to orient your camera: because you supply the eye position, the look-at target and an “up” vector yourself, you already have both the direction in which the camera is facing and the camera’s direction for “up” to hand, which will be very handy.

The first step is to work out where the camera is and which direction it’s looking. In my case, I keep track of the camera’s position as (cameraX, cameraY, cameraZ), and rotation around Z and X in radians (i.e. pi radians is 180 degrees). My camera matrix rotates the camera around the Z axis and then around its own X axis, and then translates to its location in world space. Using this system, the camera’s unit vector is worked out like this:

# How I Learned to Stop Blitting and Love the Framebuffer

A deferred rendering pipeline provides great opportunities to run post processing shaders on the results from your G-buffer (A G-buffer is the target for the first pass in a deferred renderer, usually consisting of colour, normal, depth ± material textures, which stores the information needed for subsequent lighting and effects passes). There are a couple of issues, however:

1) OpenGL can’t both read from and write to a texture at the same time; attempting to do so will give an undefined result (i.e. per OpenGL tradition it will look like the result you wanted, except when it doesn’t)
2) What, therefore, do you do about transparency in a deferred renderer? You need the information about what’s behind the transparent object, and how far away it is. That’s in your G-buffer, which you’re already writing to (I’m assuming here that you’re rendering transparent materials last, which is really the only sane way to do it).

What you’re going to need is a copy of a subset of your G-buffer to provide the information you need to draw the stuff behind your transparent material. There’s an expensive way and a cheap way to do this. The expensive way is to do all of your deferred lighting before you render any transparent materials (which does make life easier in some ways, but means you need to do two lighting passes, one for opaque and one for transparent materials). The cheap way is to decide that you’re not going to bother to light the stuff behind the transparency, because you’re already going to be throwing a bunch of refraction effects on top anyway and all you really need is a bit of detail to sell the effect.

Here’s where I tripped myself up, in the usual manner for a novice learning OpenGL from 10-year-old tutorials on the internet; I naïvely thought that the most logical thing to do at this point would be to blit (i.e. copy the pixels directly) from my G-buffer to another set of textures which I would then use as the source for rendering transparency. Duplicating a chunk of memory seemed like it was going to be a much faster operation than actually drawing anything. Because I’m not a total idiot, I did at least avoid using glCopyTexImage2D and went straight for the faster glCopyTexSubImage2D operation instead. The typical way you would use this is:

```
// bind the framebuffer you're going to read from
glBindFramebuffer(GL_FRAMEBUFFER, myGBuffer);
glViewport(0, 0, my_gBuffer_width, my_gBuffer_height);

// specify which framebuffer attachment you're going to read
glReadBuffer(GL_COLOR_ATTACHMENT0); // or whichever

// bind the texture you're going to copy to
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, myDuplicateTexture);

// do the copy operation
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, my_gBuffer_width, my_gBuffer_height);
```

The problem with this is, the performance is terrible, especially if you’re copying across a PCI bus. I was copying a colour and a depth texture for a water effect, and a quick root around in the driver monitor led to the discovery that the GL was spending at least 70% of its time on those two copy operations alone. I’d imagine that this is likely a combination of copying into system memory for some reason and stalling the pipeline; to make matters worse, in this sort of situation you can only really copy the textures immediately before you need to use them, so unless you’re going to rig up some complicated double-buffer solution copying the last frame, asynchronous pixel buffer transfers aren’t going to save you. Using driver hints to keep the texture data in GPU memory might work, or it might not.

Here’s one solution:
Continue reading How I Learned to Stop Blitting and Love the Framebuffer

# More videos – DoF, sun shafts, water caustics, pixel-accurate water surfaces.

Here’s a bunch of new videos. Take a look at:

2) More of the same, but at sunrise:

3) Probably the most technically interesting clip, and a rumination on whether the trees were a mistake and we should never have left the oceans. This one demonstrates per pixel water effects (i.e. only affects parts of the screen below the surface of the water) and also shadow mapping the water surface to give underwater light shafts and caustics:

(Note also: the clouds are rendered as billboarded impostors, ray traced in the fragment shader to give perfect spheres. As you’ve probably noticed they are a bit rubbish at the moment.)

# Dynamic depth of field on the GPU – part 3 of n

As of the end of part 2 (and using information from part 1, which you should totally read), it’s time to implement the actual blurring effect. This uses code that I acquired somewhere on the internet, but I can’t remember the exact attribution; so, if I got it off you, please let me know!

What you will need is:

• a framebuffer object containing your final render as a texture
• the texture we made in part 2, which contains each fragment’s difference in linear depth from the calculated focal length
• two framebuffer objects, each of which is the same size as your main framebuffer (you can cut this down to one with a bit of clever fiddling)
• a shader which does a Gaussian blur, which I’m going to explain

What we’re going to do is blur the image according to the values in the depth texture. Because fragment shaders generally like to do everything backwards, the way to do this is generate the blurred image, and then blend it with the original image, so that bigger differences in the depth texture give more of the blurred image.

Okay, here are the implementation details:

# Dynamic depth of field on the GPU – Part 2 of n

Thus far, we’ve discussed the reasons for wanting to do depth of field (DoF) on the GPU. We’ve figured out that we need to get some idea of what our focal length should be, and to figure out by how much the depth of each fragment in our scene render differs from that focal length. All of this information is available in the depth buffer we generated when we rendered the scene’s geometry; if we rendered into a framebuffer object with a depth attachment, this means we have the information available on the GPU in the form of a texture.

Getting data back off the GPU is a pain. It can be accomplished effectively with pixel buffer objects, but the available bandwidth between CPU and GPU is comparatively tiny and we don’t really want to take any up if we can help it. Additionally, we don’t want either the CPU or the GPU stalled while waiting for each other’s contribution, because that’s inefficient. It’s therefore more logical to analyse the depth buffer on the GPU using a shader. As a bonus, you can do this at the same time as you’re linearising the depth buffer for other post-processing operations – for example, you might need a linear depth buffer to add fog to your scene or for SSAO.
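For reference, the linearisation step can be sketched in C as follows. It assumes a standard OpenGL perspective projection and the default depth range of 0 to 1. `linearise_depth`, `focus_difference` and the `range` parameter are hypothetical names for this example; in practice the same arithmetic would go in a fragment shader.

```c
#include <assert.h>
#include <math.h>

/* Convert a depth-buffer value in [0, 1] to a linear eye-space distance,
 * assuming a standard OpenGL perspective projection. */
double linearise_depth(double depth, double near_plane, double far_plane)
{
    assert(near_plane > 0.0 && far_plane > near_plane);

    double z_ndc = 2.0 * depth - 1.0;  /* [0, 1] -> [-1, 1] */
    return (2.0 * near_plane * far_plane) /
           (far_plane + near_plane - z_ndc * (far_plane - near_plane));
}

/* Hypothetical measure of how far a fragment is from the focal distance,
 * normalised by an assumed `range` and clamped to [0, 1]. */
double focus_difference(double linear_depth, double focal_distance, double range)
{
    assert(range > 0.0);

    double d = fabs(linear_depth - focal_distance) / range;
    return d > 1.0 ? 1.0 : d;
}
```

A depth value of 0 maps back to the near plane distance and a value of 1 to the far plane distance, which is a handy sanity check when porting this to a shader.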

What we’re going to do is generate a representation of how each fragment’s depth differs from the focal length, the results of which will look something like this:

Here’s how to do it:
Continue reading Dynamic depth of field on the GPU – Part 2 of n

# On GLSL subroutines in Mac OS X and disappointment

I probably don’t need to tell anybody that the state of 3D graphics is somewhat sad under OS X when compared with Windows. This isn’t really due to differences between the OpenGL API used by MacOS and DirectX, as used by Windows; even OpenGL-only Windows applications typically run significantly better than their MacOS counterparts. It’s more easily attributable to two other factors:

1. There are far more computers running Windows than MacOS; these computers are more likely to be running intensive 3D applications (i.e. video games)
2. Apple is notoriously uninterested in games, and notoriously tardy in keeping up with new OpenGL specifications.

This means that a) it takes a while for any new OpenGL features to make it to the Mac platform and b) they suck when they finally arrive.

As of Mavericks, the OpenGL Core profile has been upgraded to 4.0, and GLSL version 400 is supported. This means that shader subroutines have become available. Theoretically, shader subroutines are a really neat feature. Because of the way that graphics cards work, conditionals and branching in shader code incur a large performance penalty. Similarly, dynamic loops are much less efficient than loops of a fixed length, even though more recent graphics APIs claim to have fixed that. What this means is that if you have a shader which chooses whether to do a cheap operation or a more expensive operation, then it will perform worse than either (usually because it’s doing both). If that shader then chooses how many times to do the expensive operation, the performance gets even worse, despite the fact that it should theoretically be avoiding unnecessary iterations through the loop. This means that the best option has always been to write two different shaders for the simple and the complex operation, and not to bother dynamically limiting the number of iterations in a loop, but just hard code the smallest number you think you can get away with.

Shader subroutines were supposed to fix the first of these problems; it was supposed to be possible to write “modular” shaders, where a uniform allows you to change which operations a shader uses. In my renderer, which is admittedly poorly optimised, I would like to choose between a parallax-mapped, self-shadowing shader (expensive) or a simpler version which uses vertex normals – in this specific case, the simpler version is for drawing reflections, which don’t require the same level of detail. Here are the results (Nvidia GTX680, MacOS X 10.9.4, similar results using both default Apple drivers and Nvidia web drivers):

• No subroutines – both main render and reflection render use expensive shader: frame time approx. 0.028s
• Subroutines coded in the shader, and uniforms set, but subroutines never actually called: frame time approx. 0.031s
• Subroutines in use, cheaper subroutine used for drawing reflections: frame time approx. 0.035s

Vertex normals should be really cheap, and save a lot of performance when compared with parallax mapping everything. In addition, I’m not mixing and matching different subroutines – one subroutine is used for each pass, so switching only occurs once per frame. The problem is, the mere existence of code indicating the presence of a subroutine incurs a significant performance hit, which is actually more expensive than just giving up and using the much more complicated and expensive shader for everything.
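To put those frame times in perspective, here is a small C sketch of the arithmetic (the function names are hypothetical, chosen for this example):

```c
#include <assert.h>

/* Frames per second for a given frame time in seconds. */
double frames_per_second(double frame_time_s)
{
    assert(frame_time_s > 0.0);
    return 1.0 / frame_time_s;
}

/* Extra frame time of `measured_s` relative to `baseline_s`, in percent. */
double overhead_percent(double baseline_s, double measured_s)
{
    assert(baseline_s > 0.0);
    return (measured_s - baseline_s) / baseline_s * 100.0;
}
```

Plugging in the figures above, 0.028 s is roughly 36 frames per second and 0.035 s roughly 29. Merely declaring the subroutines costs about 11% extra frame time, and actually using the cheaper one costs about 25%.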

So, yeah; disappointing.

# Orientation Matrices for Cube Mapping

I may well be totally wrong, but I don’t think I’ve ever successfully Googled a useful set of orientation matrices for cube mapping. As you already know, a cube map is a set of six images which, when projected on to a cube, provide a decent simulation of a spherical texture map. Two of the most common modes of usage are to provide real-time reflections on the outside of an object (by repeatedly making a map of the environment surrounding that object and then projecting that map back on to the object, as in the reflections you see on e.g. the cars in racing games), and to provide a skybox.

Skyboxes are usually either pre-rendered (pretty but boring), or done through rendering atmospheric scattering for your scene and then projecting some celestial bodies like the moon, stars etc (pretty but computationally intensive). An additional bonus of drawing your own skyboxes is that you can then use them for doing environment/ambient lighting for objects in your scene, either by working out the spherical harmonics (neat but I’m far too dumb to have ever wrapped my head around it) or by techniques which involve downsampling the cube map. This gives you ambient light which changes colour depending on the angle of the sun basically for free.

Therefore cube maps have multiple advantages for rendering your skybox and lighting:
1) render once and reuse (you can make this once per frame, or less often depending on how dynamic the sun is.)
2) you can do atmospheric scattering at a surprisingly low resolution and still have a decent looking result. You basically have to do your atmospheric scattering in the fragment shader if you want to use a “spotlight” effect to render the sun, which gets very expensive in terms of fragment power. I actually do both the sun and the moon, which is even more expensive, so lowering resolution is a major speed-up here.
3) basically free specular and ambient environment mapping of the sky on to everything in your scene. You can either go the very expensive route for downsampling, or just mipmap the thing and get 90% of the quality for 10% of the effort, and hardware acceleration.
4) if you’re blending your scene into the sky for a distance fogging effect – well, you just got the source for that as well!

This is where you usually run into a brick wall because figuring out the correct orientation matrices for rendering the cube map is a pain in the backside. What you’re going to be doing in the end is rendering a box around your camera and texture mapping the cube map on to the inside of the box, which will then act as the skybox. You can simplify this by not applying any rotation to the skybox, so that it’s aligned with the x, y and z axes. Therefore what you need to do is figure out how to make the camera look in six directions: +x, -x, +y, -y, +z and -z. You could do this with gluLookAt, but that’s a whole heck of a lot of lines of code just to look in the direction of an axis. Better to just know what matrices to use: see below. (I’m weird and use +x = east, +y = north, -z = up i.e. inverted right-handed axes.)
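For comparison, here is a minimal C sketch that builds rotation-only view matrices for the six faces using the conventional OpenGL cube-map face orientations (forward and “up” vectors per face), rather than my inverted axes. `V3`, `FACE_FORWARD`, `FACE_UP` and `cube_face_view_matrix` are hypothetical names for this example, and the matrices are column-major as OpenGL expects. Treat it as a starting point to adapt to your own axis conventions.

```c
#include <assert.h>

typedef struct { float x, y, z; } V3;

static V3 cross(V3 a, V3 b)
{
    V3 r = { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
    return r;
}

/* Face order: +x, -x, +y, -y, +z, -z (matches GL_TEXTURE_CUBE_MAP_POSITIVE_X + i). */
static const V3 FACE_FORWARD[6] = {
    { 1, 0, 0 }, { -1, 0, 0 }, { 0, 1, 0 }, { 0, -1, 0 }, { 0, 0, 1 }, { 0, 0, -1 }
};
static const V3 FACE_UP[6] = {
    { 0, -1, 0 }, { 0, -1, 0 }, { 0, 0, 1 }, { 0, 0, -1 }, { 0, -1, 0 }, { 0, -1, 0 }
};

/* Fill m with a column-major 4x4 view matrix (rotation only) for one face.
 * The axes are already unit length and orthogonal, so no normalisation. */
void cube_face_view_matrix(int face, float m[16])
{
    assert(face >= 0 && face < 6);

    V3 f = FACE_FORWARD[face];
    V3 s = cross(f, FACE_UP[face]);  /* camera right */
    V3 u = cross(s, f);              /* camera up    */

    m[0] = s.x;  m[4] = s.y;  m[8]  = s.z;  m[12] = 0.0f;
    m[1] = u.x;  m[5] = u.y;  m[9]  = u.z;  m[13] = 0.0f;
    m[2] = -f.x; m[6] = -f.y; m[10] = -f.z; m[14] = 0.0f;
    m[3] = 0.0f; m[7] = 0.0f; m[11] = 0.0f; m[15] = 1.0f;
}
```

Each matrix maps its face’s forward direction on to -z in eye space, which is where an OpenGL camera looks by default. Combine it with a 90 degree field of view and a square aspect ratio to render each face.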

# LOD

I think I’ve come up with a new definition for “optimisation” in the context of writing a 3D engine, where it means “to painstakingly claw back some of your frame budget, and then immediately blow it on a new engine feature”. Hence the above looks pretty but currently runs at 10 frames per second on a SNB Core i5 MacBook Air.

New features include LOD-heavy instanced grass rendering, deferred lighting, cloud shadows and BLOOOOOOOOOOOM. Once I’ve “optimised” those features as well, I’ll do some write-ups – the instanced rendering method using matrix buffers which I picked up here is particularly cool.

# Dealing with .csv files in Cocoa – writing an importer

Mac users who need to deal with large amounts of data might justifiably feel like they got the short end of the stick as far as Microsoft Office is concerned. The Mac version of Excel doesn’t support Visual Basic and isn’t multithreaded, meaning that:

a) as soon as you get above a couple of tens of thousands of rows, any formula more complicated than summing a column freezes the UI for a couple of minutes while it crunches the numbers, and

b) any data analysis you want to do which involves iteration results in formulae consisting of ten lines of densely nested brackets, which are nearly impossible to read or debug.

Like any bad programmer, I implicitly believe that my language of choice is the perfect tool for any job, and hence when recently confronted with a very large stack of data I needed to analyse I decided it would be easiest to Object Oriented the hell out of it with a small custom C application.

# NSDocument saving quirks

Let’s say you have a document-based application which worked fine under Leopard/Snow Leopard.  Each document is backed by an XML store, and hence the saving method works by exporting the contents of a number of NSTextViews into one string of XML, which is saved to disk.  You’ve been happily overriding

`- (BOOL)saveToURL:(NSURL *)url ofType:(NSString *)typeName forSaveOperation:(NSSaveOperationType)saveOperation error:(NSError **)outError`

as being a sensible point to insert your custom document-saving code – in my case, I send the NSString which holds all of the document’s data to a basic XML exporter, which does clever stuff like removing all of the illegal characters, etc.  You then use NSString’s writeToURL:atomically:encoding:error: method to do the actual write.  This works fine pre-Lion.

Everything goes swimmingly until you upgrade to Lion/Mountain Lion and try to save the document in place (i.e. save rather than save as:).  Your application pops up a warning sheet saying “This document’s file has been changed by another application since you opened or saved it.”

Every.  Single.  Time.

Workaround: give your application a file wrapper so that you can add some metadata, and you can trick your application into realising that the file hasn’t been altered after all. You can do this by overriding NSDocument’s fileWrapperOfType:error: method rather than saveToURL:ofType:forSaveOperation:error:. This is from an application for writing questions and answers to an XML file which is then used as the data source for a quiz application, hence the funny QuestionExporter/setQuizDocument object and setter:

```
- (NSFileWrapper *)fileWrapperOfType:(NSString *)typeName error:(NSError *__autoreleasing *)outError
{
    QuestionExporter *exporter = [[QuestionExporter alloc] init];
    [exporter setQuizDocument:self];
    NSString *xmlString = [exporter exportQuestionsToString];
    NSFileWrapper *wrapper = [[NSFileWrapper alloc] initRegularFileWithContents:[xmlString dataUsingEncoding:NSUTF8StringEncoding]];
    return wrapper;
}
```