Orientation Matrices for Cube Mapping

I may well be totally wrong, but I don’t think I’ve ever successfully Googled a useful set of orientation matrices for cube mapping. As you already know, a cube map is a set of six images which, when projected on to a cube, provide a decent simulation of a spherical texture map. Two of the most common modes of usage are to provide real-time reflections on the outside of an object (by repeatedly making a map of the environment surrounding that object and then projecting that map back on to the object, as in the reflections you see on e.g. the cars in racing games), and to provide a skybox.

Skyboxes are usually either pre-rendered (pretty but boring), or generated at runtime by rendering atmospheric scattering for your scene and then projecting celestial bodies like the moon and stars on top (pretty but computationally intensive). An additional bonus of drawing your own skyboxes is that you can then use them for environment/ambient lighting of objects in your scene, either by working out the spherical harmonics (neat, but I’m far too dumb to have ever wrapped my head around it) or by techniques which involve downsampling the cube map. This gives you ambient light which changes colour depending on the angle of the sun, basically for free.

Therefore cube maps have multiple advantages for rendering your skybox and lighting:
1) render once and reuse (you can regenerate the map every frame, or less often, depending on how dynamic the sun is).
2) you can do atmospheric scattering at a surprisingly low resolution and still get a decent-looking result. You basically have to do your atmospheric scattering in the fragment shader if you want to use a “spotlight” effect to render the sun, which gets very expensive in terms of fragment power. I actually do both the sun and the moon, which is even more expensive, so lowering the resolution is a major speed-up here.
3) basically free specular and ambient environment mapping of the sky onto everything in your scene. You can either go the very expensive route for downsampling, or just mipmap the thing and get 90% of the quality for 10% of the effort, plus hardware acceleration (see the sketch after this list).
4) if you’re blending your scene into the sky for a distance-fogging effect – well, you just got the source for that as well!
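
As a taste of the cheap downsampling route from point 3, here’s a minimal sketch (my assumed setup, not code from this post): after rendering the six faces, let the driver build the mip chain, then sample a high mip level in your lighting shader as a stand-in for diffuse ambient.

#include <GL/gl.h>   /* assumes a GL 3.0+ context via your loader of choice */

/* hypothetical helper, called after the sky cube map has been rendered */
static void prepareSkyAmbient(GLuint skyCubeMap) {
    glBindTexture(GL_TEXTURE_CUBE_MAP, skyCubeMap);
    glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MIN_FILTER,
                    GL_LINEAR_MIPMAP_LINEAR);
    glGenerateMipmap(GL_TEXTURE_CUBE_MAP);   /* the hardware-accelerated downsample */
}

Fetching from the smallest mip levels with textureLod then gives you a heavily blurred sky, which is a passable approximation of the ambient light arriving from a given direction.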

This is where you usually run into a brick wall because figuring out the correct orientation matrices for rendering the cube map is a pain in the backside. What you’re going to be doing in the end is rendering a box around your camera and texture mapping the cube map on to the inside of the box, which will then act as the skybox. You can simplify this by not applying any rotation to the skybox, so that it’s aligned with the x, y and z axes. Therefore what you need to do is figure out how to make the camera look in six directions: +x, -x, +y, -y, +z and -z. You could do this with gluLookAt, but that’s a whole heck of a lot of lines of code just to look in the direction of an axis. Better to just know what matrices to use: see below. (I’m weird and use +x = east, +y = north, -z = up i.e. inverted right-handed axes.)
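
For reference, here’s a minimal sketch of what those matrices boil down to, written against the standard OpenGL cube-map conventions rather than my inverted axes (the function and table names are my own, so treat this as a starting point rather than the final word): each face is just a fixed (forward, up) pair, and the view matrix is the orthonormal basis built from that pair.

typedef struct { float m[16]; } Mat4;

/* Build the rotation part of a view matrix from a forward and an up vector
   (both unit length and mutually perpendicular here, so no normalisation). */
static Mat4 faceViewMatrix(const float f[3], const float u[3]) {
    const float r[3] = { f[1]*u[2] - f[2]*u[1],   /* right = forward x up */
                         f[2]*u[0] - f[0]*u[2],
                         f[0]*u[1] - f[1]*u[0] };
    Mat4 v = { { r[0], u[0], -f[0], 0.0f,          /* column-major, as GL expects */
                 r[1], u[1], -f[1], 0.0f,
                 r[2], u[2], -f[2], 0.0f,
                 0.0f, 0.0f,  0.0f, 1.0f } };
    return v;
}

/* (forward, up) pairs in cube-map face order: +x, -x, +y, -y, +z, -z */
static const float kCubeFaces[6][2][3] = {
    { {  1, 0, 0 }, { 0, -1, 0 } },   /* GL_TEXTURE_CUBE_MAP_POSITIVE_X */
    { { -1, 0, 0 }, { 0, -1, 0 } },   /* GL_TEXTURE_CUBE_MAP_NEGATIVE_X */
    { {  0, 1, 0 }, { 0, 0,  1 } },   /* GL_TEXTURE_CUBE_MAP_POSITIVE_Y */
    { {  0,-1, 0 }, { 0, 0, -1 } },   /* GL_TEXTURE_CUBE_MAP_NEGATIVE_Y */
    { {  0, 0, 1 }, { 0, -1, 0 } },   /* GL_TEXTURE_CUBE_MAP_POSITIVE_Z */
    { {  0, 0,-1 }, { 0, -1, 0 } },   /* GL_TEXTURE_CUBE_MAP_NEGATIVE_Z */
};

Pair each of the six matrices with a 90-degree, square-aspect projection and render into the matching cube map face.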

Continue reading Orientation Matrices for Cube Mapping

Getting world-space coordinates of screen fragments in glsl

So, you’re probably asking yourself: why on earth would you even want to do that? Well, it’s useful information if you’ve got a camera which can intersect with things in your scene. The most obvious example here is water – say you wanted a distortion effect and a reduced fogging distance to make the underwater part of your environment visually distinct from the part above water:

[Image: camera submerging at the air-water interface]

You may have noticed that in e.g. Elder Scrolls games you can trick the camera into behaving as if it’s not underwater when you’re near the air-water interface. This is presumably because they’ve just set a camera height which denotes “underwater” – but what if the player has positioned the camera so that half of the screen is underwater and half is above? Hence you need an approach which works per-fragment.

How do you work out which part of your screen is actually underwater? You generally can’t do it while rendering the water’s surface, because you won’t be shading any pixels which aren’t directly at the air-water interface – if you’re under the surface of the water and looking down, that approach fails immediately. What you want instead is a way of masking off the bits of your screen that are underwater, and to use that mask to apply your “underwater” effects.
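
The underlying maths is the standard unprojection through the inverse view-projection matrix. Here’s a CPU-side sketch of it for clarity (my formulation – in practice this runs in the fragment shader, and the full post may take a different route); invViewProj is assumed to be precomputed elsewhere, and the matrix is column-major like GL’s:

typedef struct { float m[16]; } Mat4;

/* Recover a world-space position from a fragment's texture coordinate (u, v)
   and its depth-buffer value, all in [0, 1]. */
static void worldFromFragment(const Mat4 *invViewProj,
                              float u, float v, float depth,
                              float out[3]) {
    /* window coordinates -> normalised device coordinates, [0,1] -> [-1,1] */
    const float ndc[4] = { u*2.0f - 1.0f, v*2.0f - 1.0f, depth*2.0f - 1.0f, 1.0f };
    float h[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            h[row] += invViewProj->m[col*4 + row] * ndc[col];
    out[0] = h[0] / h[3];   /* perspective divide back to world space */
    out[1] = h[1] / h[3];
    out[2] = h[2] / h[3];
}

Once you have the world-space position, the per-fragment underwater test is just a comparison of the position’s up-axis component against the water height.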

Continue reading Getting world-space coordinates of screen fragments in glsl

A tale of two vectors (normal reconstruction and driver differences)

If you’re playing around with deferred rendering or post-process techniques, you’ve probably come across the concept that you can recover camera-space surface normals from the camera-space position like so:

vec3 reconstructCameraSpaceFaceNormal(vec3 CameraSpacePosition) {
    vec3 res = normalize(cross(dFdy(CameraSpacePosition), dFdx(CameraSpacePosition)));
    return res;
}

where CameraSpacePosition is the fragment’s position in camera space.

What you might not realise is that you’re accidentally setting yourself up for confusion depending on your graphics driver. For the longest time, I was using this technique to try to implement SSAO without having to bother with storing screen-space normals. After fiddling about a bit I noticed that on my desktop with an NVIDIA GTX 680 everything looked OK, while on my laptop with Intel HD integrated graphics everything looked inverted. I then tried reversing the normal I was getting out of this function. Success! The laptop is now displaying correctly. Failure! The desktop is now screwed up.
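
One driver-agnostic way out, sketched below in C for clarity (this is my workaround, not necessarily the fix the full post settles on): in camera space, a front-facing normal has to point back towards the camera, so whatever winding the derivatives produced, flip the result if it faces away. The same logic is a one-liner in GLSL.

/* n is the reconstructed normal, fragPosCS the fragment's camera-space
   position (which is also the camera-to-fragment direction). If the normal
   points away from the camera, negate it. */
static void faceNormalTowardCamera(float n[3], const float fragPosCS[3]) {
    const float d = n[0]*fragPosCS[0] + n[1]*fragPosCS[1] + n[2]*fragPosCS[2];
    if (d > 0.0f) { n[0] = -n[0]; n[1] = -n[1]; n[2] = -n[2]; }
}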

Continue reading A tale of two vectors (normal reconstruction and driver differences)

Simple billboard orientation in world space

I had a great deal of difficulty working out how to do billboards properly, simply because I don’t have the mental agility to handle the descriptions usually given in 3D programming tutorials. The method I eventually figured out is related to gluLookAt, and uses a similar approach to work out the rotation matrix needed to map one vector onto another.
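
To give a flavour of the “map one vector onto another” step, here’s a sketch using Rodrigues’ rotation formula (my own formulation – the post’s derivation may differ):

/* a and b are unit vectors; R receives a row-major 3x3 rotation matrix such
   that R*a = b. Not robust in the degenerate case a == -b (k blows up). */
static void rotationBetween(const float a[3], const float b[3], float R[9]) {
    /* v = a x b (rotation axis scaled by sin), c = a . b (cos of the angle) */
    const float v0 = a[1]*b[2] - a[2]*b[1];
    const float v1 = a[2]*b[0] - a[0]*b[2];
    const float v2 = a[0]*b[1] - a[1]*b[0];
    const float c  = a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
    const float k  = 1.0f / (1.0f + c);

    R[0] = c + v0*v0*k;   R[1] = v0*v1*k - v2;  R[2] = v0*v2*k + v1;
    R[3] = v1*v0*k + v2;  R[4] = c + v1*v1*k;   R[5] = v1*v2*k - v0;
    R[6] = v2*v0*k - v1;  R[7] = v2*v1*k + v0;  R[8] = c + v2*v2*k;
}

For a billboard, a would be the quad’s rest-pose facing direction and b the normalised direction from the billboard to the camera.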

Here’s how the process works:
Continue reading Simple billboard orientation in world space


More LOD than you can shake a STICK at
[Image: procedurally generated terrain]

I think I’ve come up with a new definition for “optimisation” in the context of writing a 3D engine, where it means “to painstakingly claw back some of your frame budget, and then immediately blow it on a new engine feature”. Hence the above looks pretty but currently runs at 10 frames per second on a Sandy Bridge Core i5 MacBook Air.

New features include LOD-heavy instanced grass rendering, deferred lighting, cloud shadows and BLOOOOOOOOOOOM. Once I’ve “optimised” those features as well, I’ll do some write-ups – the instanced rendering method using matrix buffers which I picked up here is particularly cool.
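
For the curious, the matrix-buffer trick boils down to something like the sketch below (my reconstruction of the idea, not the linked article’s code): a mat4 vertex attribute occupies four consecutive vec4 attribute slots, each stepped once per instance rather than once per vertex.

#include <GL/gl.h>   /* assumes a GL 3.3+ context via your loader of choice */

/* matrixVBO holds one tightly packed mat4 per instance; baseAttrib is the
   first of the four attribute locations reserved for the matrix. */
static void bindInstanceMatrices(GLuint vao, GLuint matrixVBO, GLuint baseAttrib) {
    glBindVertexArray(vao);
    glBindBuffer(GL_ARRAY_BUFFER, matrixVBO);
    for (GLuint i = 0; i < 4; ++i) {
        const GLuint loc = baseAttrib + i;
        glEnableVertexAttribArray(loc);
        glVertexAttribPointer(loc, 4, GL_FLOAT, GL_FALSE,
                              16 * sizeof(float),              /* stride: one mat4 */
                              (const void *)(sizeof(float) * 4 * i));
        glVertexAttribDivisor(loc, 1);   /* advance per instance, not per vertex */
    }
}

A single glDrawArraysInstanced call then draws every blade of grass with its own transform, with no per-instance uniform updates.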

Fast Enumeration, Massive Inefficiency

Cocoa makes use of a system called Fast Enumeration, which allows you to write array loops in a compact fashion and is also supposed to speed things up by fetching objects in batches rather than calling [myArray objectAtIndex:nextIndex] on each pass through the loop. Here’s an example:

for (NSObject *anObject in myArrayOfObjects) {
    //do stuff
}

Because I’m an idiot, I had been using Fast Enumeration in situations where I needed to know the index of each object because I couldn’t be bothered to write out the code for incrementing an index. This resulted in incredibly readable stuff like the excrescence below:

for (NSObject *anObject in myArrayOfObjects) {
    if ([myArrayOfObjects indexOfObject:anObject] < ([myArrayOfObjects count] - 1)) {
        NSObject *theNextObject = [myArrayOfObjects objectAtIndex:([myArrayOfObjects indexOfObject:anObject] + 1)];
        [self doSomethingWithObject:anObject andNextObject:theNextObject];
    }
}

In case your eyes are bleeding too much for you to be able to see how hard I’ve made things for myself, here’s a rundown of what the above code actually does:
Continue reading Fast Enumeration, Massive Inefficiency

Dealing with .csv files in Cocoa – writing an importer

Mac users who need to deal with large amounts of data might justifiably feel like they got the short end of the stick as far as Microsoft Office is concerned. The Mac version of Excel doesn’t support Visual Basic and isn’t multithreaded, meaning that:

a) as soon as you get above a couple of tens of thousands of rows, any formula more complicated than summing a column freezes the UI for a couple of minutes while it crunches the numbers, and

b) any data analysis you want to do which involves iteration results in formulae consisting of ten lines of densely nested brackets, which are nearly impossible to read or debug.

Like any bad programmer, I implicitly believe that my language of choice is the perfect tool for any job, and hence when recently confronted with a very large stack of data to analyse I decided it would be easiest to object-orient the hell out of it with a small custom Cocoa application.

Continue reading Dealing with .csv files in Cocoa – writing an importer

NSDocument saving quirks

Let’s say you have a document-based application which worked fine under Leopard/Snow Leopard.  Each document is backed by an XML store, and hence the saving method works by exporting the contents of a number of NSTextViews into one string of XML, which is saved to disk.  You’ve been happily overriding

- (BOOL)saveToURL:(NSURL *)url ofType:(NSString *)typeName forSaveOperation:(NSSaveOperationType)saveOperation error:(NSError **)outError

as being a sensible point to insert your custom document-saving code – in my case, I send the NSString which holds all of the document’s data to a basic XML exporter, which does clever stuff like removing all of the illegal characters, etc.  You then use NSString’s writeToURL:atomically:encoding:error: method to do the actual write.  This works fine pre-Lion.

Everything goes swimmingly until you upgrade to Lion/Mountain Lion and try to save the document in place (i.e. Save rather than Save As).  Your application pops up a warning sheet saying “This document’s file has been changed by another application since you opened or saved it.”

Every.  Single.  Time.

Workaround: give your application a file wrapper, so that the file gains some metadata and your application can tell that it hasn’t actually been altered after all. You can do this by overriding NSDocument’s fileWrapperOfType:error: method rather than saveToURL:. This is from an application for writing questions and answers to an XML file which is then used as the data source for a quiz application, hence the funny QuestionExporter object and setQuizDocument: setter:

- (NSFileWrapper *)fileWrapperOfType:(NSString *)typeName error:(NSError *__autoreleasing *)outError {
    // Serialise the document's contents to an XML string...
    QuestionExporter *exporter = [[QuestionExporter alloc] init];
    [exporter setQuizDocument:self];
    NSString *xmlString = [exporter exportQuestionsToString];
    // ...and hand NSDocument a regular-file wrapper; NSDocument performs the
    // actual write and keeps the file's modification metadata consistent.
    NSFileWrapper *wrapper = [[NSFileWrapper alloc] initRegularFileWithContents:[xmlString dataUsingEncoding:NSUTF8StringEncoding]];
    return wrapper;
}

Lies, damn lies and GL_MAX_TEXTURE_UNITS

Warning: this post contains much bitching, only some of which is substantiated, and much of which probably only applies to Intel integrated graphics.

So, I guess you could probably point out that I’m being a bit melodramatic and that, essentially, anybody who tries to do much in the way of multitexturing using integrated graphics gets what they deserve.

However, you may find it useful to know that, even though OpenGL tells you that you have 48 texture units available, you should not, under any circumstances, try to actually use all of them. In fact, you’re playing fast and loose if you even try to use some of them. It might seem logical to you, as an OpenGL novice, to write your code so that each texture unit is only used in one of your shaders and is reserved for a particular texture function: say, I have a shader for drawing grass, so I bind my grass texture to GL_TEXTURE23, set my sampler uniform to use that texture unit, and call it a day.

Don’t do that.

In my testing, again on pretty limited integrated hardware, I halved my drawing time by using a total of fewer than eight texture units and binding textures as required. That includes using GL_TEXTURE0 both for a material’s main texture in the first pass, and for doing post-processing on the entire framebuffer in a later pass.

In short – “fewer texture units used” trumps “fewer texture binds” every time, when using limited hardware.
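
In practice the well-behaved pattern looks something like this sketch (the function and parameter names are mine): keep to a handful of low-numbered units and rebind per material, rather than parking each texture on its own dedicated unit for the lifetime of the program.

#include <GL/gl.h>

/* hypothetical per-material setup: the same two units are reused for every
   material drawn this frame */
static void bindMaterial(GLuint diffuseTex, GLuint normalTex,
                         GLint diffuseLoc, GLint normalLoc) {
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, diffuseTex);
    glUniform1i(diffuseLoc, 0);     /* sampler reads unit 0 */

    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, normalTex);
    glUniform1i(normalLoc, 1);      /* sampler reads unit 1 */
}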

Branching? What branching?

Apple’s implementation of GLSL seems to suffer from a frequent problem in 3D programming: all of the features you can use to optimise your code work well on powerful graphics hardware and actually slow things down on a less powerful GPU. This is exacerbated by the prevalence of Intel HD hardware in Apple machines. Full disclosure: I use both a 2010 Mac Pro with an NVIDIA GeForce GTX 680 and a MacBook Air with Intel HD 3000 graphics. My multi-pass renderer does cascading shadow maps, bump mapping, GPU-based water animation, multi-textured landscape relief mapping, and screen-space sun rays and depth of field, all of which uses up a fair amount of fragment power. It’s pretty obvious that this absolutely kills performance on the Intel hardware, so I implemented a system of uniforms to turn off features of the renderer in the vertex and fragment shaders on hardware which can’t handle them. Simple, yes?


On the NVidia hardware, putting a branch into the fragment shader by using a boolean uniform seems to work fine – although performance on a GTX 680 is so ridiculous that I probably wouldn’t notice the slowdown anyway. However, on the Intel hardware, the ray-casting code which does the relief mapping slows things down for every single fragment regardless of whether that code path should have been turned off. Googling turns up a bunch of forum references which imply that the GPU is actually running both code paths and then displaying the result of the simpler one, which causes performance to be utterly dreadful.

For example, in this situation:

uniform bool myGPUIsAPieceOfShit;

void main() {
    if (myGPUIsAPieceOfShit) {
        // cheap path for weak hardware
    } else {
        // expensive path, e.g. relief mapping
    }
}

You are going to end up with terrible performance. This also puts paid to the idea of, say, having a shader which can optionally do bump mapping, depending on a uniform. You are, instead, going to end up spending the fragment power and then not actually seeing a result.

As it stands, if you find that commenting out one of the code paths causes you to triple the frame rate, you’re going to need to write separate shaders for each path and choose the one appropriate to the hardware.
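
One low-friction way to get per-hardware shader variants without duplicating source files is to let the GLSL preprocessor do the pruning at compile time. Here’s a sketch (my code, with a made-up USE_RELIEF_MAPPING flag; it assumes the shader source itself omits the #version line):

#include <GL/gl.h>

/* Compile one of two variants of the same source by prepending a #define.
   glShaderSource concatenates the strings, so the prelude acts as a header. */
static GLuint compileVariant(GLenum stage, const char *source, int useReliefMapping) {
    const char *prelude = useReliefMapping
        ? "#version 120\n#define USE_RELIEF_MAPPING 1\n"
        : "#version 120\n#define USE_RELIEF_MAPPING 0\n";
    const char *strings[2] = { prelude, source };
    GLuint shader = glCreateShader(stage);
    glShaderSource(shader, 2, strings, NULL);
    glCompileShader(shader);
    return shader;   /* check GL_COMPILE_STATUS in real code */
}

Inside the shader, wrap the expensive path in #if USE_RELIEF_MAPPING … #endif; the weak GPU then never sees that code at all, because the compiler strips the dead branch before it ever runs.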