Category Archives: GLSL

Simple billboard orientation in world space

I had a great deal of difficulty in working out how to do billboards properly, simply because I don’t have the mental agility to handle the descriptions usually given in 3D programming tutorials. The method I’ve managed to figure out is related to gluLookAt: and uses a similar method to work out the rotation matrix needed to map one vector to another.

Here’s how the process works:
Continue reading Simple billboard orientation in world space


More LOD than you can shake a STICK at
Procedurally generated terrain

I think I’ve come up with a new definition for “optimisation” in the context of writing a 3D engine, where it means “to painstakingly claw back some of your frame budget, and then immediately blow it on a new engine feature”. Hence the above looks pretty but currently runs at 10 frames per second on a SNB Core i5 MacBook Air.

New features include LOD-heavy instanced grass rendering, deferred lighting, cloud shadows and BLOOOOOOOOOOOM. Once I’ve “optimised” those features as well, I’ll do some write-ups – the instanced rendering method using matrix buffers which I picked up here is particularly cool.

Branching? What branching?

Apple’s implementation of GLSL seems to suffer from a frequent problem in 3D programming: all of the features you can use to optimise your code work well on powerful graphics hardware and actually slow things down on a less powerful GPU. This is exacerbated by the prevalence of Intel HD hardware in Apple machines. Full disclosure; I use both a 2010 Mac Pro with an NVidia Geforce 680 GTX and a MacBook air with Intel graphics HD3000. My multi-pass renderer does cascading shadow maps, bump mapping, GPU-based water animation, multi-textured landscape relief mapping, and screen-space sun rays and depth of field, all of which uses up a fair amount of fragment power. It’s pretty obvious that this absolutely kills performance on the Intel graphics hardware, so I implemented a system of uniforms to turn off features of the renderer in the vertex and fragment shaders on hardware which can’t handle it. Simple, yes?


On the NVidia hardware, putting a branch into the fragment shader by using a boolean uniform seems to work fine – although performance on a GTX 680 is so ridiculous that I probably wouldn’t notice the slowdown anyway. However, on the Intel hardware, the ray-casting code which does the relief mapping slows things down for every single fragment regardless of whether that code path should have been turned off. Googling turns up a bunch of forum references which imply that the GPU is actually running both code paths and then displaying the result of the simpler one, which causes performance to be utterly dreadful.

For example, in this situation:

uniform bool myGPUIsAPieceOfShit;
void main () {
     if (myGPUIsAPieceOfShit) {
     else {

You are going to end up with terrible performance. This also puts paid to the idea of, say, having a shader which can optionally do bump mapping, depending on a uniform. You are, instead, going to end up spending the fragment power and then not actually seeing a result.

As it stands, if you find that commenting out one of the code paths causes you to triple the frame rate, you’re going to need to write separate shaders for each path and choose the one appropriate to the hardware.