Category Archives: OpenGL


More LOD than you can shake a STICK at
Procedurally generated terrain

I think I’ve come up with a new definition for “optimisation” in the context of writing a 3D engine, where it means “to painstakingly claw back some of your frame budget, and then immediately blow it on a new engine feature”. Hence the above looks pretty but currently runs at 10 frames per second on a SNB Core i5 MacBook Air.

New features include LOD-heavy instanced grass rendering, deferred lighting, cloud shadows and BLOOOOOOOOOOOM. Once I’ve “optimised” those features as well, I’ll do some write-ups – the instanced rendering method using matrix buffers which I picked up here is particularly cool.

Lies, damn lies and GL_MAX_TEXTURE_UNITS

Warning: this post contains much bitching, only some of which is substantiated, and much of which probably only applies to Intel integrated graphics

So, I guess you could probably point out that I’m being a bit melodramatic and that, essentially, anybody who tries to do much in the way of multitexturing using integrated graphics gets what they deserve.

However, you may find it useful to know that, despite OpenGL telling you that you have 48 texture units available, don’t, under any circumstances, try to actually use all of them. In fact, you’re playing fast-and-loose if you even try to use some. It might seem logical to you, as an OpenGL novice, to write your code so that each texture unit is only used in one of your shaders and is reserved for a particular texture function; say, I have a shader for drawing grass, to I bind my grass texture to GL_TEXTURE23, set my sampler uniform to use that texture unit, and call it a day.

Don’t do that.

In my testing, again on pretty limited integrated hardware, I halved my drawing time by using a total of less than 8 texture units and binding textures as required. This includes the fact that I use GL_TEXTURE0 both for a material’s main texture in the first pass, and for doing post-processing on the entire framebuffer in a later pass.

In short – “fewer texture units used” trumps “fewer texture binds” every time, when using limited hardware.

Branching? What branching?

Apple’s implementation of GLSL seems to suffer from a frequent problem in 3D programming: all of the features you can use to optimise your code work well on powerful graphics hardware and actually slow things down on a less powerful GPU. This is exacerbated by the prevalence of Intel HD hardware in Apple machines. Full disclosure; I use both a 2010 Mac Pro with an NVidia Geforce 680 GTX and a MacBook air with Intel graphics HD3000. My multi-pass renderer does cascading shadow maps, bump mapping, GPU-based water animation, multi-textured landscape relief mapping, and screen-space sun rays and depth of field, all of which uses up a fair amount of fragment power. It’s pretty obvious that this absolutely kills performance on the Intel graphics hardware, so I implemented a system of uniforms to turn off features of the renderer in the vertex and fragment shaders on hardware which can’t handle it. Simple, yes?


On the NVidia hardware, putting a branch into the fragment shader by using a boolean uniform seems to work fine – although performance on a GTX 680 is so ridiculous that I probably wouldn’t notice the slowdown anyway. However, on the Intel hardware, the ray-casting code which does the relief mapping slows things down for every single fragment regardless of whether that code path should have been turned off. Googling turns up a bunch of forum references which imply that the GPU is actually running both code paths and then displaying the result of the simpler one, which causes performance to be utterly dreadful.

For example, in this situation:

uniform bool myGPUIsAPieceOfShit;
void main () {
     if (myGPUIsAPieceOfShit) {
     else {

You are going to end up with terrible performance. This also puts paid to the idea of, say, having a shader which can optionally do bump mapping, depending on a uniform. You are, instead, going to end up spending the fragment power and then not actually seeing a result.

As it stands, if you find that commenting out one of the code paths causes you to triple the frame rate, you’re going to need to write separate shaders for each path and choose the one appropriate to the hardware.

Two dimensional C arrays – care and feeding

C arrays are a little hard to grasp by those of us raised on the Cocoa API, because most of the really convenient stuff (like an array object being able to keep track of its own size) just isn’t there in straight C.  I went into using C arrays, in order to interact with OpenGL, with absolutely no knowledge whatsoever (a common theme of this site!) of how to even use malloc.

Why would you want to use two-dimensional arrays anyway?  In my case, although I’ve written a fair few functions where e.g. a two-dimensional grid of points is mapped to a long one-dimensional array, and I’ve simply remembered what the dimensions of the grid were when accessing the information, sometimes this is simply too much for my poor brain to understand. In this situation I’m forced to represent things as a multidimensional array in order to get the concepts down in a way which works.  I might, therefore, have one array which represents all of the values of the x dimension, containing a series of arrays which represent the y dimension; so the point in the grid at (5, 13) is retrieved by getting the 13th value from the 5th subarray in the parent array.

Fortunately there are people who are much cleverer than me who have come up with a way of handling two-dimensional C-arrays.  Unfortunately I have forgotten the attribution.  However, I had to go forum-diving to find this and I think that it might be helpful for us inexperienced types to have a more easily-searched solution.  If I find the original source again, I will provide a link.

One similar implementation was found here

//make a 2d array of floats
float** Make2DFloatArray(long lengthMainArray, long lengthSubArray) {
    float **newArray = (float **) malloc(lengthMainArray * sizeof(float *));
    *newArray = malloc(lengthMainArray * lengthSubArray * sizeof(float));
    for (int i = 0; i < lengthMainArray; i++) {
        newArray[i] = *newArray + i * lengthSubArray;
    return newArray;
//release a 2d array of floats
void free2DFloatArray (float **arrayToFree) {
//access a value in the 2d array
float theFloatIWant = my2DFloatArray[indexInMainArray][indexInSubArray];