Category Archives: OS X

Lies, damn lies and GL_MAX_TEXTURE_UNITS

Warning: this post contains much bitching, only some of which is substantiated, and much of which probably applies only to Intel integrated graphics.

So, I guess you could probably point out that I’m being a bit melodramatic and that, essentially, anybody who tries to do much in the way of multitexturing using integrated graphics gets what they deserve.

However, you may find it useful to know that, even though OpenGL tells you that you have 48 texture units available, you should not, under any circumstances, try to actually use all of them. In fact, you’re playing fast and loose if you try to use more than a handful. It might seem logical to you, as an OpenGL novice, to write your code so that each texture unit is used by only one of your shaders and is reserved for a particular texture function; say, I have a shader for drawing grass, so I bind my grass texture to GL_TEXTURE23, set my sampler uniform to use that texture unit, and call it a day.

Don’t do that.

In my testing, again on pretty limited integrated hardware, I halved my drawing time by using a total of fewer than 8 texture units and binding textures as required. That total includes reusing GL_TEXTURE0 both for a material’s main texture in the first pass and for post-processing the entire framebuffer in a later pass.

In short – “fewer texture units used” trumps “fewer texture binds” every time, when using limited hardware.
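As a rough illustration of "few units, rebind as required", here is a minimal C sketch. It is not the renderer's actual code: `TextureUnits`, `BindFn`, `UNITS_IN_USE` and the pool size of 4 are all hypothetical names and values chosen for the example, and skipping a bind when the texture is already on the unit is an extra convenience, not something measured in this post. The real bind is passed in as a callback so the sketch compiles without any OpenGL headers.

```c
#include <stddef.h>

/* Assumption: a deliberately small pool of units shared by every pass. */
enum { UNITS_IN_USE = 4 };

/* Hypothetical callback that performs the real bind. With OpenGL it
 * would typically do:
 *   glActiveTexture(GL_TEXTURE0 + unit);
 *   glBindTexture(GL_TEXTURE_2D, texture);
 */
typedef void (*BindFn)(unsigned unit, unsigned texture);

typedef struct {
    unsigned bound[UNITS_IN_USE]; /* texture currently on each unit, 0 = none */
    BindFn   bind;                /* may be NULL, e.g. when testing */
} TextureUnits;

void texture_units_init(TextureUnits *tu, BindFn bind)
{
    for (size_t i = 0; i < UNITS_IN_USE; ++i)
        tu->bound[i] = 0;
    tu->bind = bind;
}

/* Puts `texture` on `unit`, rebinding only if something else is there.
 * Returns 1 if a bind was issued, 0 if the texture was already bound,
 * and -1 if `unit` lies outside the small pool. */
int texture_units_use(TextureUnits *tu, unsigned unit, unsigned texture)
{
    if (unit >= UNITS_IN_USE)
        return -1;
    if (tu->bound[unit] == texture)
        return 0;
    tu->bound[unit] = texture;
    if (tu->bind)
        tu->bind(unit, texture);
    return 1;
}
```

In the material pass you would call something like `texture_units_use(&tu, 0, grassTexture)`, and in the post-processing pass `texture_units_use(&tu, 0, sceneColourTexture)`, with the sampler uniform left pointing at unit 0 in both shaders (both texture names are made up here). A request for unit 23 is simply refused.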

Branching? What branching?

Apple’s implementation of GLSL seems to suffer from a frequent problem in 3D programming: all of the features you can use to optimise your code work well on powerful graphics hardware and actually slow things down on a less powerful GPU. This is exacerbated by the prevalence of Intel HD hardware in Apple machines. Full disclosure: I use both a 2010 Mac Pro with an NVIDIA GeForce GTX 680 and a MacBook Air with Intel HD Graphics 3000. My multi-pass renderer does cascaded shadow maps, bump mapping, GPU-based water animation, multi-textured landscape relief mapping, and screen-space sun rays and depth of field, all of which use up a fair amount of fragment power. It’s pretty obvious that this absolutely kills performance on the Intel graphics hardware, so I implemented a system of uniforms to turn off features of the renderer in the vertex and fragment shaders on hardware which can’t handle it. Simple, yes?


On the NVIDIA hardware, putting a branch into the fragment shader by using a boolean uniform seems to work fine – although performance on a GTX 680 is so ridiculous that I probably wouldn’t notice the slowdown anyway. However, on the Intel hardware, the ray-casting code which does the relief mapping slows things down for every single fragment, regardless of whether that code path should have been turned off. Googling turns up a bunch of forum references which imply that the GPU is actually running both code paths and then keeping only the result of the simpler one, which makes performance utterly dreadful.

For example, in this situation:

uniform bool myGPUIsAPieceOfShit;
void main () {
     if (myGPUIsAPieceOfShit) {
          // cheap path: skip the expensive effects
     } else {
          // expensive path: e.g. relief mapping
     }
}

You are going to end up with terrible performance. This also puts paid to the idea of, say, having a shader which can optionally do bump mapping, depending on a uniform. You are, instead, going to end up spending the fragment power and then not actually seeing a result.

As it stands, if you find that commenting out one of the code paths triples the frame rate, you’re going to need to write separate shaders for each path and choose the one appropriate to the hardware.
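One way to get separate shaders without maintaining several near-identical files is to compile the same source more than once with different #define lines in front of it, so the expensive path is removed by the preprocessor rather than by a run-time branch. This is a variation on the approach above, not necessarily how my renderer does it. The C sketch below only assembles the source string; the feature bits, macro names and `build_shader_source` are hypothetical, and the version line is passed in because GLSL requires #version to come before anything else.

```c
#include <stdio.h>

/* Hypothetical feature bits; the names are illustrative only. */
enum {
    FEATURE_BUMP_MAPPING   = 1 << 0,
    FEATURE_RELIEF_MAPPING = 1 << 1
};

/* Writes the version line, one #define per enabled feature, then the
 * shared shader body into `out`. The body is expected to guard optional
 * code with e.g. "#ifdef USE_RELIEF_MAPPING ... #endif".
 * Returns the number of characters written, or -1 if `out` is too small. */
int build_shader_source(char *out, size_t out_size,
                        const char *version_line,
                        unsigned features,
                        const char *body)
{
    int n = snprintf(out, out_size, "%s%s%s%s",
                     version_line,
                     (features & FEATURE_BUMP_MAPPING)
                         ? "#define USE_BUMP_MAPPING\n" : "",
                     (features & FEATURE_RELIEF_MAPPING)
                         ? "#define USE_RELIEF_MAPPING\n" : "",
                     body);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

The resulting string would then go to glShaderSource and glCompileShader as usual, once per feature set you care about, and the program matching the detected hardware is picked at startup. On the weak GPU the relief-mapping code never reaches the compiler at all.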

Sorting objects into an array of NSArrays

This one is pretty simple, but I had a terrible amount of difficulty working out how to do this the first time. Cocoa makes a lot of this process pretty easy by providing the NSMutableArray class – the only real gotcha with an NSMutableArray is the “was mutated while being enumerated” problem. This means exactly what it says – you’ve added or removed an object in the array while enumerating (going through the objects one by one) the same array. The usual workaround is to collect the objects to delete in an intermediate array, then enumerate over that intermediate array and commit the changes to your NSMutableArray.

NSArrays can only contain objects; if you want to store ints, floats etc. then you are going to need to wrap them in an object of some sort (in this case, an NSNumber would be a suitable fit). The objects in an NSArray do not all have to be of the same class, although keeping each array to a single class makes its contents much easier to work with.

Say I want to make a very simple hierarchical data store which will provide the data for part of the user interface for an app; for example, a UITableView.  I want my tableView to have sections which are organised by date – you can see this sort of arrangement in, for example, email programs which sort your incoming mail by Today, Yesterday, Last Week, etc.  One simple way to do this is to establish a hierarchical structure of arrays, like so:

Master array --> section array --> content object
             --> section array (empty)
             --> section array --> content object
                               --> content object
                               --> content object
