Branching? What branching?

Apple’s implementation of GLSL seems to suffer from a frequent problem in 3D programming: the features you can use to optimise your code work well on powerful graphics hardware and actually slow things down on a less powerful GPU. This is exacerbated by the prevalence of Intel HD hardware in Apple machines. Full disclosure: I use both a 2010 Mac Pro with an NVIDIA GeForce GTX 680 and a MacBook Air with Intel HD Graphics 3000. My multi-pass renderer does cascading shadow maps, bump mapping, GPU-based water animation, multi-textured landscape relief mapping, and screen-space sun rays and depth of field, all of which use up a fair amount of fragment power. It’s pretty obvious that this absolutely kills performance on the Intel graphics hardware, so I implemented a system of uniforms to turn off features of the renderer in the vertex and fragment shaders on hardware which can’t handle them. Simple, yes?


On the NVIDIA hardware, putting a branch into the fragment shader by using a boolean uniform seems to work fine, although performance on a GTX 680 is so ridiculous that I probably wouldn’t notice the slowdown anyway. However, on the Intel hardware, the ray-casting code which does the relief mapping slows things down for every single fragment, regardless of whether that code path should have been turned off. Googling turns up a bunch of forum references which imply that the GPU is actually running both code paths and then displaying the result of the simpler one, which causes performance to be utterly dreadful.

For example, in this situation:

uniform bool myGPUIsAPieceOfShit;
void main () {
     if (myGPUIsAPieceOfShit) {
          // cheap code path
     } else {
          // expensive code path (relief mapping and friends)
     }
}

You are going to end up with terrible performance on hardware that behaves this way. This also puts paid to the idea of, say, having a shader which can optionally do bump mapping, depending on a uniform. You are, instead, going to end up spending the fragment power and then not actually seeing a result.

As it stands, if you find that commenting out one of the code paths triples the frame rate, you’re going to need to write separate shaders for each path and choose the one appropriate to the hardware.
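
Writing separate shaders doesn’t have to mean maintaining two nearly identical files. One possible approach, sketched below in C, is to keep a single shader body and generate each variant by prepending a `#define`, so the expensive path is compiled out rather than branched around at run time. This is only a rough sketch and not the code from my renderer: the names `build_fragment_source`, `kFragmentBody`, `USE_RELIEF_MAPPING`, `colourMap` and `uv`, and the choice of `#version 120`, are all made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared fragment shader body (GLSL 1.20 syntax assumed).
   The expensive path sits behind the preprocessor, so a variant built
   without the #define never contains that code at all. */
static const char *kFragmentBody =
    "uniform sampler2D colourMap;\n"
    "varying vec2 uv;\n"
    "void main() {\n"
    "#ifdef USE_RELIEF_MAPPING\n"
    "    // expensive ray-casting path would go here\n"
    "    gl_FragColor = texture2D(colourMap, uv);\n"
    "#else\n"
    "    // cheap path: plain texture lookup\n"
    "    gl_FragColor = texture2D(colourMap, uv);\n"
    "#endif\n"
    "}\n";

/* Builds one shader variant: the #version line, an optional #define,
   then the shared body. Returns a malloc'd string that the caller must
   free, or NULL if allocation fails. */
char *build_fragment_source(int use_relief_mapping)
{
    const char *version = "#version 120\n";
    const char *define =
        use_relief_mapping ? "#define USE_RELIEF_MAPPING 1\n" : "";
    size_t len = strlen(version) + strlen(define) + strlen(kFragmentBody) + 1;

    char *src = malloc(len);
    if (src == NULL) {
        return NULL;
    }
    snprintf(src, len, "%s%s%s", version, define, kFragmentBody);
    return src;
}
```

The resulting string can be handed to `glShaderSource` and compiled as usual. You would build the variant once at startup, based on whatever test you use to decide that the hardware can’t cope, rather than toggling a uniform every frame.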