On GLSL subroutines in Mac OS X and disappointment

I probably don’t need to tell anybody that the state of 3D graphics is somewhat sad under OS X when compared with Windows. This isn’t really due to differences between OpenGL, the API used by OS X, and DirectX, the API used by Windows; even OpenGL-only Windows applications typically run significantly better than their OS X counterparts. It’s more easily attributable to two other factors:

  1. There are far more computers running Windows than OS X, and those computers are more likely to be running intensive 3D applications (i.e. video games).
  2. Apple is notoriously uninterested in games, and notoriously tardy in keeping up with new OpenGL specifications.

This means that a) it takes a while for any new OpenGL features to make it to the Mac platform and b) they suck when they finally arrive.

As of Mavericks, the OpenGL Core profile has been upgraded to 4.0, and GLSL version 400 is supported. This means that shader subroutines have become available. Theoretically, shader subroutines are a really neat feature. Because of the way that graphics cards work, conditionals and branching in shader code incur a large performance penalty. Similarly, dynamic loops are much less efficient than loops of a fixed length, even though more recent graphics APIs claim to have fixed that.

What this means is that if you have a shader which chooses between a cheap operation and a more expensive one, it will often perform worse than either would alone (usually because it ends up doing both). If that shader then chooses how many times to do the expensive operation, the performance gets even worse, despite the fact that it should theoretically be avoiding unnecessary iterations through the loop. So the best option has always been to write two different shaders for the simple and the complex operations, and not to bother dynamically limiting the number of iterations in a loop, but just to hard-code the smallest number you think you can get away with.
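
To make that concrete, here’s a stripped-down sketch of the kind of fragment shader I mean; everything in it (the uniform names, the placeholder shading code) is made up purely for illustration:

```glsl
#version 400

in vec3 vPosition;
in vec3 vNormal;
out vec4 fragColour;

uniform bool useParallax;  // made-up toggle between the two paths
uniform int  numSteps;     // made-up dynamic loop bound

vec3 shadeExpensive(vec3 p, vec3 n)
{
    vec3 result = vec3(0.0);
    // Dynamic loop: the bound isn't known at compile time, so the
    // compiler can't unroll it, and it tends to run slower than a
    // loop with a hard-coded count.
    for (int i = 0; i < numSteps; ++i)
        result += vec3(0.001);  // stand-in for the real per-step work
    return result;
}

vec3 shadeCheap(vec3 p, vec3 n)
{
    return n * 0.5 + 0.5;  // plain vertex-normal shading
}

void main()
{
    // The branch in question: even though useParallax is a uniform,
    // the cost tends to be closer to "both paths" than "whichever
    // path was taken", hence the advice to split this into two
    // separate shader programs.
    vec3 colour = useParallax ? shadeExpensive(vPosition, vNormal)
                              : shadeCheap(vPosition, vNormal);
    fragColour = vec4(colour, 1.0);
}
```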

Shader subroutines were supposed to fix the first of these problems: it was supposed to be possible to write “modular” shaders, where a uniform lets you change which operations a shader uses. In my renderer, which is admittedly poorly optimised, I would like to choose between a parallax-mapped, self-shadowing shader (expensive) and a simpler version which uses vertex normals – in this specific case, the simpler version is for drawing reflections, which don’t require the same level of detail. Here are the results (Nvidia GTX 680, OS X 10.9.4, similar results with both the default Apple drivers and the Nvidia web drivers):

  • No subroutines – both main render and reflection render use expensive shader: frame time approx. 0.028s
  • Subroutines coded in the shader, and uniforms set, but subroutines never actually called: frame time approx. 0.031s
  • Subroutines in use, cheaper subroutine used for drawing reflections: frame time approx. 0.035s

Vertex normals should be really cheap, and should save a lot of time compared with parallax-mapping everything. In addition, I’m not mixing and matching different subroutines: one subroutine is used for each pass, so switching only occurs once per frame. The problem is that the mere existence of code declaring a subroutine incurs a significant performance hit, one which is actually more expensive than just giving up and using the much more complicated and expensive shader for everything.
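
For reference, the “code declaring a subroutine” amounts to something like the sketch below (with made-up names, not the ones from my actual shaders); on the host side the implementation is looked up with glGetSubroutineIndex and selected once per pass with glUniformSubroutinesuiv:

```glsl
#version 400

// The subroutine type: any function with this signature can be
// plugged in through a subroutine uniform.
subroutine vec3 ShadeModel(vec3 position, vec3 normal);

// Expensive path (standing in for the parallax-mapped,
// self-shadowing shader).
subroutine (ShadeModel) vec3 shadeParallax(vec3 position, vec3 normal)
{
    return normal;  // placeholder body
}

// Cheap path using plain vertex normals, for the reflection pass.
subroutine (ShadeModel) vec3 shadeSimple(vec3 position, vec3 normal)
{
    return normal * 0.5 + 0.5;  // placeholder body
}

// The uniform the host program points at one of the two functions,
// once per pass.
subroutine uniform ShadeModel shadeModel;

in vec3 vPosition;
in vec3 vNormal;
out vec4 fragColour;

void main()
{
    fragColour = vec4(shadeModel(vPosition, vNormal), 1.0);
}
```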

So, yeah; disappointing.
