
A tale of two vectors (normal reconstruction and driver differences)

If you’re playing around with deferred rendering or post-process techniques, you’ve probably come across the idea that you can recover camera-space surface normals from the camera-space position like so:

vec3 reconstructCameraSpaceFaceNormal(vec3 CameraSpacePosition) {
    // dFdx/dFdy are the screen-space derivatives of the interpolated position,
    // i.e. two vectors lying in the plane of the current triangle
    vec3 res = normalize(cross(dFdy(CameraSpacePosition), dFdx(CameraSpacePosition)));
    return res;
}

where CameraSpacePosition is the camera-space position of the fragment.
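
In use, assuming the vertex shader hands you the camera-space position as a varying (the names below are just for illustration), it ends up looking something like this, paired with the function above:

varying vec3 vCameraSpacePosition;   // interpolated from the vertex shader

void main() {
    vec3 n = reconstructCameraSpaceFaceNormal(vCameraSpacePosition);
    // visualise the normal so you can see which way round your driver thinks it is
    gl_FragColor = vec4(n * 0.5 + 0.5, 1.0);
}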

What you might not realise is that you’re accidentally setting yourself up for confusion depending on your graphics driver. For the longest time, I was using this technique to try to implement SSAO without having to bother with storing screen-space normals. After fiddling about a bit I noticed that on my desktop with an NVIDIA GTX 680 everything looked OK, while on my laptop with Intel HD integrated graphics everything looked inverted. I then tried reversing the normal I was getting out of this function. Success! The laptop is now displaying correctly. Failure! The desktop is now screwed up.
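
The obvious band-aid, assuming you’re willing to sniff out the troublesome driver at startup (checking the GL_VENDOR / GL_RENDERER strings, say, or rendering a test quad and seeing which way the reconstructed normal points), is to hand the shader a sign uniform and flip the result with it. Roughly, with the uniform name being my own invention:

// Set from the application: +1.0 on drivers where the reconstruction comes
// out the right way round, -1.0 on drivers where it comes out inverted.
uniform float uNormalSign;

vec3 reconstructCameraSpaceFaceNormal(vec3 CameraSpacePosition) {
    vec3 res = normalize(cross(dFdy(CameraSpacePosition), dFdx(CameraSpacePosition)));
    return uNormalSign * res;
}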


Branching? What branching?

Apple’s implementation of GLSL seems to suffer from a frequent problem in 3D programming: all of the features you can use to optimise your code work well on powerful graphics hardware and actually slow things down on a less powerful GPU. This is exacerbated by the prevalence of Intel HD hardware in Apple machines. Full disclosure: I use both a 2010 Mac Pro with an NVIDIA GeForce GTX 680 and a MacBook Air with Intel HD 3000 graphics. My multi-pass renderer does cascading shadow maps, bump mapping, GPU-based water animation, multi-textured landscape relief mapping, and screen-space sun rays and depth of field, all of which use up a fair amount of fragment power. It’s pretty obvious that this absolutely kills performance on the Intel graphics hardware, so I implemented a system of uniforms to turn off features of the renderer in the vertex and fragment shaders on hardware which can’t handle them. Simple, yes?

No.

On the NVIDIA hardware, putting a branch into the fragment shader by using a boolean uniform seems to work fine – although performance on a GTX 680 is so ridiculous that I probably wouldn’t notice the slowdown anyway. However, on the Intel hardware, the ray-casting code which does the relief mapping slows things down for every single fragment regardless of whether that code path should have been turned off. Googling turns up a bunch of forum references which imply that the GPU is actually running both code paths and then only displaying the result of the simpler one, which causes performance to be utterly dreadful.

For example, in this situation:

uniform bool myGPUIsAPieceOfShit;

void main () {
    if (myGPUIsAPieceOfShit) {
        doSomethingSimpleButFast();
    } else {
        doSomethingPrettyButSlow();
    }
}

You are going to end up with terrible performance. This also puts paid to the idea of, say, having a shader which can optionally do bump mapping depending on a uniform: you’ll still pay the fragment cost for the bump-mapping path and then never actually see its result.

As it stands, if you find that commenting out one of the code paths causes you to triple the frame rate, you’re going to need to write separate shaders for each path and choose the one appropriate to the hardware.
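
For what it’s worth, the least painful way I know of to do that is to keep a single source file and have the application prepend a #define when compiling for the capable hardware, so the expensive path never even reaches the Intel compiler. A rough sketch, with illustrative names rather than my actual renderer code:

// Built twice from the same source: the application prepends
// "#define RELIEF_MAPPING" only when compiling for hardware that can take it.

uniform sampler2D landscapeTexture;
varying vec2 vTexCoord;

#ifdef RELIEF_MAPPING
vec2 reliefMappedCoord(vec2 uv) {
    // the expensive ray-casting loop lives here; omitted for brevity
    return uv;
}
#endif

void main() {
#ifdef RELIEF_MAPPING
    gl_FragColor = texture2D(landscapeTexture, reliefMappedCoord(vTexCoord));
#else
    // cheap path: plain texturing, the ray caster is never even compiled
    gl_FragColor = texture2D(landscapeTexture, vTexCoord);
#endif
}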