Dynamic depth of field on the GPU – Part 2 of n

Thus far, we’ve discussed the reasons for wanting to do depth of field (DoF) on the GPU. We’ve figured out that we need to get some idea of what our focal length should be, and to figure out by how much the depth of each fragment in our scene render differs from that focal length. All of this information is available in the depth buffer we generated when we rendered the scene’s geometry; if we rendered into a framebuffer object with a depth attachment, this means we have the information available on the GPU in the form of a texture.
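If your scene renderer doesn't already do this, here's a minimal sketch of attaching a depth texture to a framebuffer object; it assumes the FBO is already bound and that width and height match your render target:

GLuint depthTexture;
glGenTextures(1, &depthTexture);
glBindTexture(GL_TEXTURE_2D, depthTexture);
// allocate a 24-bit depth texture the same size as the colour buffer
glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, width, height, 0, GL_DEPTH_COMPONENT, GL_FLOAT, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
// attach it to the currently bound framebuffer as the depth attachment
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, depthTexture, 0);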

Getting data back off the GPU is a pain. It can be done reasonably efficiently with pixel buffer objects, but the bandwidth between CPU and GPU is comparatively tiny and we don’t want to use any of it if we can help it. Nor do we want the CPU or the GPU stalled waiting for the other, because that’s inefficient. It’s therefore more logical to analyse the depth buffer on the GPU using a shader. As a bonus, you can do this at the same time as you’re linearising the depth buffer for other post-processing operations – for example, you might need a linear depth buffer to add fog to your scene, or for SSAO.
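Since we’ll be leaning on that linearisation twice below, here it is pulled out as a GLSL helper function. This is a sketch of the same maths the shaders later in this post inline, and it assumes a standard perspective projection with the default [0, 1] depth range:

// z is the raw depth-buffer sample; near_clip and far_clip are the camera clip planes
float linearise_depth(float z, float near_clip, float far_clip)
{
    float z_n = 2.0 * z - 1.0;  // window-space depth back to NDC (-1.0 to 1.0)
    float z_e = 2.0 * near_clip * far_clip / (far_clip + near_clip - z_n * (far_clip - near_clip));
    return z_e / far_clip;      // eye-space depth scaled to a 0.0-1.0 range
}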

What we’re going to do is generate a greyscale representation of how much each fragment’s depth differs from the focal length: fragments at the focal plane come out black, and fragments get brighter the further they are from it.

Here’s how to do it:

Step 2: Getting the focal length

This concept is very similar to, and dependent on, the linear depth shader we discussed previously. The technique is basically the same, and the two can be combined in the same shader for efficiency. We’re going to render a full-screen quad using a pass-through vertex shader. Here’s some example code to create a vertex array object containing a two-triangle quad with the correct properties (you will need to substitute your own vector class):

GLuint quadVertexArrayObject;
 
typedef struct {
    Vector4 position;
    Vector4 normal;
    Vector4 textureCoordinate;
} Vertex;
 
void loadVertexArrayObject(Vertex vertexData[], GLushort indices[], int numberOfVertices, int numberOfIndices, GLint positionAttribute, GLint textureCoordinateAttribute, GLint normalAttribute) {
 
    GLuint vertexBuffer;
    glGenBuffers(1, &vertexBuffer);
    glBindBuffer(GL_ARRAY_BUFFER, vertexBuffer);
    glBufferData(GL_ARRAY_BUFFER, numberOfVertices * sizeof(Vertex), vertexData, GL_STATIC_DRAW);
 
    if (numberOfVertices > 65536) {
        NSLog(@"Quart into pint pot error - this index array needs to be GLuints, not GLushorts");
    }
 
    GLuint indexBuffer;
    glGenBuffers(1, &indexBuffer);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, indexBuffer);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, numberOfIndices * sizeof(GLushort), indices, GL_STATIC_DRAW);
 
    glEnableVertexAttribArray((GLuint)positionAttribute);
    glEnableVertexAttribArray((GLuint)textureCoordinateAttribute);
    glEnableVertexAttribArray((GLuint)normalAttribute);
    glVertexAttribPointer((GLuint)positionAttribute, 4, GL_FLOAT, GL_FALSE, sizeof(Vertex), (const GLvoid *)offsetof(Vertex, position));
    glVertexAttribPointer((GLuint)textureCoordinateAttribute, 4, GL_FLOAT, GL_FALSE, sizeof(Vertex), (const GLvoid *)offsetof(Vertex, textureCoordinate));
    glVertexAttribPointer((GLuint)normalAttribute, 4, GL_FLOAT, GL_FALSE, sizeof(Vertex), (const GLvoid *)offsetof(Vertex, normal));
}
 
void makeAQuadForPassthroughShader(void) {
    //two-triangle quad
    glGenVertexArrays(1, &quadVertexArrayObject);
    glBindVertexArray(quadVertexArrayObject);
 
    Vertex texVertexData[4] = {
        { .position = Vector4Make(-1.0, -1.0, 0.0, 1.0), .textureCoordinate = Vector4Make(0.0, 0.0, 0.0, 1.0) , .normal = Vector4Make(0.0, 0.0, 1.0, 0.0)},
        { .position = Vector4Make(-1.0, 1.0, 0.0, 1.0), .textureCoordinate = Vector4Make(0.0, 1.0, 0.0, 1.0) , .normal = Vector4Make(0.0, 0.0, 1.0, 0.0)},
        { .position = Vector4Make(1.0, 1.0, 0.0, 1.0), .textureCoordinate = Vector4Make(1.0, 1.0, 0.0, 1.0), .normal = Vector4Make(0.0, 0.0, 1.0, 0.0)},
        { .position = Vector4Make(1.0, -1.0, 0.0, 1.0), .textureCoordinate = Vector4Make(1.0, 0.0, 0.0, 1.0), .normal = Vector4Make(0.0, 0.0, 1.0, 0.0)}
    };
 
    GLushort twoTriIndices[6] = {0, 2, 1, 3, 2, 0};
    loadVertexArrayObject(texVertexData, twoTriIndices, 4, 6, 0, 1, 2);
}

Note that the quad is in the x and y axes, and that while position coordinates go from (-1.0, -1.0) to (1.0, 1.0), the texture coordinates this quad will supply to our shader only go from (0.0, 0.0) to (1.0, 1.0). This means that, if we render our quad with no additional transformations, we will draw to the entire viewport, and the texture coordinates given allow us to map our previously rendered textures without further transformations.
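For completeness, here’s a minimal sketch of how the quad might be drawn each frame; depthAnalysisProgram, nearClip and farClip are stand-ins for however your own renderer tracks the compiled shader program and camera planes:

glUseProgram(depthAnalysisProgram);

// feed the scene's depth attachment to the shader as texture unit 0
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, depthTexture);
glUniform1i(glGetUniformLocation(depthAnalysisProgram, "depthTexture"), 0);
glUniform1f(glGetUniformLocation(depthAnalysisProgram, "near_clip"), nearClip);
glUniform1f(glGetUniformLocation(depthAnalysisProgram, "far_clip"), farClip);

glBindVertexArray(quadVertexArrayObject);
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, 0);

(In real code you’d look the uniform locations up once at load time rather than querying them every frame.)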

Here’s another useful tip: our fragment shader is going to run for millions of fragments, but the vertex shader for this quad only runs four times. Since the focal length is the same for every fragment, it makes far more sense to sample the depth texture at our chosen points and work out the mean focal length in the vertex shader than to repeat that work per fragment. (Texture lookups whose coordinates are known up front, rather than computed per fragment, also tend to be cheaper.)

What we’re going to do is sample the centre point and two concentric rings of eight points each. I should probably have weighted them according to a Gaussian distribution, but I haven’t actually bothered. You could sample more points, or fewer. I’ve set the radius of each ring using a couple of floats; each point is scaled by that radius and then (0.5, 0.5) is added so that the positions are relative to the centre of the screen. Here’s the vertex shader:

#version 330
 
layout (location = 0) in vec4 position;
layout (location = 1) in vec4 textureCoordinate;
layout (location = 2) in vec4 normal;
 
uniform sampler2D depthTexture;
uniform float near_clip;
uniform float far_clip;
 
out vec2 texCoord;
 
const float central_vision_radius = 0.05;
const float central_vision_outer_radius = 0.1;
 
const vec2 circleVectorsAndTheCentre[17] = vec2[17](
     vec2(0.0, 0.0) * central_vision_radius + vec2(0.5, 0.5),
     vec2(0.0, 1.0) * central_vision_radius + vec2(0.5, 0.5),
     vec2(0.707, 0.707) * central_vision_radius + vec2(0.5, 0.5),
     vec2(1.0, 0.0) * central_vision_radius + vec2(0.5, 0.5),
     vec2(0.707, -0.707) * central_vision_radius + vec2(0.5, 0.5),
     vec2(0.0, -1.0) * central_vision_radius + vec2(0.5, 0.5),
     vec2(-0.707, -0.707) * central_vision_radius + vec2(0.5, 0.5),
     vec2(-1.0, 0.0) * central_vision_radius + vec2(0.5, 0.5),
     vec2(-0.707, 0.707) * central_vision_radius + vec2(0.5, 0.5),
     vec2(0.0, 1.0) * central_vision_outer_radius + vec2(0.5, 0.5),
     vec2(0.707, 0.707) * central_vision_outer_radius + vec2(0.5, 0.5),
     vec2(1.0, 0.0) * central_vision_outer_radius + vec2(0.5, 0.5),
     vec2(0.707, -0.707) * central_vision_outer_radius + vec2(0.5, 0.5),
     vec2(0.0, -1.0) * central_vision_outer_radius + vec2(0.5, 0.5),
     vec2(-0.707, -0.707) * central_vision_outer_radius + vec2(0.5, 0.5),
     vec2(-1.0, 0.0) * central_vision_outer_radius + vec2(0.5, 0.5),
     vec2(-0.707, 0.707) * central_vision_outer_radius + vec2(0.5, 0.5)
     );
 
out float meanFocalLength;
 
void main()
{
    gl_Position = position;
    texCoord = textureCoordinate.xy;
 
    float totalDepthSample = 0.0;
 
    for (int i = 0; i < 17; i++) {
        float z = texture(depthTexture, circleVectorsAndTheCentre[i]).x;
        float z_n = 2.0 * z - 1.0;
        float z_e = 2.0 * near_clip * far_clip / (far_clip + near_clip - z_n * (far_clip - near_clip));
        float linear_z = z_e/far_clip;
        //clamped to reduce the weighting from a focal length of infinity
        totalDepthSample += clamp(linear_z, 0.0, 0.95);
    }
 
    float meanDepthSample = totalDepthSample / 17.0;
 
    meanFocalLength = meanDepthSample;
}
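One small aside: all four vertices compute exactly the same mean, so the value of meanFocalLength arriving in the fragment shader is constant across the whole quad even though it’s nominally interpolated. If you want to make that explicit, and skip the interpolation, you could declare it flat in both shaders:

flat out float meanFocalLength;  // in the vertex shader
flat in float meanFocalLength;   // in the fragment shader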

Here’s a fragment shader which takes the mean of the sampled points and compares every other fragment with them. I am using the absolute difference, because I don’t care whether a fragment is in front of or behind the focal length.

#version 330
 
uniform sampler2D depthTexture;
uniform float near_clip;
uniform float far_clip;
 
in vec2 texCoord;
in float meanFocalLength;
 
out vec4 fragColour;
 
const float differenceMultiplier = 2.0;
 
void main()
{
    float z = texture(depthTexture, texCoord).x;
    float z_n = 2.0 * z - 1.0;
    float z_e = 2.0 * near_clip * far_clip / (far_clip + near_clip - z_n * (far_clip - near_clip));
    float linear_z = z_e/far_clip;
 
    float this_fragment_relative_depth = abs(meanFocalLength - linear_z) * differenceMultiplier;
    fragColour = vec4(this_fragment_relative_depth, this_fragment_relative_depth, this_fragment_relative_depth, 1.0);
}

“differenceMultiplier” is a way of tuning the effect. You can change the strength of the DoF effect by modifying this_fragment_relative_depth – you can multiply it to get a stronger effect, or take the square or the square root to change the way the distance falloff looks.
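For instance, here are a few variations on that last calculation; the constants are arbitrary tuning values I’ve picked for illustration, not anything canonical:

float d = abs(meanFocalLength - linear_z);
float linearFalloff  = d * 2.0;        // the version above
float strongerEffect = d * 4.0;        // blur ramps up more quickly with distance
float softFocus      = sqrt(d) * 0.5;  // steep rise near the focal plane, flattening further out
float wideFocalBand  = d * d * 4.0;    // keeps a wider band sharp, then ramps up fast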

Okay, I think that’s enough for now. Next time we’ll have a look at how to use this information to selectively blur our scene and complete our DoF effect.
