How I Learned to Stop Blitting and Love the Framebuffer

A deferred rendering pipeline provides great opportunities to run post-processing shaders on the results from your G-buffer. (A G-buffer is the target for the first pass in a deferred renderer, usually consisting of colour, normal, depth and sometimes material textures, which store the information needed for subsequent lighting and effects passes.) There are a couple of issues, however:

1) OpenGL can’t both read from and write to a texture at the same time; attempting to do so will give an undefined result (i.e. per OpenGL tradition it will look like the result you wanted, except when it doesn’t)
2) What, therefore, do you do about transparency in a deferred renderer? You need the information about what’s behind the transparent object, and how far away it is. That’s in your G-buffer, which you’re already writing to (I’m assuming here that you’re rendering transparent materials last, which is really the only sane way to do it).

What you’re going to need is a copy of a subset of your G-buffer to provide the information you need to draw the stuff behind your transparent material. There’s an expensive way and a cheap way to do this. The expensive way is to do all of your deferred lighting before you render any transparent materials; this does make life easier in some ways, but means you need two lighting passes, one for opaque and one for transparent materials. The cheap way is to decide that you’re not going to bother lighting the stuff behind the transparency, because you’re already going to be throwing a bunch of refraction effects on top anyway and all you really need is a bit of detail to sell the effect.

Here’s where I tripped myself up, in the usual manner for a novice learning OpenGL from 10-year-old tutorials on the internet; I naïvely thought that the most logical thing to do at this point would be to blit (i.e. copy the pixels directly) from my G-buffer to another set of textures which I would then use as the source for rendering transparency. Duplicating a chunk of memory seemed like it was going to be a much faster operation than actually drawing anything. Because I’m not a total idiot, I did at least avoid using glCopyTexImage2D and went straight for the faster glCopyTexSubImage2D operation instead. The typical way you would use this is:

//bind the framebuffer you’re going to read from
glBindFramebuffer(GL_FRAMEBUFFER, myGBuffer);
glViewport(0, 0, my_gBuffer_width, my_gBuffer_height);
//specify which framebuffer attachment you’re going to read
glReadBuffer(GL_COLOR_ATTACHMENT0); //or whichever
//bind the texture you’re going to copy to
glBindTexture(GL_TEXTURE_2D, myDuplicateTexture);
//do the copy operation: target, mip level, destination x/y offset, source x/y, width, height
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, my_gBuffer_width, my_gBuffer_height);

The problem with this is, the performance is terrible, especially if you’re copying across a PCI bus. I was copying a colour and a depth texture for a water effect, and a quick root around in the driver monitor led to the discovery that the GL was spending at least 70% of its time on those two copy operations alone. I’d imagine that this is likely a combination of copying into system memory for some reason and stalling the pipeline; to make matters worse, in this sort of situation you can only really copy the textures immediately before you need to use them, so unless you’re going to rig up some complicated double-buffer solution copying the last frame, asynchronous pixel buffer transfers aren’t going to save you. Using driver hints to keep the texture data in GPU memory might work, or it might not.
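
To get a feel for how much data those two copies move, here is a rough back-of-the-envelope sketch in C. The function names, the 1920x1080 resolution and the 4 bytes per pixel for each texture are assumptions chosen for illustration, not figures measured from the renderer above.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved by one copy of a width x height texture. */
size_t copy_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* Bytes per second if the colour and the depth copy both run every frame. */
size_t copy_bytes_per_second(size_t width, size_t height,
                             size_t colour_bytes_per_pixel,
                             size_t depth_bytes_per_pixel,
                             size_t frames_per_second)
{
    size_t per_frame = copy_bytes(width, height, colour_bytes_per_pixel)
                     + copy_bytes(width, height, depth_bytes_per_pixel);
    return per_frame * frames_per_second;
}
```

With those assumed numbers, copy_bytes_per_second(1920, 1080, 4, 4, 60) comes to 995,328,000 bytes, so roughly 1 GB every second before counting any detour through system memory or the pipeline stall itself.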

Here’s one solution:

OpenGL is really, really good at drawing triangles, so let’s play to our strengths and just draw the textures we want to copy. Set up a simple shader which will read the textures you want and draw a copy to a framebuffer containing the textures you want to copy them to; this even allows you to use multiple render targets to copy a bunch of textures simultaneously. The only real gotcha is that you don’t want to set this up with a depth renderbuffer as you would a normal G-buffer or, say, a shadow rendering buffer; instead, set the texture you want to copy depth information to up as a regular 32-bit float colour texture (or whatever matches your depth buffer format) and attach it to one of the colour attachment points.

Code is below; as usual, your mileage can and will vary depending on the setup of your computer. For instance, my desktop system (which is PCIe 2.0) benefitted, whereas it was pretty much a wash on my newer laptop.

Here’s a sample framebuffer with attached colour and depth textures:

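- (void)createDuplicateFramebufferAndTexture {
    glGenFramebuffers(1, &preTransparencyDuplicateFBOId);
    glBindFramebuffer(GL_FRAMEBUFFER, preTransparencyDuplicateFBOId);
    int frameWidth = self.theGLView.frame.size.width/MAIN_RENDER_BUFFER_SCALE;
    int frameHeight = self.theGLView.frame.size.height/MAIN_RENDER_BUFFER_SCALE;
    //colour copy
    glGenTextures(1, &preTransparencyDuplicateColourTextureId);
    glBindTexture(GL_TEXTURE_2D, preTransparencyDuplicateColourTextureId);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, frameWidth, frameHeight, 0, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, NULL);
    //only mip level 0 exists, so use a non-mipmapped filter or the texture can't be sampled later
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, preTransparencyDuplicateColourTextureId, 0);
    //depth copy, stored in a float colour texture rather than a depth attachment
    glGenTextures(1, &preTransparencyDepthBufferCopyId);
    glBindTexture(GL_TEXTURE_2D, preTransparencyDepthBufferCopyId);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, frameWidth, frameHeight, 0, GL_BGRA, GL_FLOAT, NULL);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, preTransparencyDepthBufferCopyId, 0);
    //route fragment outputs 0 and 1 to the two colour attachments (assumed contents of buffers)
    GLenum buffers[] = {GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1};
    glDrawBuffers(2, buffers);
    GLenum status = glCheckFramebufferStatus(GL_FRAMEBUFFER);
    switch (status) {
        case GL_FRAMEBUFFER_COMPLETE: break;
        case GL_FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER: printf("GL_FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER\n"); break;
        default: printf("Unknown issue (%x).\n", status); break;
    }
}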
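
If you want something more readable than a raw hex value when the framebuffer check fails, a small lookup helper is one option. The sketch below is plain C with the status values from the OpenGL headers repeated locally so it compiles on its own; the FBO_ macro names and both function names are hypothetical, and in a real project you would include the GL header and use its GL_FRAMEBUFFER_ constants instead.

```c
#include <assert.h>
#include <stdio.h>

/* Status values as defined in the OpenGL headers, repeated here only so
   the sketch is self-contained. */
#define FBO_UNDEFINED                     0x8219u
#define FBO_COMPLETE                      0x8CD5u
#define FBO_INCOMPLETE_ATTACHMENT         0x8CD6u
#define FBO_INCOMPLETE_MISSING_ATTACHMENT 0x8CD7u
#define FBO_INCOMPLETE_DRAW_BUFFER        0x8CDBu
#define FBO_INCOMPLETE_READ_BUFFER        0x8CDCu
#define FBO_UNSUPPORTED                   0x8CDDu
#define FBO_INCOMPLETE_MULTISAMPLE        0x8D56u

/* Maps a glCheckFramebufferStatus result to its symbolic name. */
const char *framebuffer_status_name(unsigned int status)
{
    switch (status) {
        case FBO_UNDEFINED:                     return "GL_FRAMEBUFFER_UNDEFINED";
        case FBO_COMPLETE:                      return "GL_FRAMEBUFFER_COMPLETE";
        case FBO_INCOMPLETE_ATTACHMENT:         return "GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT";
        case FBO_INCOMPLETE_MISSING_ATTACHMENT: return "GL_FRAMEBUFFER_INCOMPLETE_MISSING_ATTACHMENT";
        case FBO_INCOMPLETE_DRAW_BUFFER:        return "GL_FRAMEBUFFER_INCOMPLETE_DRAW_BUFFER";
        case FBO_INCOMPLETE_READ_BUFFER:        return "GL_FRAMEBUFFER_INCOMPLETE_READ_BUFFER";
        case FBO_UNSUPPORTED:                   return "GL_FRAMEBUFFER_UNSUPPORTED";
        case FBO_INCOMPLETE_MULTISAMPLE:        return "GL_FRAMEBUFFER_INCOMPLETE_MULTISAMPLE";
        default:                                return "unknown framebuffer status";
    }
}

/* Returns 1 if the framebuffer is usable; otherwise prints why not and returns 0. */
int report_framebuffer_status(const char *label, unsigned int status)
{
    assert(label != NULL);
    if (status == FBO_COMPLETE) {
        return 1;
    }
    fprintf(stderr, "%s: %s (0x%X)\n", label, framebuffer_status_name(status), status);
    return 0;
}
```

You would call it as report_framebuffer_status("pre-transparency copy", glCheckFramebufferStatus(GL_FRAMEBUFFER)) right after attaching the textures.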

Here are the shaders which will do the copying:

//vertex shader
#version 330
layout (location = 0) in vec4 position;
layout (location = 1) in vec4 textureCoordinate;
layout (location = 2) in vec4 normal; //unused here; kept so the quad's vertex layout matches
out vec2 texCoord;
void main()
{
    //the quad is already in clip space, so pass the position straight through
    gl_Position = position;
    texCoord = vec2(textureCoordinate.xy);
}

//fragment shader
#version 330
uniform sampler2D colourTexture;
uniform sampler2D depthTexture;
in vec2 texCoord;
out vec4 fragColour, depthColour;
void main()
{
    //one output per colour attachment: 0 is the colour copy, 1 is the depth copy
    fragColour = texture(colourTexture, texCoord);
    depthColour = texture(depthTexture, texCoord);
}

Here’s the code for loading and setting up the copying shader:

#pragma mark g-buffer copying shader
    GLuint gBufferCopyVertexShader;
    GLuint gBufferCopyFragmentShader;
    gBufferCopyVertexShader = [self compileShaderOfType:GL_VERTEX_SHADER   file:[[NSBundle mainBundle] pathForResource:@"GBufferCopy" ofType:@"vert"]];
    gBufferCopyFragmentShader = [self compileShaderOfType:GL_FRAGMENT_SHADER file:[[NSBundle mainBundle] pathForResource:@"GBufferCopy" ofType:@"frag"]];
    if (0 != gBufferCopyVertexShader && 0 != gBufferCopyFragmentShader) {
        gBufferCopyShaderProgram = glCreateProgram();
        glAttachShader(gBufferCopyShaderProgram, gBufferCopyVertexShader);
        glAttachShader(gBufferCopyShaderProgram, gBufferCopyFragmentShader);
        //fragment output locations must be bound before linking
        glBindFragDataLocation(gBufferCopyShaderProgram, 0, "fragColour");
        glBindFragDataLocation(gBufferCopyShaderProgram, 1, "depthColour");
        [self linkProgram:gBufferCopyShaderProgram withName:@"gbuffer copy"];
        GBufferCopyUniforms[kGBCColourTexture] = glGetUniformLocation(gBufferCopyShaderProgram, "colourTexture");
        GBufferCopyUniforms[kGBCDepthTexture] = glGetUniformLocation(gBufferCopyShaderProgram, "depthTexture");
    }
#pragma mark set state for g-buffer copy shader
    //glUniform* applies to the currently bound program, so bind it first
    glUseProgram(gBufferCopyShaderProgram);
    //colour sampler reads texture unit 0, depth sampler reads texture unit 1
    glUniform1i(GBufferCopyUniforms[kGBCColourTexture], 0);
    glUniform1i(GBufferCopyUniforms[kGBCDepthTexture], 1);

Finally, here’s the code for drawing (see elsewhere in this blog for discussion of setting up and buffering a two-triangle quad):

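    int frameWidth = self.theGLView.frame.size.width/MAIN_RENDER_BUFFER_SCALE;
    int frameHeight = self.theGLView.frame.size.height/MAIN_RENDER_BUFFER_SCALE;
    glBindFramebuffer(GL_FRAMEBUFFER, preTransparencyDuplicateFBOId);
    glViewport(0, 0, frameWidth, frameHeight);
    glClearColor(1.0, 0.1, 0.1, 1.0);
    glUseProgram(gBufferCopyShaderProgram);
    //colour goes on texture unit 0 and depth on unit 1, matching the sampler uniforms
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, mainRenderTextureId);
    glActiveTexture(GL_TEXTURE1);
    glBindTexture(GL_TEXTURE_2D, mainRenderDepthTextureId);
    //assumes the quad’s vertex array object is already bound
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, 0);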
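
For completeness, here is a minimal sketch in plain C of the data a two-triangle quad like this might use. The QuadVertex struct, quadIndices and fillFullscreenQuad are hypothetical names; the layout is only assumed to match the three vec4 attributes the copy vertex shader declares, and the corners are given directly in normalised device coordinates because that shader applies no transform.

```c
#include <assert.h>
#include <stddef.h>

/* One vertex: three vec4 attributes, assumed to map to locations 0, 1 and 2. */
typedef struct {
    float position[4];
    float textureCoordinate[4];
    float normal[4];
} QuadVertex;

/* Two counter-clockwise triangles sharing the bottom-left to top-right diagonal. */
const unsigned short quadIndices[6] = { 0, 1, 2, 0, 2, 3 };

/* Fills the four corners of a quad that covers the whole viewport. */
void fillFullscreenQuad(QuadVertex out[4])
{
    static const float corners[4][2] = {
        { -1.0f, -1.0f }, { 1.0f, -1.0f }, { 1.0f, 1.0f }, { -1.0f, 1.0f }
    };
    assert(out != NULL);
    for (int i = 0; i < 4; ++i) {
        out[i].position[0] = corners[i][0];
        out[i].position[1] = corners[i][1];
        out[i].position[2] = 0.0f;
        out[i].position[3] = 1.0f;
        /* Map [-1, 1] to [0, 1] so each output pixel samples its own texel. */
        out[i].textureCoordinate[0] = corners[i][0] * 0.5f + 0.5f;
        out[i].textureCoordinate[1] = corners[i][1] * 0.5f + 0.5f;
        out[i].textureCoordinate[2] = 0.0f;
        out[i].textureCoordinate[3] = 0.0f;
        /* The copy shader ignores the normal; point it at the viewer. */
        out[i].normal[0] = 0.0f;
        out[i].normal[1] = 0.0f;
        out[i].normal[2] = 1.0f;
        out[i].normal[3] = 0.0f;
    }
}
```

The six unsigned short indices are what the GL_UNSIGNED_SHORT and count of 6 in the glDrawElements call refer to; when setting up the vertex attributes, the stride would be sizeof(QuadVertex).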

1 thought on “How I Learned to Stop Blitting and Love the Framebuffer”

  1. You’re not blitting anything in any of your examples. You’re copying textures. Those are different operations. Blitting is done using glBlitFramebuffer and it is debated if it’s better to use that than to render a fullscreen triangle. The behavior is very driver dependent.