Getting world-space coordinates of screen fragments in glsl

So, you're probably asking yourself why on earth you would even want to do that. Well, it's useful information if you've got a camera which can intersect with things in your scene. The most obvious example here is water: say you wanted a distortion effect and a reduced fogging distance to make the underwater part of your environment visually distinct from the part above the water:

Camera Submerging

You may have noticed that in, for example, the Elder Scrolls games you can trick the camera into behaving as if it's not underwater when you're near the air-water interface. This is presumably because they've simply set a camera height which denotes "underwater", but what if the player has positioned the camera so that half of the screen is underwater and half is above? Hence you need an approach which works per-fragment.

How do you work out which part of your screen is actually underwater? You generally can't do it while rendering your water's surface, because you're only shading pixels which lie directly on the air-water interface; if you're under the surface of the water and looking down, the surface isn't on screen at all and that approach fails immediately. What you want instead is a way of masking off the bits of your screen that are underwater and using that mask to drive your "underwater" effects.

Finding out the world-space position of your screen is surprisingly easy to do if you've got any experience of working out position from depth. The first thing you need is the location of your fragment. This requires the dimensions of your viewport, which can easily be passed to a fragment shader via a uniform (note: a really cheap way to get the location of your fragment if you're rendering a full screen quad is to give your quad texture coordinates equivalent to gl_FragCoord.xy / screen_dimensions.xy, i.e. (0, 0) in the bottom left hand corner and (1, 1) in the top right hand corner). You then need the inverse of whatever matrix you are using to get from world space to clip space (and hence, after the perspective divide, normalised device coordinates), which in most cases is going to be your camera matrix (camera world space position and orientation) combined with your projection matrix.

I think I should reiterate, and be specific about, exactly which spaces I'm using here, because I think the major reason OpenGL/GLSL tutorials on the net are confusing is that nobody can agree on what their projection matrix represents. I'm going with the definition that the projection matrix is the perspective matrix which transforms from camera space to clip space. As mentioned above, the camera matrix represents the camera's position and orientation and transforms from world space into camera space. The model-to-world matrix transforms from model coordinates to world coordinates, and the model matrix refers to a model's rotation and scale.

This technique requires drawing a single untransformed quad with texture coordinates from 0 to 1, with (0, 0) at the bottom left hand corner, and the same values for its position coordinates; that's a straightforward way of doing anything in screen space. You don't need a linearised depth buffer for this. Oh, and just to confuse things, in my preferred world space positive x is east, positive y is north and negative z is up. Out of sheer laziness I don't bother constructing any 3×3 or smaller matrices outside of the shaders, as it's one less thing for me to screw up while setting uniforms, so there are a few vec4 multiplications in my code which weren't strictly necessary but avoided some extra complexity elsewhere.
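For reference, the vertex shader for that full screen quad can be as trivial as the sketch below. This is just an illustration rather than my exact code: the attribute names are made up, and I'm assuming the quad's positions hold the same 0-1 values as its texture coordinates, remapped to clip space in the shader.

//full screen quad vertex shader (illustrative sketch; attribute names are made up)
#version 330

in vec2 position;   //0-1, same values as the texture coordinates
in vec2 texCoordIn; //(0, 0) bottom left, (1, 1) top right

out vec2 texCoord;

void main() {
    texCoord = texCoordIn;
    //remap 0-1 to clip space -1 to 1 so the quad covers the whole screen
    gl_Position = vec4(position * 2.0 - 1.0, 0.0, 1.0);
}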

Here's the original implementation of gluInvertMatrix, which will get you your clip-space-to-world-space transformation. This is a direct copy and all credit goes to the original authors, but it can be a little tricky to track this code down:

bool theOriginalGluInvertMatrix(const double m[16], double invOut[16])
{
    double inv[16], det;
    int i;
 
    inv[0] = m[5]  * m[10] * m[15] -
    m[5]  * m[11] * m[14] -
    m[9]  * m[6]  * m[15] +
    m[9]  * m[7]  * m[14] +
    m[13] * m[6]  * m[11] -
    m[13] * m[7]  * m[10];
 
    inv[4] = -m[4]  * m[10] * m[15] +
    m[4]  * m[11] * m[14] +
    m[8]  * m[6]  * m[15] -
    m[8]  * m[7]  * m[14] -
    m[12] * m[6]  * m[11] +
    m[12] * m[7]  * m[10];
 
    inv[8] = m[4]  * m[9] * m[15] -
    m[4]  * m[11] * m[13] -
    m[8]  * m[5] * m[15] +
    m[8]  * m[7] * m[13] +
    m[12] * m[5] * m[11] -
    m[12] * m[7] * m[9];
 
    inv[12] = -m[4]  * m[9] * m[14] +
    m[4]  * m[10] * m[13] +
    m[8]  * m[5] * m[14] -
    m[8]  * m[6] * m[13] -
    m[12] * m[5] * m[10] +
    m[12] * m[6] * m[9];
 
    inv[1] = -m[1]  * m[10] * m[15] +
    m[1]  * m[11] * m[14] +
    m[9]  * m[2] * m[15] -
    m[9]  * m[3] * m[14] -
    m[13] * m[2] * m[11] +
    m[13] * m[3] * m[10];
 
    inv[5] = m[0]  * m[10] * m[15] -
    m[0]  * m[11] * m[14] -
    m[8]  * m[2] * m[15] +
    m[8]  * m[3] * m[14] +
    m[12] * m[2] * m[11] -
    m[12] * m[3] * m[10];
 
    inv[9] = -m[0]  * m[9] * m[15] +
    m[0]  * m[11] * m[13] +
    m[8]  * m[1] * m[15] -
    m[8]  * m[3] * m[13] -
    m[12] * m[1] * m[11] +
    m[12] * m[3] * m[9];
 
    inv[13] = m[0]  * m[9] * m[14] -
    m[0]  * m[10] * m[13] -
    m[8]  * m[1] * m[14] +
    m[8]  * m[2] * m[13] +
    m[12] * m[1] * m[10] -
    m[12] * m[2] * m[9];
 
    inv[2] = m[1]  * m[6] * m[15] -
    m[1]  * m[7] * m[14] -
    m[5]  * m[2] * m[15] +
    m[5]  * m[3] * m[14] +
    m[13] * m[2] * m[7] -
    m[13] * m[3] * m[6];
 
    inv[6] = -m[0]  * m[6] * m[15] +
    m[0]  * m[7] * m[14] +
    m[4]  * m[2] * m[15] -
    m[4]  * m[3] * m[14] -
    m[12] * m[2] * m[7] +
    m[12] * m[3] * m[6];
 
    inv[10] = m[0]  * m[5] * m[15] -
    m[0]  * m[7] * m[13] -
    m[4]  * m[1] * m[15] +
    m[4]  * m[3] * m[13] +
    m[12] * m[1] * m[7] -
    m[12] * m[3] * m[5];
 
    inv[14] = -m[0]  * m[5] * m[14] +
    m[0]  * m[6] * m[13] +
    m[4]  * m[1] * m[14] -
    m[4]  * m[2] * m[13] -
    m[12] * m[1] * m[6] +
    m[12] * m[2] * m[5];
 
    inv[3] = -m[1] * m[6] * m[11] +
    m[1] * m[7] * m[10] +
    m[5] * m[2] * m[11] -
    m[5] * m[3] * m[10] -
    m[9] * m[2] * m[7] +
    m[9] * m[3] * m[6];
 
    inv[7] = m[0] * m[6] * m[11] -
    m[0] * m[7] * m[10] -
    m[4] * m[2] * m[11] +
    m[4] * m[3] * m[10] +
    m[8] * m[2] * m[7] -
    m[8] * m[3] * m[6];
 
    inv[11] = -m[0] * m[5] * m[11] +
    m[0] * m[7] * m[9] +
    m[4] * m[1] * m[11] -
    m[4] * m[3] * m[9] -
    m[8] * m[1] * m[7] +
    m[8] * m[3] * m[5];
 
    inv[15] = m[0] * m[5] * m[10] -
    m[0] * m[6] * m[9] -
    m[4] * m[1] * m[10] +
    m[4] * m[2] * m[9] +
    m[8] * m[1] * m[6] -
    m[8] * m[2] * m[5];
 
    det = m[0] * inv[0] + m[1] * inv[4] + m[2] * inv[8] + m[3] * inv[12];
 
    if (det == 0)
        return false;
 
    det = 1.0 / det;
 
    for (i = 0; i < 16; i++)
        invOut[i] = inv[i] * det;
 
    return true;
}

Again, you pass this matrix to your fragment shader as a uniform.
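For completeness, the host-side plumbing can look something like the sketch below. This is an illustration rather than my actual code: I'm assuming column-major double[16] matrices (to match the inversion function above), a loader such as GLEW for the GL function pointers, and the usual maths-order product of projection times camera for the world-to-clip transform; the function names here are made up.

#include <GL/glew.h> //or whatever loader you use for the GL function pointers
#include <stdbool.h>

bool theOriginalGluInvertMatrix(const double m[16], double invOut[16]); //defined above

//column-major 4x4 multiply: out = a * b
static void multiply4x4(const double a[16], const double b[16], double out[16])
{
    int col, row, k;
    for (col = 0; col < 4; col++)
        for (row = 0; row < 4; row++)
        {
            out[col * 4 + row] = 0.0;
            for (k = 0; k < 4; k++)
                out[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
        }
}

//builds the clip-space-to-world-space matrix and uploads it as a uniform
void uploadClipToWorld(GLuint program, const double projection[16], const double camera[16])
{
    double worldToClip[16], clipToWorld[16];
    float clipToWorldFloats[16];
    int i;

    //camera takes world space to camera space, projection takes camera space to clip space
    multiply4x4(projection, camera, worldToClip);

    //invert with the function above to get from clip space back to world space
    if (!theOriginalGluInvertMatrix(worldToClip, clipToWorld))
        return; //singular matrix; shouldn't happen with a sane camera

    for (i = 0; i < 16; i++)
        clipToWorldFloats[i] = (float)clipToWorld[i];

    glUseProgram(program);
    glUniformMatrix4fv(glGetUniformLocation(program, "inverseProjectionMatrix"),
                       1, GL_FALSE, clipToWorldFloats);
}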

You can work out your clip space coordinates from window space simply by multiplying windowSpace.xy by 2.0 and subtracting 1.0 to get the xy component; and, neatly, since you want the position actually "touching" your screen, the z component is 0.0 (I generally set w to 1.0, although it's not strictly necessary). You then multiply vec4(windowSpacePos.xy * 2.0 - 1.0, 0.0, 1.0) by the clip-space-to-world-space matrix you just computed and divide the result by its w component to get the world-space position.

The last thing you're going to need is a way of working out, mathematically, where the surface of the volume you're about to stick your head into actually is. For water in a scene, many people just use a single flat quad, so you only need to know the position of that quad along the up-down axis. I do waves by transforming a mesh using a texture in a vertex shader, but the displacement is determined mathematically, so the same code can be reused here.
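My wave code itself isn't important here; the point is just that the surface height is an analytic function of world-space position (and time), so exactly the same function can be pasted into both the water surface's vertex shader and the masking shader below. A purely made-up illustration:

//hypothetical analytic surface height, pasteable into any shader stage
//(remember that in my world space -z is "up", so subtracting the wave lifts the surface)
float waterSurfaceZ(vec2 worldXY, float waterLevel, float time) {
    float wave = sin(worldXY.x * 0.1 + time) * cos(worldXY.y * 0.13 + time * 0.7);
    return waterLevel - wave * 0.5;
}

In the masking shader you would then compare the fragment's z against waterSurfaceZ(processingPosition.xy, waterLevel, time) rather than against the flat waterLevel.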

Here's an example assuming that your water surface is flat and at z=0, with negative values of z denoting world-space "up". All I'm doing is outputting 1.0 to a texture (a simple 4 channel texture the same size and shape as my main frame buffer) if the fragment is below the water level, i.e. underwater, and 0.0 if it's above:

//fragment shader for working out which screen fragments are below the water plane at z=0, with -z = "up"
 
#version 330
 
uniform mat4 inverseProjectionMatrix;
uniform float waterLevel;
 
in vec2 texCoord;
 
out vec4 underwaterStencil;
 
//figure out water heights at each screen point
 
void main() {
    vec4 worldSpacePositionOfScreenFragment = inverseProjectionMatrix * vec4(texCoord.xy * 2.0 - 1.0, 0.0, 1.0);
    vec3 processingPosition = vec3(worldSpacePositionOfScreenFragment.xyz/worldSpacePositionOfScreenFragment.w);
 
    //if fragment is underwater then fragment world position will be more positive than water level
    float isUnderwater = sign(processingPosition.z - waterLevel);
 
    //to get a stencil we want values such that underwater is 1.0 and not-underwater is 0.0
    //if the fragment is above the surface, the result of "sign" above will be -1.0
 
    isUnderwater = clamp(isUnderwater, 0.0, 1.0);
 
    underwaterStencil = vec4(isUnderwater, isUnderwater, isUnderwater, 1.0);
}

You should now have a texture where every fragment below the surface of your water is white and every fragment above is black, which you can feed into subsequent shaders to fog and distort your image. As another neat trick, if you're reconstructing depth in the same shader as above, you can work out the depth of water you're looking through at the same time and, for example, use that to apply thicker fog or more distortion. Similarly, you can just multiply those calculations by the result of "isUnderwater" to turn the effect on or off per pixel without branching your shader code.
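To make that per-pixel gating concrete, one of those follow-up shaders might look something like the sketch below; the sampler names and the distortion and fog constants are placeholders rather than my actual implementation:

//post-process sketch: distort and fog only where the stencil says "underwater"
#version 330

uniform sampler2D sceneColour;
uniform sampler2D underwaterStencil; //output of the shader above
uniform float time;

in vec2 texCoord;

out vec4 fragColour;

void main() {
    float isUnderwater = texture(underwaterStencil, texCoord).r;

    //cheap wobble, scaled by the stencil so it only applies underwater
    vec2 distortion = vec2(sin(texCoord.y * 40.0 + time), cos(texCoord.x * 40.0 + time)) * 0.003 * isUnderwater;
    vec3 colour = texture(sceneColour, texCoord + distortion).rgb;

    //fade towards a murky blue-green, again gated per pixel with no branching
    colour = mix(colour, vec3(0.1, 0.3, 0.35), 0.4 * isUnderwater);

    fragColour = vec4(colour, 1.0);
}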

The final result:

Per-fragment water distortion and fogging effect
