2D filters optimization?

I was looking at this video - Blender Game Engine TECH DEMO:

And you can really see how the filters alter the final image on the screen, making the BGE game look like an “AAA” quality game (the on/off filter toggle shows it). So has there been any solution for optimizing the 2D filters?

Not much has been done, as far as I am aware, to improve the rasterizer. Simply put, PC hardware is getting better and cheaper. Any optimization would have to be made to the GLSL in the filter itself (such as using methods that require less processing to achieve a similar effect).

Also, some shaders have a variable named samples; decrease it to improve performance. Others have something like this: #kernel_size = 3; try reducing it.
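To get a rough feel for why lowering these values helps, here is a minimal sketch in Python (not shader code). It assumes the kernel size is the side length of a square kernel, so each pixel costs kernel_size * kernel_size texture reads; real filters may sample differently, and the function name is made up for illustration.

```python
def taps_per_frame(width, height, kernel_size):
    """Rough count of texture reads per frame for a square kernel.

    Assumption: the filter reads kernel_size * kernel_size texels
    for every pixel on screen. Real shaders may sample differently.
    """
    return width * height * kernel_size * kernel_size


if __name__ == "__main__":
    for k in (5, 3):
        print(k, taps_per_frame(1024, 768, k))
```

Under that assumption, going from a kernel size of 5 to 3 cuts the reads per pixel from 25 to 9.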

As MrPutuLips says, to improve the performance of a shader, the shader itself should be optimized, like a Python script.

So the 2d filters can get some speedup. Thx for the info.
From the bge todo:

Maximum BGE screen size has one extra pixel (eg. 1025x769). That produces two problems:

  • 2Dfilters may work slower (the power of 2 buffer will be 2048x1024 in the above example)
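The 2048x1024 figure in that todo entry is what you get if each dimension is rounded up to the next power of two. A small Python sketch of that arithmetic follows; the rounding rule is an assumption that matches the quoted example, not something taken from the BGE source, and the function names are made up.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def padded_buffer_size(width, height):
    # Assumption: each dimension is rounded up independently.
    return next_power_of_two(width), next_power_of_two(height)


if __name__ == "__main__":
    print(padded_buffer_size(1025, 769))  # the case from the todo entry
    print(padded_buffer_size(1024, 768))  # without the extra pixel
```

With the extra pixel the buffer comes out at 2048x1024, without it at 1024x1024, so under this assumption the filter has twice as many buffer pixels to deal with.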

It will be interesting to see, what speed we will have after the problems have been fixed.
BGE bathroom v2.0:

You could try unrolling the for-loops if you have them in the GLSL code, but outside of doing it manually (i.e. manually sampling the points you want to sample), I don’t really know how to do it. You can probably look the technique up on Google.

Usually the for loops sample the entire screen with number of loop iterations as number of samples. Replacing them won’t do anything but make your code longer and harder to handle.
Also, I don’t think that the 2D filters in the BGE are slow. In most of the game engines I’ve used, one medium-quality SSAO shader could easily bring the framerate down to 30.

The filters are still slower than they should be. Could this be because of the old techniques used in them?
Are there even people working on them?

Do you have an example where a filter is slower when running in the BGE than in anything else?

I think he is referring to the comparison between something like SSAO in Blender and SSAO in a triple-A game.

Quite simply, yes. Many new ways of calculating shaders have been made to work with newer graphics technology. They tend to utilize functions that the GPUs’ thousands upon thousands of cores excel in, or use specific parts of a graphics chip designed for calculating shaders.

That’s interesting to know. What about threading the rasterizer, would that speed things up too?

The rasterizer is essentially “threaded” in OpenGL; it uses all (or most) of the available GPU cores to process the graphics. Anything that would need to be threaded by the CPU is something that isn’t related to the shader (such as HDR luminosity calculation using Python).

Oh, yeah. So the shaders just need to use some of the new methods.

All this time and still no offscreen filters, ya know!

Not quite correct
http://blog.hvidtfeldts.net/index.php/2011/07/optimizing-glsl-code/

I know that I had one DOF filter with perfect performance. I am going to look for it, because it looked awesome. The reason it worked so well was that it had to be run only once!

from that article:

Interestingly, the ‘iterations’ variable offers no speedup – even though the compiler must be able to unroll the principal DE loop, there is no measurable improvement by doing it.

If I understood this correctly, then the performance gain came from moving the constant values out of the loop, not from the loop’s syntax itself. It was the same with a raytracer I wrote some time ago: it jumped from 5 fps to 30 when I executed some operations, like dot products and vector lengths, once outside the loop.
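Here is a minimal sketch of that idea in Python rather than GLSL. It is not the raytracer mentioned above, and the function names and data are made up; it only shows a vector length and normalization being computed once outside the loop instead of on every iteration.

```python
import math


def shade_naive(light, points):
    """Recomputes the light vector's length on every iteration."""
    out = []
    for p in points:
        length = math.sqrt(sum(c * c for c in light))  # loop-invariant
        n = [c / length for c in light]                # loop-invariant
        out.append(sum(a * b for a, b in zip(n, p)))
    return out


def shade_hoisted(light, points):
    """Same result, with the invariant work done once before the loop."""
    length = math.sqrt(sum(c * c for c in light))
    n = [c / length for c in light]
    return [sum(a * b for a, b in zip(n, p)) for p in points]


if __name__ == "__main__":
    light = (0.0, 3.0, 4.0)
    points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    print(shade_naive(light, points) == shade_hoisted(light, points))
```

Both versions return the same values; the second one simply does less work per iteration. How much that gains in a real shader depends on the compiler and the hardware, so it is worth measuring.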

Edit: I’m really against representing performance gains as ratios. Saying that you’re getting double the performance might sound nice, but in reality it can be 1 fps increased to 2. Fps specifically scales additively, so it’s better to state the absolute fps gain.

I think I linked the wrong article, so I’ll summarise - essentially the compiler is not always able to figure out whether it can unroll a loop, so if in doubt, compare against an unrolled equivalent where possible.

Ratios are only useful in the context of the bigger picture as you say. I recently doubled the framerate of a high-task load project from 7 fps to 15, which is still rather poor.
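One common way to give such numbers context is to convert them to time per frame. A small Python sketch using the two examples from this thread (the helper names are made up):

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def saved_ms(fps_before, fps_after):
    """Milliseconds saved per frame when going from one fps to another."""
    return frame_time_ms(fps_before) - frame_time_ms(fps_after)


if __name__ == "__main__":
    for before, after in ((1, 2), (7, 15)):
        print(before, after, round(saved_ms(before, after), 2))
```

Both cases roughly double the framerate, but the first saves 500 ms per frame and the second about 76 ms, and at 15 fps each frame still takes about 67 ms.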