BGE Development: Rasterizer and Scenegraph optimization (proposals, discussion)

A big problem with BGE is the rasterizer and scenegraph performance, especially if thousands of objects are put in a scene. I have looked at the source code of the rasterizer and tried to figure out how the rendering speed could be improved.

There is actually a function to optimize the mesh structure, but it is commented out because of some serious issues that come with it (quadratic build time…). The idea is to join meshes within a certain radius to render them more efficiently. I made some tests with a C++ kd-tree library to speed up the optimization time.
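To illustrate the batching idea only: the sketch below is not the commented-out BGE function and does not use the kd-tree library mentioned above. It swaps in a uniform grid as a simple stand-in for the radius query, so grouping takes linear time instead of quadratic time. The function name, the cell size and the scene layout are all hypothetical.

```python
from collections import defaultdict
from math import floor


def batch_by_grid(positions, cell_size):
    """Group object indices by the grid cell their position falls in.

    Each cell becomes one candidate render batch. This is a single pass
    over the objects, so build time grows linearly with object count.
    """
    batches = defaultdict(list)
    for index, (x, y, z) in enumerate(positions):
        cell = (floor(x / cell_size), floor(y / cell_size), floor(z / cell_size))
        batches[cell].append(index)
    return dict(batches)


# Hypothetical scene: 20,000 objects on a 200 x 100 layout, 1 unit apart.
positions = [(float(x), float(y), 0.0) for x in range(200) for y in range(100)]
batches = batch_by_grid(positions, cell_size=20.0)
print(len(batches), "batches for", len(positions), "objects")
```

A grid only approximates "within a certain radius", since two close objects can land in neighbouring cells. A kd-tree gives exact neighbour queries at the cost of a more complex build.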

My tests show that joined meshes make it possible to render many more objects at much better speed. The video shows 20k individual low-poly objects that are batched together into about 50 render batches.

To compare the speed up:
Blender 2.58.1: 10 fps with about 80ms rasterizer time consumption

With optimized Mesh structures: 8 ms rasterizer time consumption and 40 fps with all 20K objects being displayed at the same time!

You can notice a drastic performance drop in the scenegraph if all objects show up in the camera frustum. This should also be a target for improvement.

Another thing is the request for level of detail. All solutions with Python or logic bricks are wasted time. The decision to change LOD levels has to be made as efficiently as possible and should be implemented right in the BGE core.

A lot of questions come to mind when thinking of improvements:

  • how to optimize the scenegraph? (currently the dbvt from Bullet is used)
  • how to integrate an efficient LOD right into the object culling mechanism
  • how to optimize rendering speed correctly (background optimization with threads?)

UPDATE:
=============================================

Here you can find a windows 32 bit build that contains:

  • blender 2.59.1 with pepper branch merged
  • level of detail (basically working, but still needs a lot of improvements and bugfixes…)
  • batch counter
  • poly counter
  • group references and improved KX_GameObject python API
  • some UI tweaks (enable/disable counter, optimization)
  • blenderplayer
  • extended python API for KX_Scene to change global lod factor

Download:

Windows 32 bit build:
http://dl.dropbox.com/u/2779060/Lod-Project/win32-Level-of-Detail-vc.zip

Example file:
http://dl.dropbox.com/u/2779060/Lod-Project/Lodbench.zip

=============================================

Get the current source code for my development branches on gitorious:


git clone git://gitorious.org/~moerdn/blenderprojects/moerdn-bge-sandbox.git bge-sandbox
cd bge-sandbox
git checkout -b my-local-static origin/ras-static

Any information or idea on this topic would be much appreciated.
greetings,
moerdn

I’d be willing to chat about these topics on #blendercoders or #bgecoders on freenode. If you don’t have an IRC client, you could always use the IRC layout on PasteAll.

About lod:
As far as I know, automatic LOD is not an industry standard. It is CPU-expensive! I think something like this would be more useful instead:

  • a dedicated UI where you can set the LOD levels per object and, for each level, how many vertices the mesh should have (so only the objects that need it would have LOD)
    - at convert time, all the automatically simplified meshes (generated with a proper algorithm) would be stored in KX_GameObject.m_meshes (which is a vector)
    - at rendering time, the right mesh is chosen according to the distance from the camera. Physics shouldn't be a problem, since a simpler mesh is expected to be used there already

This would have all the advantages of LOD without the main drawback: heavy computation for simplifying the mesh at runtime.
But it would:

  1. increase RAM usage (it depends on the number of LOD levels and the vertex count, but it can easily double). Maybe this can be avoided by caching the data on disk.
  2. lengthen conversion time (but this can be avoided by saving the already converted scene to a file and loading it)
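The render-time choice described above could look roughly like this. It is only a sketch: the mesh names, the switch distances and the function name are made up for illustration, and the real lookup would live in the C++ core rather than in Python.

```python
from bisect import bisect_right


def select_lod(distance, switch_distances):
    """Return the LOD index for a given camera distance.

    switch_distances must be sorted ascending. Level 0 is used below the
    first entry, level 1 from the first entry up to the second, and so on.
    """
    return bisect_right(switch_distances, distance)


# Hypothetical per-object setup: three precomputed meshes, two switch distances.
lod_meshes = ["tree_high", "tree_mid", "tree_low"]
switch_distances = [25.0, 80.0]

for distance in (5.0, 25.0, 300.0):
    print(distance, lod_meshes[select_lod(distance, switch_distances)])
```

Because the meshes are precomputed, the per-object cost at runtime is one distance value and one lookup in a very short list.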

I haven’t looked at the rasterizer code, and I’m not too into that stuff, so I can’t propose anything else…

One thing that can be helpful/inspiring, is to see how others handle LoD. Here’s an example of LoD in UDK: http://www.youtube.com/watch?v=T1WlR-ghlSc

Hm, this is pretty easy to do in UDK, very interesting!
Moguri, I have a simple question: in Blender, can I make levels of detail like those without spending too much logic processing on it? How does UDK calculate the distance? It looks like you could do it easily by moving the LODs to other layers. Well, I don't know, is the coding for this that hard?

Adding static LoD like what you see in the UDK video should be pretty easy. A few people have expressed interest in trying to tackle LoD, so I’m not going to touch it for now.

Hmm, cool, thanks Moguri. It'll be better to have it integrated in Blender as we see in UDK, but anyway, there are several other things to implement besides those!
Thanks!

I think I have most things working to integrate a simple discrete LOD system. The most important question is how to reduce the number of objects to check. A kd-tree can handle this quite well, but perhaps there are better ways? How many distance ranges should be used?

KX_GameObject.meshes sounds good for a Python API. The user could easily select meshes in the UI.

Joining meshes (or even scene nodes) at far distances is very important to reduce the number of iterations. Could it be more effective to use an octree for scenegraph operations?

These optimizations could be done as a background job. Also, LOD doesn't have to be calculated every single frame (only depending on movement speed).

Besides static geometry optimization, some other things from Ogre3D come to mind:
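One possible way to tie the LOD check rate to movement speed, as a rough sketch: re-evaluate LOD only after the camera has travelled a certain distance. The function name and the `step` and `max_frames` defaults are assumptions for illustration, not values from the branch.

```python
def frames_between_lod_updates(speed_per_frame, step=2.0, max_frames=30):
    """Return how many frames to wait before the next LOD check.

    speed_per_frame is how far the camera moves per frame (scene units).
    LOD is re-checked after roughly `step` units of travel, at least every
    `max_frames` frames and at most once per frame.
    """
    if speed_per_frame <= 0.0:
        return max_frames  # camera is standing still
    return max(1, min(max_frames, int(step / speed_per_frame)))


for speed in (0.0, 0.01, 0.5, 5.0):
    print(speed, frames_between_lod_updates(speed))
```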

  • instancing
  • paged geometry for huge outdoor scenes
  • impostors for far distance objects like trees

@moguri: will join irc channels these days. thanks!

greetings, moerdn

I would really love to see this in trunk ASAP!

First UI implementation for LOD is done (like UDK)

http://img20.imageshack.us/img20/3455/lodui.jpg

Specify what meshes should be used for detail levels. Next step is to convert this to gameengine data.

greetings, moerdn

Great work!

From experience, kd-trees / quadtrees etc. and mesh swapping work quite well implemented in python, but it would be cool to have in the core.

Also very curious to hear your thoughts on impostors/paged geometry.

WOW, very cool moerdn, will this add-on or whatever you are using be available to download?

Oh man. you’re great.
BGE is back on track!

Please don’t expect too much from these little hacks. LOD as a standalone feature is (IMHO) useless. It has to be coupled with rasterizer and scenegraph improvements. LOD doesn’t mean you can render thousands of objects. It's only useful for better polygon distribution in your scene.

For quick tests I have to use the Blender binary tree that calculates the visible objects (the Bullet method is twice as fast…). Converting editor data to BGE data works now, and switching by distance also works. The next step is to replace the mesh.

greetings, moerdn

And here we go:

greetings,
moerdn

Cool, very cool!!! I keep wondering: if you made such a thing in so short a period of time, why can't the developers?

Very interesting. So moerdn, do you get the total distance from the camera to the furthest object to determine when to trigger the LOD?

@SolarLune: At the moment the Manhattan distance between the objects in the frustum and the active camera is compared, to avoid a lot of square roots. (Thanks to Moguri!)
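For anyone unfamiliar with the idea, here is a minimal sketch of that distance measure in Python. The actual comparison happens in the C++ core, and the function name and positions here are just for illustration.

```python
def manhattan_distance(a, b):
    """Sum of the absolute per-axis differences; no square root needed."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) + abs(a[2] - b[2])


camera = (0.0, 0.0, 0.0)
obj = (3.0, 4.0, 0.0)

# The Euclidean distance for these two points would be 5.0.
print(manhattan_distance(camera, obj))
```

The Manhattan distance is never smaller than the Euclidean one and in 3D is at most sqrt(3) times larger, so switch distances tuned with it behave slightly differently along the diagonals than along the axes.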

The bottleneck right now is the object count. You can test it with, let's say, 2000 cubes, all shown in the viewport. On my GeForce 8600M GT, the rasterizer calculations take about 10 ms, and it doesn’t matter which LOD level is shown. This problem has to be fixed to get any use out of the LOD system.

It also seems that replaceMesh is an expensive function if a lot of objects are switched at the same time (the scenegraph slows down).

@SolarLune: You asked on YouTube for UI improvements, and you're right. This was just a quick test to show the functionality. If anyone is interested in making a better panel, feel free to do so :)


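    def draw(self, context):
        layout = self.layout
        obj = context.object

        # Only mesh objects carry LOD meshes; also guard against no active object.
        if obj is None or obj.type != "MESH":
            return

        row = layout.row()
        row.label(text="Object: " + obj.name)

        # Toggle for the LOD system on this object.
        row = layout.row()
        row.prop(obj, "use_lod")

        # Show one mesh slot per detail level, separated by a little space.
        if obj.use_lod:
            for index, prop_name in enumerate(("lod1", "lod2", "lod3")):
                if index > 0:
                    layout.separator()
                row = layout.row()
                row.prop(obj, prop_name)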

greetings,
moerdn

Maybe it could be simpler like this? :/

http://i1108.photobucket.com/albums/h404/elfazblend/ui_1.png

@Moerdn - You know, I implemented an LOD function myself, and I found that there were slowdowns on two fronts.

  1. If you replace mesh more than necessary, it will slow down the rasterizer. In Python, you can check the object’s current mesh to find if it’s already been replaced, and if not, then replace the mesh.

  2. If a mesh is high poly but isn’t in the layer, it will need to be loaded up. The way to fix this would be to place the mesh somewhere in the game scene to load it up once, rather than loading it each time you call for replacing the mesh.
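The first point could be sketched like this. It relies on `KX_GameObject.meshes` and `replaceMesh()` from the BGE Python API. The helper name and the small stand-in classes are made up so the example also runs outside the game engine.

```python
def set_lod_mesh(game_object, mesh_name):
    """Swap the mesh only if the object is not already showing it.

    Returns True when replaceMesh() was actually called.
    """
    current = game_object.meshes[0].name if game_object.meshes else None
    if current == mesh_name:
        return False  # already the right mesh, skip the expensive call
    game_object.replaceMesh(mesh_name)
    return True


class FakeMesh:
    """Stand-in for a BGE mesh proxy; only the name attribute is used."""
    def __init__(self, name):
        self.name = name


class FakeGameObject:
    """Stand-in for KX_GameObject that counts replaceMesh() calls."""
    def __init__(self, mesh_name):
        self.meshes = [FakeMesh(mesh_name)]
        self.replace_calls = 0

    def replaceMesh(self, mesh_name):
        self.meshes = [FakeMesh(mesh_name)]
        self.replace_calls += 1


demo = FakeGameObject("tree_high")
for wanted in ("tree_high", "tree_low", "tree_low"):
    set_lod_mesh(demo, wanted)
print(demo.replace_calls)  # only one real swap for three requests
```

In a real game script the same function would be called with the controller's owner object instead of the stand-in.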