[DEV] - Multithreading for Armature Animation Playback Added

@solarlune It still can’t be disregarded that what’s supposed to be a performance boost shouldn’t be causing any performance loss, especially on my system. And a bone count of 70 is the same as what many games that have come out (and been out for years now) use on main characters, and a poly count of 70k is about double the current standard for many characters too.

So now we can have more characters on screen. :slight_smile:

70k = 70 000 faces for a single char?

My game kept crashing and I couldn’t figure out why. I’ll remove camera animations and test again to see if this is the case.

No, I believe framerate loss can be disregarded depending on the circumstances and implementation. For example, as far as I recall, up to a certain point deferred rendering will run slower than forward rendering under certain circumstances (if you have fewer lights in the scene). However, deferred is pretty much an all-around better implementation of rendering when considered next to forward for most real-world applications (more than a few lights).

It kind of makes sense that multithreading wouldn’t help with a single (or even a few) armatures. 70 bones is a lot, no matter how you slice it. Perhaps multithreading should be selectable.
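
If it were selectable, a simple form would be a threshold: run serially for a handful of armatures, and spin up threads past that. A hypothetical Python sketch (the `Armature` class and the threshold of 4 are invented for illustration; this is not the BGE’s actual code path):

```python
from concurrent.futures import ThreadPoolExecutor

class Armature:
    """Stand-in for an armature whose pose is recomputed each frame."""
    def __init__(self, name):
        self.name = name
        self.updated = False

    def update(self):
        # real code would solve constraints and skin vertices here
        self.updated = True

def update_armatures(armatures, threshold=4):
    """Serial below the threshold (thread startup/synchronization would
    cost more than the work); one task per armature above it."""
    if len(armatures) < threshold:
        for arm in armatures:
            arm.update()
    else:
        with ThreadPoolExecutor() as pool:
            list(pool.map(Armature.update, armatures))

update_armatures([Armature(f"npc_{i}") for i in range(20)])
```

The right threshold would have to come from benchmarking; for one or two heavy armatures, per-armature threading buys nothing, which matches the results reported above.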

While 70 bones might be the same as games that have come out, you know that they were probably made on optimized custom engines or professional grade readily available ones like Unreal. Same thing with the poly-count. Multithreading should be a tool to help speed up larger numbers of armature animation, not speed up armature animation at the core.

EDIT: Also, could you provide a link that says that 70,000 polygons and 70 bones are standard? I’ve searched, but it’s hard to pull up any info on recommended / industry-standard graphics levels.

On deferred rendering, do remember that the method Kupoman is after is inferred rendering. I’ve actually been hearing that Unreal Engine 4 has very poor support for transparent materials right now because of the deferred method.

Anyway, the test objects that I made have at least 100-150 faces (which averages to about 10 per-bone). On that, Solarlune is right in saying that some of the optimizations should be focusing on speeding up games with a large amount of that type of content as opposed to a single mesh, because many games are not just going to have one animated object.

I think that the only way to get it any faster is to update on the GPU, not the CPU, so there are instant results (no shuffling around).

Though I am uncertain if this is 100% accurate,

Does anyone know OpenCL, OpenGL, and/or CUDA?

1 big thread -> good for 1 big object

4 small threads -> good for many objects

GPU -> 16 or more threads?
Also, aren’t GPUs really good with vectors and transforms?
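
On the vectors-and-transforms question: yes — each skinned vertex is an independent matrix multiply, which is exactly the shape of work a GPU runs thousands of copies of at once. A minimal pure-Python sketch of that per-vertex operation (row-major 4×4 matrix, illustration only):

```python
def transform_point(matrix, point):
    """Apply a row-major 4x4 transform to a 3D point (w assumed to be 1)."""
    x, y, z = point
    return tuple(
        matrix[r][0] * x + matrix[r][1] * y + matrix[r][2] * z + matrix[r][3]
        for r in range(3)
    )

# A translation by (1, 2, 3); every vertex in a mesh gets the same
# independent treatment, so the work parallelizes trivially.
translate = [
    [1, 0, 0, 1],
    [0, 1, 0, 2],
    [0, 0, 1, 3],
    [0, 0, 0, 1],
]
print(transform_point(translate, (0.0, 0.0, 0.0)))  # -> (1.0, 2.0, 3.0)
```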

The modern GPU has hundreds of cores (or computing units), but the catch is that they’re a lot simpler than a CPU core (meaning they’re a lot more restrictive as to what type of instructions they can process).

This is because they were historically designed just for graphics, but Nvidia and ATI have since started to expand their architectures to account for an increasing number of general computing tasks like physics and rendering (though complex computations like animation may not be fully doable on the GPU yet).

Having the Bullet implementation include the upcoming OpenCL code will be a major boon to BGE users for example (but it’s a lot of work and it’s not ready yet last I heard).

Here’s the bench test I used to verify the speedup if anyone’s curious.

Using OpenCL is better for things like particles and some physics, not really things like armatures. It’s for many things that each need little calculation. The CPU is better for fewer things that need more calculation (like game logic). Also, modern GPUs have thousands of stream processors, which execute many threads in parallel. The latest AMD dual-GPU card (the R9 295X2) has 5,632 stream processors.

Using an example that compares a completely different way of rendering to throwing more CPU cores at a certain task isn’t something you can really do in this case. Inferred rendering, which is deferred rendering with an extension, will always cost more than a forward renderer, disregarding any lights being rendered at the time by either technique.

That’s due to deferred rendering requiring a few fairly costly G-buffers to be rendered every frame, whereas a forward renderer doesn’t need these buffers to do the same thing. But of course, after this initial extra cost, every extra light you add won’t cause your scene to be rendered an extra time, as it would with forward rendering.
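
As a toy cost model of that trade-off (all constants invented for illustration; real costs depend entirely on scene and hardware): forward pays roughly once per light, while deferred pays a fixed G-buffer cost up front plus a cheaper screen-space pass per light.

```python
def forward_cost(n_lights, geometry=1.0):
    # forward: the geometry is effectively re-shaded for each light
    return geometry * max(n_lights, 1)

def deferred_cost(n_lights, gbuffer=3.0, per_light=0.4):
    # deferred: fixed G-buffer fill, then a cheap pass per light
    return gbuffer + per_light * n_lights

for n in (1, 5, 10):
    print(n, forward_cost(n), deferred_cost(n))
```

With these made-up numbers the crossover sits at five lights — below that forward wins, above it deferred does — which is the same shape as the argument above.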

About the bone count: 70 bones isn’t hard to imagine for a character in games these days. Most of the bones end up in the character’s face for facial animation. Now, I was trying to stress the engine out, yes, and in doing so I hit something quite odd to me. I’m not familiar with multithreading or what the expected behavior is in this case; I just wanted to bring it to attention as it seemed very odd. I’m not trying to bash the work being done here, as the multithreading is clearly working for multiple objects.

Now think about not just my case, for example: I have one down from the best of Intel’s Ivy Bridge series, an i5-3570K; it’s about two years old now and clocked at 3.8 GHz. Me getting a nearly 10-frame drop may not seem so bad, but how will this perform for people on slower, older dual-core or single-core CPUs? How many frames will they be losing?

When I say 70k I mean triangles, not quads, and I did say 70k was double what’s generally seen, which is about 30k tris or so for characters now. There is no standard for bones or polys out there; it would be impossible to give one, since every game is different and requires different budgets for polys and bones. I can only give you examples of what I’ve seen used in games from reading tech papers, speaking to character artists, and seeing their art once it’s no longer under NDA from studios.

Sadly, yes, tech papers for games can be hard to come across. I’ll see if I can find the last one I read; it was for The Last of Us and quite a good read, if I remember. They were showing one of the main characters with over 300 non-real-time joints/bones to help the animators animate, and 85 of those bones were real-time — not baked or anything, completely run-time driven on the PS3’s hardware. Now, the PS3 does have some quirky architecture to handle this stuff, it seems, but that hardware is 8 years old now.

I’ve never heard of any new tech or advancement in doing skinning other than just slinging it at the hardware, be it CPU or GPU, and letting it handle it. It seems almost the same as how poly counts have gone up as hardware has become more powerful: no new ways of dealing with more polys other than just faster hardware, and the same seems to be the way for pushing around polys with joints. I don’t know about that, though.

Do a lot of engines not use shape-keys then? The BGE supports shape key animation so you wouldn’t need dozens of bones for a single face.

If I’m correct in thinking that shape keys in Blender are the same as the blend shapes used in game engines, then yes. But I haven’t used shape keys in Blender, so I can’t compare, and there are quite a few different naming conventions for blend shapes, so don’t quote me on that.

About blend shapes: they’re generally used alongside joints to get you that extra level that joints can’t get you to — say, an expression, for instance.

The PS3 is not open source, so using the Cell processor to run 8-threaded bone armatures, etc., is probably going to be hard to get your hands on. From what I understand, people have had a hard time cracking them.

@flame - You mention that your getting a 10-frame drop doesn’t seem bad, and ask how people with slower computers will cope. Well, I’d guess they’d get less of a frame drop, because they’d be getting less of an overall frame rate. If you’re getting 100 FPS with a 10 FPS drop, and someone else is running at 25 FPS, then they’ll get a 2.5-frame drop. That’s linear math that assumes the other computer is just a slower version of your own (same architecture / CPU count / etc.), but basically, the slower the computer is, the lower its total FPS, and so the smaller the difference in overall frame count between the two. At least, that’s my theory, anyway; we don’t have a lot of test cases, real-world or benchmark, for this new multithreading.
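
That proportional assumption can be written out as a quick sketch (this is just the linear model from the paragraph above, not a measurement):

```python
def scaled_drop(fast_base_fps, fast_anim_fps, slow_base_fps):
    """Assume the slower machine loses the same *fraction* of its
    frame rate that the faster machine did."""
    fraction_lost = 1 - fast_anim_fps / fast_base_fps
    return slow_base_fps * fraction_lost

# 100 -> 90 FPS on the fast machine implies about a 2.5 FPS drop at 25 FPS
print(round(scaled_drop(100, 90, 25), 2))  # -> 2.5
```

In practice the animation cost is probably closer to a fixed number of milliseconds per frame, in which case the FPS drop on a slower machine would be even smaller than this linear model suggests — but as noted, there aren’t many benchmarks yet.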

I’m not sure of the nature of multithreading that Moguri worked on, so I can’t say for sure what the new threads actually do (i.e. split separate armatures between threads, split bones between threads, split vertices between threads, etc), and so what would happen to a computer that lacks some or all multithreading capabilities.

As for having a lot of bones in the character’s face, I’d probably go with either bones or shape keys to only deform the face when necessary, and not all of the time. I think you could also do with some LOD to use a more complex armature for cutscenes and close-ups than when the game’s in “action mode”. Even then, you don’t have to have a large number of bones in enemies and NPCs’ faces.

SolarLune: I would think the finer-grained the multithreading, the harder the implementation (but the faster things potentially are); each core would still need instructions that allow everything to sync together and do the animation in parallel.

For an initial threading implementation, it’s not a bad thing to have separate threads handle entirely separate armatures. There’s probably a bit of potential to develop it further, and we’ll also see if the BGE gets multithreading beyond animation.

Many of Fallout’s facial bones are used in custom character creation, and after the model is baked, they are never used except to remake your player.

And articles I have been reading state that the PS3 can only handle 65 bones in one armature (per model) when using hardware armature skinning.

70 bones seems like a lot of bones to me; I don’t even know where you would use that many. I can see 30 total in both hands if you rig all your finger joints and make them bend at each joint, but even then there are ways to use a much lower bone count in the fingers and get the same results. 30k tris seems high to me too; I don’t understand the point of using 30k over 20k tris. Could you really notice that much of a difference? Plus, you could probably get the same-looking model that has 30k tris with 10k tris or less and a normal map. I try to keep under 5k for characters. Maybe I’m “behind the times.”

Our protagonist rig has 2,671 faces.

High-detail model -> bake down -> low poly with high usability.

Also, why would you animate the head in-game other than for speech, etc.? Do the animation with shape keys for all the vowels and letters, and a program can spit out a little movie of any speech + emotion.

Has anyone here used animated normal maps? Can it be done in the game engine?

You can use animated normal maps; that’s how they do some of the water ripple effects. But I’m not sure how you would use them for face movement. I think we’re sliding off topic anyway (my bad).

I just got a chance to test this out, and man, I never thought the BGE was this slow before, haha. I’m getting 0.6 FPS in a scene of 81 objects with 18 bones each (which is a pretty low bone count) prior to the multithreading. With a build that has multithreading, I’m getting ~65 FPS, which is a HUUUUUUUUUUGE increase. This is just a pure armature-animation stress test, though - one single hemisphere lamp and a plane that the characters are walking on.

EDIT: Boosted it to ~97 FPS by enabling Restrict Animation Updates, which is great, too. It looks like Moguri even implemented armatures not animating when they’re offscreen, which is really nice! Or maybe that feature was always there? I’m not sure, but it’s a nice feature nonetheless.

Even that framerate could be raised considerably by using fewer bones and/or by moving them less often. I’m not sure if anybody notices, but in some games, armatures update less often the further away they get from the camera (for example, in the Wii game The Last Story). If you implement that, it should be much easier to have crowds.
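
A hypothetical helper for that kind of distance-based throttling (the distance bands and the cap are made up, and nothing like this ships in the BGE as far as I know — your game logic would have to call it each frame):

```python
def should_update_pose(frame, distance, near=10.0, band=15.0, max_interval=8):
    """Update every frame when the armature is close to the camera,
    and only every few frames the further away it gets."""
    if distance <= near:
        return True
    interval = min(max_interval, 1 + int((distance - near) // band))
    return frame % interval == 0

# Far-away crowd members refresh their pose only every few frames.
print([should_update_pose(f, 60.0) for f in range(8)])
# -> [True, False, False, False, True, False, False, False]
```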

Thanks an absolute ton to Moguri for working on this - it’s really useful.

I have a couple of armature related questions now that I thought I’d ask since people check this thread.

  1. Has anyone noticed any performance gains from using the BGE-based armature animation mode as opposed to the Blender-based armature animation mode? The animation mode can be set in the armature’s data panel (looks like a stick figure).

  2. Using the BGE-based armature animation mode sets the accompanying mesh to “Set Smooth”. Is this an unavoidable bug or a symptom of the BGE-based animation mode?