Debugging a Blender Lock Up

We’re building a very complex Blender addon and we’re seeing an unusual “lock up” under certain conditions. The symptoms are:

1 - Screen does not refresh for several minutes
2 - The Blender process uses 100% of the CPU while it’s locked up
3 - It eventually recovers - it doesn’t crash
4 - It only happens when our addon panel is drawn - if we hide it, there’s no lock up
5 - Sometimes it doesn’t happen at all - depending on the data loaded into our addon
6 - Print statements at the top and bottom of our panel’s “draw” function are both printed quickly, but it still locks up

The code is far too complex to boil down to an easy example to post, and I know that it’s impossible for anyone to give specific advice based on these symptoms alone (unless this is a known manifestation of a known problem). So I’m mostly looking for advice on how to figure out what Blender is doing when it’s locked up like that.

Does anyone have any suggestions on how to debug a problem like this?

At the start of every function, print the function’s name and the current time to the console. Leave the console open (ideally so you can see both Blender and the console), then try to make it lock up. See which function or functions are running when it locks up, then work out the problem from there.
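One way to do this without editing every function body is a small decorator. This is only a sketch: `trace_calls` and `rebuild_properties` are made-up names, not part of Blender or of the addon in question.

```python
import functools
import time


def trace_calls(func):
    """Print the function name and a timestamp on entry and on exit."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        # flush=True so the line shows up even if the process stalls right after
        print(f"[{time.strftime('%H:%M:%S')}] enter {func.__qualname__}", flush=True)
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"[{time.strftime('%H:%M:%S')}] exit  {func.__qualname__} "
                  f"({elapsed:.4f}s)", flush=True)
    return wrapper


@trace_calls
def rebuild_properties(n):
    # Hypothetical stand-in for one of the addon's own functions.
    return sum(range(n))


rebuild_properties(10)
```

An "enter" line with no matching "exit" line points at the function that was running when things stalled. If every pair completes quickly, the time is probably being spent outside the decorated functions.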

Some ideas:

  • Do you accidentally use threads?
  • Are you maybe misusing the bgl calls? Maybe you are not opening/closing contexts correctly, or calling some operators etc
  • Have you tried figuring out which component of your panel is essential for your addon to block blender?
  • Operators can be invoked without you noticing it. Maybe something in an operator’s handle is actually blocking blender, not the draw method per se

Also maybe you should look into my work on asyncio in Blender. It’s still early days, but beyond sane network programming without Python threads, asyncio also offers a way to build more understandable and predictable concurrency in Python, especially where “context switching” between different tasks of the complex software needs to be timed as tightly as in Blender.

Sorry I’ve been away for a week.

Thanks to both of you for your suggestions. I’ll explore them and see what I find.

Another hint is that it’s also related to using different versions of our addon which defines slightly different properties. To handle upgrading, we store a pickled python dictionary containing all of our application data in an ID string property inside the .blend file. When our addon opens a .blend file that doesn’t match the current version of the addon, it rebuilds our entire property tree from the data found in the pickled dictionary (after applying a series of “patches” to the dictionary to upgrade it to the current version).

The problem arises after that upgrade process. Someone suggested that there might be a missing callback problem if the old addon defined a callback on a property and that callback is no longer available in the current version of the addon. Maybe that’s why it only shows up when the panel is open. I would expect more of an error in that case rather than a slow down, but I really don’t know. I also would expect the callback to be defined for the class and not be stored in the .blend file at all, but I don’t know enough about the details to be sure.

Thanks again for the help.

Have you thought about storing your data as a JSON text file object? The JSON format tends to be a bit more predictable. It’s also human-readable, within reason.

I also think that callbacks on properties may be a bad idea if they are stored inside a blend file. Certainly managing/pruning/updating these callbacks upon initialization may be a good idea!

We could use JSON if we wanted to export it, and we may do that. But that’s not the problem. The pickling and unpickling of our data appears to be fine. I think the problem has to do with Blender properties and possibly callbacks on properties.

Here’s a simple example of what we’re trying to do:

Let’s say that our addon needs the area of something, and so we create a float property named “area”. Later we decide that we want to represent that area as a length and width, so we eliminate the “area” property and add a “length” and “width” property. But what happens when someone opens an older .blend file with our newer addon? The length and width will end up getting their default values and will ignore the information in the older “area” property. To fix this, we store a unique version number inside the .blend file and we store a pickled dictionary representation of our core data inside a string property. When we recognize that the version doesn’t match, we open that pickled dictionary and apply a series of “updates” to the data to make it match the current addon. In this simple example, we might set the new length and new width both equal to the square root of the old area. Then we put those values into the Blender properties as they’re defined in the current version of the addon.
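In plain Python, with no Blender properties involved, the upgrade step described above might look roughly like this. Every name here (`CURRENT_VERSION`, `PATCHES`, `patch_v1_to_v2`, `upgrade`) is invented for illustration; the real addon's data layout is more involved.

```python
import math
import pickle

CURRENT_VERSION = 2  # assumed version number of the running addon


def patch_v1_to_v2(data):
    """Replace the old 'area' value with a square 'length' x 'width'."""
    side = math.sqrt(data.pop("area"))
    data["length"] = side
    data["width"] = side
    data["version"] = 2
    return data


# Maps a stored version to the patch that moves it one version forward.
PATCHES = {1: patch_v1_to_v2}


def upgrade(pickled):
    """Unpickle the stored dictionary and patch it up to CURRENT_VERSION."""
    data = pickle.loads(pickled)
    while data["version"] != CURRENT_VERSION:
        # A missing patch raises KeyError here instead of looping forever.
        data = PATCHES[data["version"]](data)
    return data


# Stand-in for the string stored in an old .blend file.
old_blob = pickle.dumps({"version": 1, "area": 16.0})
upgraded = upgrade(old_blob)
print(upgraded)
```

The last step, copying the upgraded values into the Blender properties, is left out because it depends on how those properties are declared.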

Regarding the callback issue, I’m not fully sure I understand how they work. For clarity, let’s say we have a property named “area” and it was defined as having an update callback named “area_changed”. The “area_changed” function was properly defined in the original addon. The new version of the addon now has two properties (“length” and “width”), and they each have their own callbacks (“length_changed” and “width_changed”). If I open an old .blend file with the new addon enabled, will the “area” property be looking for an “area_changed” update function? I wouldn’t think so, because I would expect that the pairing of ID properties and functions is done by the currently loaded addon and not by any “addon residue” in the .blend file. But the symptoms are so mysterious that I’m wondering if it’s possible that there might be some confusion based on the mismatch between the addon’s defined properties and the properties found in the .blend file.

I doubt that these callbacks are your actual problem. If blender blocks, it will most likely be because your program is running. Possibly in an infinite loop, or at least doing something that takes a long time.

Try to comment some or most of your panel to see what may be causing the trouble. Or try to lavishly spread print statements around your code to see what’s happening. Or use a debugger. I haven’t used pdb in blender yet, though…

Commenting out (or even closing) the panel eliminates the slow down. But if I put a print statement at the top and at the bottom of the panel drawing function, they both execute very quickly even when it’s in the “slow down” mode. So while the problem is related to displaying the panel, it doesn’t appear to be in the panel drawing code itself. That’s what got me thinking about callbacks.

That’s a big part of my question. I’ve used this “magic” code snippet to do some limited debugging:

# This drops into a python interpreter on the console ( don't start Blender with an & !! )
__import__('code').interact(local={k: v for ns in (globals(), locals()) for k, v in ns.items()})

but I think I need more horsepower on this problem. I don’t even know how to start pdb in Blender. Any suggestions there?
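For reference, the usual way to enter pdb from inside any Python code is `pdb.set_trace()`. The sketch below assumes Blender was launched from a terminal in the foreground, so that pdb can read from that console. `suspect_function` and `DEBUG_BREAK` are hypothetical names.

```python
import pdb

# Flip to True while hunting the problem. Left False here so the code
# runs straight through without waiting for console input.
DEBUG_BREAK = False


def suspect_function(values):
    # Hypothetical stand-in for a function in the addon.
    total = sum(values)
    if DEBUG_BREAK:
        # Execution pauses here and a (Pdb) prompt appears on the console.
        pdb.set_trace()
    return total


print(suspect_function([1, 2, 3]))
```

At the `(Pdb)` prompt, `w` shows the call stack, `n` steps to the next line, and `c` continues. Blender's interface does not respond while the prompt is waiting.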

What I meant with commenting out some lines is to figure out which lines are causing the excess computation. This lock up may not be in your code, it could also happen in the C code for the Python bindings, or in User Interface code, or whatever …
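One standard-library tool that may help tell these cases apart is `faulthandler`, which can dump the Python call stack at fixed intervals from a watchdog thread. This is a sketch, not something tested inside Blender: `start_watchdog` and `stop_watchdog` are made-up helper names, and the 10 second interval is arbitrary.

```python
import faulthandler
import sys


def start_watchdog(seconds=10.0, file=None):
    """Dump the Python tracebacks of all threads every `seconds` seconds."""
    target = file if file is not None else sys.stderr
    # The file must stay open until the watchdog is cancelled.
    faulthandler.dump_traceback_later(seconds, repeat=True, file=target)
    return seconds


def stop_watchdog():
    """Cancel the periodic dump started by start_watchdog()."""
    faulthandler.cancel_dump_traceback_later()
    return True
```

Call `start_watchdog()` before triggering the lock up and `stop_watchdog()` afterwards. If the dumps show the same Python function every time, that function is the likely culprit. If they show little or no Python code, the time is probably being spent in Blender's own C code.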

I think there isn’t much that can be done to help you without seeing the code or what you want to do at all…

That’s what I’m expecting. My first hope in posting was that someone might recognize this kind of problem from similar experiences in their own addons.

I agree there as well. My second hope in posting was some suggestions on how to debug such a thing in Blender. Your suggestion for pdb is a good start … if I can figure out how to use it. Thanks.