Multiple Blender processes for multiple GPUs?

One other thing: I am not sure if you have come across this issue yet, but GPUs can crash out. If you allocate frames out ahead of time, a crash just leaves black placeholders for the allocated frames… however, if you render via Python (bpy.ops.render.render(animation=True)), it will just close Blender if the GPU crashes out.

You can do this by using the command:

$ blender -b file.blend -s 1 -e 10 -P cuda1.py &

So you set the start and end frame before running the Python file, and have the Python file execute the bpy.ops.render.render function I listed above.
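
For reference, cuda1.py isn’t shown in this thread; below is a minimal sketch of what such a per-GPU script might look like, assuming the pre-2.78 preferences API and the first CUDA device (a cuda2.py would pin 'CUDA_1', and so on):

    import bpy

    # Pin this Blender instance to a single CUDA device (pre-2.78 API).
    prefs = bpy.context.user_preferences.system
    prefs.compute_device_type = 'CUDA'
    prefs.compute_device = 'CUDA_0'  # first GPU

    # Make Cycles render on the GPU.
    bpy.context.scene.cycles.device = 'GPU'

    # Render the frame range set on the command line via -s / -e.
    bpy.ops.render.render(animation=True)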

We run around 25 computers at peak times, all rendering in parallel, using the no overwrite / placeholder option… it just saves having to allocate frames out to individual Blender instances, especially for long animations.

Hey, that’s awesome! It’s rare, but I have seen a GPU black out from time to time, especially after a long run of frames being rendered. Thanks for the tip!

One thing that I am working on is moving our automation to RabbitMQ, where each frame in the animation becomes a queue item. Rabbit does a great job of round-robin assignment of tasks across multiple subscribers (nodes), and a job only clears from the queue when its frame completes successfully; on failure it is put back into the queue.
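
As a sketch of that ack/requeue behaviour, assuming the pika client, a queue named render_frames, and a made-up "file:frame" message format (none of which come from the thread):

    import subprocess
    import pika

    # Connect to the broker (hostname is an assumption).
    connection = pika.BlockingConnection(pika.ConnectionParameters('rabbit-host'))
    channel = connection.channel()
    channel.queue_declare(queue='render_frames', durable=True)

    def on_message(ch, method, properties, body):
        blend_file, frame = body.decode().split(':')  # e.g. "shot01.blend:42"
        result = subprocess.run(['blender', '-b', blend_file, '-f', frame])
        if result.returncode == 0:
            ch.basic_ack(delivery_tag=method.delivery_tag)  # frame done, job clears
        else:
            ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)  # retry later

    channel.basic_qos(prefetch_count=1)  # one frame per node at a time (round robin)
    channel.basic_consume(queue='render_frames', on_message_callback=on_message)
    channel.start_consuming()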

The reason we started working on our own system is that we heavily utilize the no overwrite / placeholder option. The benefit is that when we fire off a render, our files are usually 100 MB compressed, plus about 300 MB of texture data, so roughly 400 MB on average per node. The file is loaded once and the textures are loaded once (with the persistent textures option), so across our 25 nodes that is approximately 10 GB of data to transfer for the whole animation.

Say our animation is 9,000 frames long (not unusual for us): that would be around 450 chunks of 20 frames if we were to use a job system like RabbitMQ, which means approximately 180 GB of network traffic versus our 10 GB. That is far less hammering on our main server, not to mention the saved read / load time of the blend file and texture data.
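
A quick back-of-the-envelope check of those numbers:

    # Figures from the post above.
    load_per_node_mb = 100 + 300   # compressed blend file + texture data
    nodes = 25
    frames = 9000
    chunk_size = 20

    placeholder_gb = load_per_node_mb * nodes / 1000   # each node loads once: 10 GB
    chunks = frames // chunk_size                      # 450 chunks
    chunked_gb = load_per_node_mb * chunks / 1000      # one load per chunk: 180 GB
    print(placeholder_gb, chunks, chunked_gb)          # 10.0 450 180.0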

So if I’m understanding the no-overwrite method correctly, I’m curious how you handle situations where frames have already been rendered, but there are revisions in your blend file to be re-sent to the farm. Would the no overwrite method prevent those frames from being re-processed?

The Blender file points to a network folder accessible by all computers… if you are doing 4 instances on one machine this doesn’t matter.

Overwrite is turned off and Placeholders is turned on.

This is what happens in a two-instance setup:

Instance one spins up, says OK, let’s render the animation, frame 1 is the start, let’s render that… is there a file already in the folder (because overwrite is off)? There isn’t… OK, let’s create a placeholder… placeholder created… now let’s render the frame…

Instance two spins up… render animation… frame 1, OK, is there a file there (because overwrite is off)? There is… so let’s skip frame 1 and render frame 2.

Instance one finishes frame 1, overwrites the placeholder with the actual file, and moves on to frame 3, because frame 2 is being handled by instance two.

etc.etc.etc.
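
That walkthrough is essentially an atomic claim-by-file scheme. This isn’t Blender code, just a sketch of the same idea using exclusive file creation on shared storage (the output path is made up):

    import os

    def claim_frame(path):
        """Create an empty placeholder atomically; False means another
        instance already claimed (or finished) this frame."""
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return True
        except FileExistsError:
            return False

    for frame in range(1, 11):
        path = '/mnt/render/output/frame_%04d.png' % frame
        if claim_frame(path):
            print('this instance renders frame', frame)
        else:
            print('frame', frame, 'already claimed, skipping')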

Also, using RabbitMQ doesn’t add much more network traffic than what you would have without it. It’s just an AMQP system that lets you build custom message queues that tell your nodes (via very small amounts of text data) what file to start rendering, and what frame, or range of frames, to render. Rabbit relies on the node listening for the inbound message, and how you handle that is up to you in your own software on the node. When a node receives the message, it carries on however you have it set up; whether it gets its assets from an NFS share or the local system is up to you. The assets themselves aren’t transported through Rabbit.

Hmmm interesting, may have to look into it!

Oh, I get it now. That’s really cool, and I didn’t even know Blender did that no-overwrite / placeholder setup. To confirm, though: if you wanted to render a revision, say there were just 5 frames of a longer sequence that needed re-rendering because of an animation or texture change, etc., those frames would have to be deleted first before restarting, because no-overwrite will skip them if they already exist?

Correct… the frames you want to replace will need to be deleted first. We usually render into a separate folder instead and combine using the Video Sequence Editor.

How can I install the NVIDIA drivers without pulling in the whole world? sudo apt-get -s install cuda reports that it will install 586 new packages, including xserver-xorg and unity-control-center, neither of which I want or need on a headless machine. (Well, the X libraries may be needed, but there’s absolutely no need for Unity…) Is there a leaner way to go about this?

TIA

[SOLVED] (my question hasn’t passed moderation yet, but…) Following the instructions at http://docs.nvidia.com/cuda/cuda-get…nux/index.html I downloaded the CUDA toolkit from https://developer.nvidia.com/cuda-downloads and used the runfile installer. Passed the deviceQuery and bandwidthTest checks with flying colors!

@mfiocca: you wrote:

40,500 minutes to render on one ~$350 CPU (quad i7)
vs
5,400 minutes to render on one ~$350 GTX 970

I’m seeing approximately the same render speed on the i7-4790 CPU as on the GTX 970 STRIX. Did you mean a single CPU core? When we render on the CPU it maxes out all 8 CPU threads. If you were also using all cores in that first measurement, then maybe there’s some tweaking of the 970 I need to learn. Currently disappointed that the 970 isn’t appreciably faster than the CPU. Any suggestions/pointers?

So if I understand all this correctly:
In case I had two PCs, each with 4 GPUs, the most efficient way to render long animations would be to use a script like mfiocca does, plus the placeholder function?
Would it be wiser to build, instead of two PCs, maybe four PCs each with 2 GPUs, to circumvent the issue with the individual render instances (Python script to run each GPU as a single node)?

Yes, rendering via the command line is way more effective than rendering via the GUI; we have seen render speed increases of around 30%.

Would it be wiser to build, instead of two PCs, maybe four PCs each with 2 GPUs, to circumvent the issue with the individual render instances (Python script to run each GPU as a single node)?

So with each render we have multiple steps:

  1. File loading (CPU / HDD)
  2. BVH creation, material compilation and image texture loading (CPU only)
  3. 3D rendering (can be GPU accelerated)
  4. Compositing (CPU)
  5. Image saving (CPU / HDD)

So if you put 4 cards in one machine, you are effectively only accelerating step 3 of the rendering process. If you have 4 machines, each with one GPU, you are accelerating all parts of the rendering process. It is one of the reasons why we favour one GPU per machine rather than multiple GPUs per machine.
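
If you do put several cards in one box, the approach from earlier in the thread is one background Blender process per GPU, each pinned by its own script. A sketch, assuming cuda1.py through cuda4.py each select a different CUDA device and call bpy.ops.render.render(animation=True):

    import subprocess

    # One background Blender instance per GPU; with overwrite off and
    # placeholders on, they share the frame range without double-rendering.
    procs = [
        subprocess.Popen(['blender', '-b', 'file.blend', '-P', 'cuda%d.py' % i])
        for i in range(1, 5)
    ]
    for p in procs:
        p.wait()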

@mfiocca - thanks again for your thread and helpful comments. We’ve been running on our headless dual GPU renderer for over a year with no problems. Until now. We recently upgraded the Ubuntu 16.04 server, which installed Blender 2.78b (I believe we were running 2.78 or 2.78a previously), and now we’re getting errors like AttributeError: 'UserPreferencesSystem' object has no attribute 'compute_device_type' - we’ve posted more complete details at http://blender.stackexchange.com/questions/74075/headless-render-cant-find-compute-device-in-2-78b - but since this all started here, I thought I’d ask here, too.

Again, thanks for getting us started!
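
For anyone else landing here: in 2.78 the compute device settings moved from UserPreferencesSystem into the Cycles add-on preferences. A sketch of what the replacement looks like (see the stackexchange post above for the full resolution):

    import bpy

    # 2.78+: compute device now lives in the Cycles add-on preferences.
    cprefs = bpy.context.user_preferences.addons['cycles'].preferences
    cprefs.compute_device_type = 'CUDA'
    cprefs.get_devices()  # populate the device list
    for device in cprefs.devices:
        device.use = (device.type == 'CUDA')  # enable the CUDA card(s)

    bpy.context.scene.cycles.device = 'GPU'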

Hi, can anyone help me?

It’s cool to see this thread still getting activity. Looking at your error stack trace, it appears that something in your CUDA drivers isn’t compatible with your Linux kernel version. I haven’t worked with GTX or Tesla cards before, so I am not sure I can be of much help with the proper drivers, etc.

But I would look at finding and installing the CUDA and NVIDIA drivers that support both your cards and your OS.

I hope that helps, and let me know if you get it figured out!
