Will Cycles ever do dedicated CPU and GPU rendering?

I think loading Blender twice, one instance to render on the CPU and one to render on the GPU, is not that ideal.
A computer has X cores with a fixed number of hardware threads. You can create more threads, but they would simply queue up. So for speed, one needs to minimize memory load and use the maximum number of render threads. One (or more) of those threads has to monitor the feeding and pulling of data to and from the GPU(s) and store the result to disk as a rendered frame. So if you do GPU rendering today, you already require that CPU interaction, but the CPU is doing nothing else besides that.
It would require updates in the CPU tile rendering, or in the tile distributor, to let the CPU do some render work while also doing that other job.
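Purely as an illustration of that scheduling idea (this is not Cycles code, just a toy sketch), a shared tile queue could feed both kinds of workers: CPU threads that render tiles themselves, plus one lightweight feeder thread per GPU that only pulls tiles and hands them to its device. The `render_tile_on_cpu` and `submit_tile_to_gpu` functions are hypothetical stand-ins.

```python
# Toy sketch of a shared tile distributor (illustration only, not Cycles code).
import queue
import threading
import time

tiles = queue.Queue()
for tile_id in range(64):          # pretend the frame is split into 64 tiles
    tiles.put(tile_id)

def render_tile_on_cpu(tile_id):
    time.sleep(0.05)               # stand-in for real CPU path-tracing work

def submit_tile_to_gpu(gpu_id, tile_id):
    time.sleep(0.01)               # stand-in for a kernel launch + readback

def cpu_worker(worker_id):
    # A CPU worker renders tiles directly.
    while True:
        try:
            tile_id = tiles.get_nowait()
        except queue.Empty:
            return
        render_tile_on_cpu(tile_id)
        tiles.task_done()

def gpu_feeder(gpu_id):
    # A GPU feeder only pulls tiles and keeps its device busy.
    while True:
        try:
            tile_id = tiles.get_nowait()
        except queue.Empty:
            return
        submit_tile_to_gpu(gpu_id, tile_id)
        tiles.task_done()

threads = [threading.Thread(target=cpu_worker, args=(i,)) for i in range(6)]
threads += [threading.Thread(target=gpu_feeder, args=(g,)) for g in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("all tiles rendered")
```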

The approach of having separate GPU / CPU rendered frames would be a smaller change in code than having a method to render both CPU and GPU on the same frame.

Also, since CPU and GPU have different optimal tile sizes, it wouldn't be handy to let them work on the same frame, due to the fact that a GPU has many more cores to do these calculations on than a CPU has. …

However, CPU + GPU each rendering their own frames would still be very nice for animation work.
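That separate-frames idea can already be approximated today from the command line by launching two background Blender instances on the same .blend, one stepping through the even frames on the GPU and one through the odd frames on the CPU. The paths and frame range below are made up for the example, and it assumes a GPU compute device is already enabled in the user preferences.

```python
# Sketch: render an animation with two Blender instances at once,
# GPU doing the even frames and CPU doing the odd frames.
import subprocess

BLEND = "/path/to/scene.blend"     # hypothetical .blend file
OUT = "//frames/frame_####"        # hypothetical output pattern

def render(device, start_frame):
    # Set the Cycles device after the file is loaded, then render every 2nd frame.
    expr = "import bpy; bpy.context.scene.cycles.device = '%s'" % device
    return subprocess.Popen([
        "blender", "-b", BLEND,
        "--python-expr", expr,
        "-o", OUT,
        "-s", str(start_frame), "-e", "250",   # frame range 1..250 assumed
        "-j", "2",                             # frame step 2: every other frame
        "-a",
    ])

gpu_job = render("GPU", 2)   # even frames on the GPU
cpu_job = render("CPU", 1)   # odd frames on the CPU
gpu_job.wait()
cpu_job.wait()
```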

Maybe Lucas could code it? He's the one who understands this in coding terms.
And he seems to be the coder with an interest in speeding up Blender
(light portals, adaptive sampling, Metropolis… all projects from him).

Once again, in order for the CPU and GPU to cooperate on the same image at the same time, you have to deal with crippling latency going between video RAM and system RAM. It would almost certainly be slower to use both GPU + CPU until there is a faster bus to transfer data back and forth.

That is a horribly inefficient way to render a clean image xD… and here we go with the multiple instances of Blender again :stuck_out_tongue:

Not at all. Does Cycles rendering on the GPU get crippled by RAM <–> VRAM latency? No. Does Cycles rendering on the CPU get crippled by it? No. That's because neither render mode needs to talk to the other device to any real extent while rendering.

When using multiple GPUs, each GPU requires its own set of memory with a copy of the scene and the Cycles kernel, and simply renders the tiles it is told to. The CPU would likewise require its own set of RAM for a copy of the scene and the Cycles code, and would be told to render some tiles.
That is essentially running multiple instances of Cycles, which is basically what multi-GPU rendering does anyhow.

One option would be to run the CPU version with total threads - 1 per GPU, as it appears the CUDA drivers create some CPU load per GPU, and for the GPU to run at full speed it may need a thread kept free.
E.g.:
BMW, 1 GPU: 57 seconds, Blender uses 13% CPU (1 thread)
BMW, 1 GPU: 70 seconds, with the BMW CPU render running at the same time on 8 threads
BMW, 1 GPU: 58 seconds, with the BMW CPU render running at the same time on 7 threads
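A minimal bpy sketch of that thread-reservation idea, run from the Python console or a startup script. The GPU count is hard-coded here as an assumption rather than queried from the preferences, so adjust NUM_GPUS to your machine.

```python
# Sketch: leave one CPU render thread free per GPU so the CUDA driver
# threads are not starved. NUM_GPUS is an assumption, not auto-detected.
import os
import bpy

NUM_GPUS = 1                          # adjust to your machine
logical_cores = os.cpu_count() or 1

scene = bpy.context.scene
scene.render.threads_mode = 'FIXED'   # stop using the automatic thread count
scene.render.threads = max(1, logical_cores - NUM_GPUS)
print("CPU render threads:", scene.render.threads)
```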

This is true, and as I said earlier, I have taken advantage of that while running multiple instances. To get it to run in the same instance, however, would take a lot of careful redesign of the tiles system. It could work, but I doubt it would result in much meaningful speedup. Would a diesel truck have more horsepower if it had a gas engine too? Yes, but is it worth the added complexity and overhead?

If someone wanted to craft an addon that could streamline the process of running multiple instances and merging their results, more power to them! But I don't think it's worth developer time to chase a convoluted optimization like that.
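For what it's worth, the merge step such an addon would need is fairly small if each instance renders the same frame with an equal sample count but a different seed, since averaging the linear results combines the samples. A rough sketch, with hypothetical file paths and both renders assumed to be linear EXRs at the same resolution:

```python
# Sketch of the "merge" step: average two renders of the same frame made
# with equal sample counts but different seeds. Paths are hypothetical.
import bpy

cpu_img = bpy.data.images.load("/tmp/frame_cpu.exr")
gpu_img = bpy.data.images.load("/tmp/frame_gpu.exr")

cpu_px = list(cpu_img.pixels)
gpu_px = list(gpu_img.pixels)

merged = bpy.data.images.new("merged", width=cpu_img.size[0],
                             height=cpu_img.size[1], float_buffer=True)
# Simple 50/50 average of the RGBA float buffers.
merged.pixels = [(a + b) * 0.5 for a, b in zip(cpu_px, gpu_px)]

merged.filepath_raw = "/tmp/frame_merged.exr"
merged.file_format = 'OPEN_EXR'
merged.save()
```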

That is very interesting …
It would be great to see either of these two options implemented in Blender!

What was proposed by LordOdin would be useful to me (my CPU is faster than the GPU with SSS, volumetrics and hair).

You could also specify certain render layers to use the CPU and others to use the GPU. That way the SSS, volumetrics and hair could run on the CPU and the rest on the GPU. With proper and efficient masking they could easily be put back together in the compositor.
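As a rough sketch of that compositing step (file names are hypothetical, and the CPU render is assumed to contain only the SSS / volume / hair layers with everything else masked out), the recombination could be as simple as an Alpha Over in the compositor:

```python
# Rough sketch: GPU pass underneath, masked CPU pass alpha-overed on top.
import bpy

scene = bpy.context.scene
scene.use_nodes = True
tree = scene.node_tree
tree.nodes.clear()

gpu_node = tree.nodes.new('CompositorNodeImage')
gpu_node.image = bpy.data.images.load("/tmp/layer_gpu.exr")

cpu_node = tree.nodes.new('CompositorNodeImage')
cpu_node.image = bpy.data.images.load("/tmp/layer_cpu.exr")

over = tree.nodes.new('CompositorNodeAlphaOver')
out = tree.nodes.new('CompositorNodeComposite')

tree.links.new(gpu_node.outputs['Image'], over.inputs[1])  # background
tree.links.new(cpu_node.outputs['Image'], over.inputs[2])  # foreground (masked CPU layers)
tree.links.new(over.outputs['Image'], out.inputs['Image'])
```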

Yes, that would be great, and I think that is what LordOdin proposed. And as he has said, it is important to be able to set the tile size for each device.