Blender speed-up in Internal and Cycles with tile reordering

To start off, I am not sure if this thread belongs here, so reply if I'm in the wrong category.

This is a suggestion for an improvement to shorten render times.
When rendering at a big tile size like 128x128 or 256x512, towards the end of the render only 1 or 2 tiles are still being rendered, while the GPU or CPU can often do 4 or 8 at the same time. That's unused power which slows the render down.
My suggestion is that when (assuming there are 4 threads) only 2 are still working, Blender splits the remaining tiles up so every thread keeps working until the finish.

I know it would not drastically change render times, but it could save maybe 10 seconds per frame. So when rendering animations it could add up, especially on older machines with 4 or more cores (e.g. a first-generation AMD Phenom X4).
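To make the idea concrete, here is a minimal sketch in plain Python (not actual Cycles code; the tile layout and function names are my own) of a scheduler that splits the remaining tiles once fewer are pending than there are threads:

```python
def split_tile(tile):
    """Split one (x, y, w, h) tile in half along its longer axis."""
    x, y, w, h = tile
    if w >= h:
        half = w // 2
        return [(x, y, half, h), (x + half, y, w - half, h)]
    half = h // 2
    return [(x, y, w, half), (x, y + half, w, h - half)]

def refill(pending, num_threads, min_size=16):
    """Split the largest pending tiles until there are enough to keep
    every thread busy, or until they get too small to be worth it."""
    while 0 < len(pending) < num_threads:
        largest = max(pending, key=lambda t: t[2] * t[3])
        if largest[2] <= min_size and largest[3] <= min_size:
            break  # splitting further would cost more than it saves
        pending.remove(largest)
        pending.extend(split_tile(largest))
    return pending

# 2 big tiles left but 4 threads free -> split them into 4 pieces
print(refill([(0, 0, 128, 128), (128, 0, 128, 128)], 4))
```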

Not easy to fix, as Cycles renders only one “frame” (sample) at a time, and the tile manager that spreads tasks between different GPUs/CPUs would need to be rewritten. I have a similar problem with an experimental trick that tries to burn less CPU on dark parts (ERPT-style render), and it has an even worse problem: the actual tile size depends on pixel color and varies from 1 to max_tile, so some scenes actually end up running as a single thread. The only proper solution is to have more samples in flight at the same time, to keep all cores busy. A partial workaround may be to just run many copies of Blender, but that does not work on the GPU.
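For the “many copies of Blender” workaround, here is a rough Python sketch, assuming a CPU render of an animation; the file path, frame range, copy count, and per-copy thread count are all placeholders you would tune to your machine:

```python
import subprocess

BLEND_FILE = "scene.blend"        # placeholder path
FRAME_START, FRAME_END = 1, 100   # placeholder frame range
COPIES = 4                        # e.g. 4 copies on an 8-core CPU

chunk = (FRAME_END - FRAME_START + 1) // COPIES
procs = []
for i in range(COPIES):
    s = FRAME_START + i * chunk
    e = FRAME_END if i == COPIES - 1 else s + chunk - 1
    # -b: no UI, -s/-e: frame range (must come before -a),
    # -t: limit each copy to its share of the CPU threads
    procs.append(subprocess.Popen(
        ["blender", "-b", BLEND_FILE,
         "-s", str(s), "-e", str(e), "-t", "2", "-a"]))

for p in procs:
    p.wait()
```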

Hi,
Maybe an improvement to the Auto Tile Size add-on would do it? It could adapt the total number of tiles to be as close as possible to a multiple of your render threads.
BTW, rendering large tiles on the CPU isn't the best, and a single GPU renders only 1 tile at a time.
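Something like this hypothetical helper (not the actual Auto Tile Size add-on code) shows what I mean: pick a tile size near a preferred value so that the total tile count divides evenly across the threads.

```python
import math

def pick_tile_size(width, height, threads, target=64):
    """Return a square tile size near `target` whose resulting tile
    count divides evenly across the render threads (if possible)."""
    best = None
    for size in range(16, 257, 16):
        total = math.ceil(width / size) * math.ceil(height / size)
        # Rank first by leftover tiles on the last "round" of threads,
        # then by distance from the preferred tile size.
        score = (total % threads, abs(size - target))
        if best is None or score < best[0]:
            best = (score, size)
    return best[1]

print(pick_tile_size(1920, 1080, 8))  # Full HD frame, 8 render threads
```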

For me, with the CPU, 8x8 is the fastest; and then you don't have the long waits at the end that you get with large blocks.

BTW, adaptive sampling will be part of Blender; the code is finished (I think), so prepare for a 30% to 70% time reduction :))

The problem with Intel HD Graphics and almost every Intel CPU is that they work fast with 1 small tile per thread, but faster still on multiple tiles when the tiles are bigger.

And many GPUs with CUDA are made for multi-tile rendering (one of the points of CUDA). GPUs can render faster when doing 1 tile at a time, but as they develop, multi-tile rendering is catching up and already matching that on new ATI and Nvidia cards (though Blender does not support ATI).

Adaptive sampling is very difficult to write, so it will take some time before Blender supports it. But it would improve render times a lot.
I don't know your CPU, but many render fast at low tile sizes and very fast at high ones. Compare the render times of very complex scenes at 8x8, 16x16, 128x128, and 128x256. Often people don't dare to use big tile sizes and lose a lot of power. E.g. I only have an Intel i5 (2 cores at 2.5 GHz) on Mac OS X, but it boosts itself so much.
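If you want to run that comparison yourself, here is a small bpy sketch (for the 2.7x/2.8x API, where tile size lives on scene.render; it was removed in 3.0). Run it from Blender's text editor, or save it as e.g. `time_tiles.py` (name is just a placeholder) and run `blender -b yourscene.blend -P time_tiles.py`:

```python
import time
import bpy

scene = bpy.context.scene
for tx, ty in [(8, 8), (16, 16), (128, 128), (128, 256)]:
    scene.render.tile_x = tx
    scene.render.tile_y = ty
    start = time.time()
    bpy.ops.render.render()   # render one still frame
    print("{}x{}: {:.1f}s".format(tx, ty, time.time() - start))
```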

He did say adaptive sampling is mostly done… you can even use it now if you download a build or compile one yourself :stuck_out_tongue:

Also, can you explain what you are talking about here: “And many GPUs with CUDA are made for multi-tile rendering (one of the points of CUDA). GPUs can render faster when doing 1 tile at a time, but as they develop, multi-tile rendering is catching up and already matching that on new ATI and Nvidia cards (though Blender does not support ATI)”?

Who has this tech, and why don't you share it with us? xD