Up to 4x speedup for AMD rendering

Hi there,

AMD did a great job on the split kernel patch. I had a go with their code and tried to port the missing parts to 2.75a. As a result, viewport rendering needs only 25% of the time the official builds need (so it’s 4x faster), and you can edit your materials in real time. I tested changing node trees by adding new material types, changing parameters, etc… and it renders really fast without annoying rebuilds :slight_smile:

Details about the build:
The missing parts I ported were made without hair and motion blur, so for the time being the code is there but not activated. It isn’t activated in the official build either when you benchmark BMW27, Sponza or Pavillon Barcelona: Sergey added hair and motion blur to the selective node code in 2.75a, which means the kernel used to render these benchmark scenes doesn’t include hair or motion blur in the official 2.75a either. So times can be compared directly between my build and the official 2.75a.
Only the rendering part was modified. I also made one memory optimization: I removed Sergey’s BVH memory optimization patch (https://github.com/dfelinto/blender-git/commit/68478aea016e87e071d550797b36acd32d33bd12), as it surprisingly increased memory usage on the GPU in my tests. This changes neither capabilities nor stability. In fact, this build can render things that won’t render in the official one.

Performance gains in F12 renders:
F12 rendering needs about 75% of the time the official builds need.
All tests were made with the 15.7 drivers on Windows 7 with an HD7770 1GB.
To give an idea: the Sponza scene took 19:00 on 2.75a with the 15.6 drivers and takes 8:42 on my build with 15.7. BMW27 went from 4:23 to 2:36, and the Barcelona scene from 2:41 to 1:34.
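To make these numbers easier to compare, here is a small Python sketch (an illustration only, not part of the build) that turns the reported m:ss times into a speedup factor (official time divided by custom time) and into the fraction of the official time the custom build needs. Needing 25% of the time is the same as a 4x speedup. Keep in mind that for Sponza the two runs also used different drivers (15.6 vs 15.7), so that figure mixes the driver and build gains.

```python
def to_seconds(t: str) -> int:
    """Convert an 'm:ss' (or 'h:mm:ss') time string to seconds."""
    seconds = 0
    for part in t.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

# (official 2.75a, custom build) times on the HD7770, as reported above.
# For Sponza the official run used the 15.6 drivers, the custom run 15.7.
RESULTS = {
    "Sponza": ("19:00", "8:42"),
    "BMW27": ("4:23", "2:36"),
    "Barcelona": ("2:41", "1:34"),
}

def speedup(official: str, custom: str) -> float:
    """Official time / custom time, e.g. 4.0 means 4x faster."""
    return to_seconds(official) / to_seconds(custom)

def time_fraction(official: str, custom: str) -> float:
    """Share of the official render time the custom build needs."""
    return to_seconds(custom) / to_seconds(official)

for scene, (official, custom) in RESULTS.items():
    print(f"{scene}: {speedup(official, custom):.2f}x faster, "
          f"{time_fraction(official, custom):.0%} of the official time")
```

The same two helpers work for posting your own results: just put your official and custom times (without kernel compilation) into `RESULTS`.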

Possible further improvements:
At the moment, the automatic tile size (which always renders exactly one tile per GPU, letting Blender decide internally which tile size to use) uses a lot of memory. Manually setting the tile size to the one Blender reports can give up to a further 10% speedup, at least on low-memory cards like the HD7770 I used for testing.
Hair and motion blur are there, but they must be fixed to work with the selective node code again, so that they are only compiled for scenes that need them. Once that’s fixed, I can recompile the build.
Transparent shadows already work, but they are not part of the selective node code and are not yet optimized, as that code had been abandoned for a long time. So at the moment they slow down all scenes. You can activate them in kernel_types.h. On my 80€ card, they make rendering about 1.9 times slower. So it’s still faster than two weeks ago, but not yet optimal. However, tests by others on a 285x report no performance impact.

So I’d be pleased to see your tests, to find out whether it also improves times on higher-end cards. Please post times for the benchmarks mentioned above, both with the official build and with the custom build. Don’t forget to exclude kernel compilation time.
It would be nice to have your viewport render times too.

The build: http://www.file-upload.net/download-10769531/win64-vc-selective-node.7z.html

Tests with and without hair, object motion blur and camera motion blur show the following impact on performance:
Hair + camera and object motion blur: 36% slower rendering
Hair: 33% slower
Camera motion blur: 8% slower
Object motion blur: 20% slower

So in the end, rendering with all 2.75a features enabled in this build should be about as fast as rendering with the official build with those features disabled :slight_smile:
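That conclusion comes from combining two numbers: the build needs about 75% of the official F12 time, and enabling all features costs up to 36% more time. A rough Python check, assuming the feature overheads simply multiply with the 75% factor (an assumption, not a measured combination):

```python
# Render-time multipliers measured above (1.36 means 36% slower).
OVERHEADS = {
    "hair + camera and object motion blur": 1.36,
    "hair": 1.33,
    "camera motion blur": 1.08,
    "object motion blur": 1.20,
}

# F12 time of the custom build relative to official 2.75a (about 75%).
CUSTOM_TIME_FRACTION = 0.75

def relative_to_official(overhead: float,
                         fraction: float = CUSTOM_TIME_FRACTION) -> float:
    """Custom build time WITH the feature, divided by the official build
    time WITHOUT it. Assumes the two effects combine multiplicatively."""
    return fraction * overhead

for feature, overhead in OVERHEADS.items():
    print(f"{feature}: {relative_to_official(overhead):.2f}x "
          f"the official time without the feature")
```

A ratio close to 1.0 means the custom build with the feature on renders about as fast as the official build with it off; the worst case (all features) lands at roughly 1.02.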

First tests on an old 280X I just bought for 180€:
34s for the Barcelona scene (GPU-Z reports only 54% GPU usage for some reason… but it’s still really fast)
3:34 for Sponza
1:03 for BMW (note that here, performance is the same as with the official build)

Can you tell which AMD cards it would work with?

I’ve got a small one, an HD8670D.

Last time I tried OpenCL, it basically crashed!

Thanks for the feedback.

Lower the render tile size,

and maybe consider getting a GPU with more than 1 GB of VRAM.

Without the latest drivers.
GPU: R7 250X
Viewport: 3 mins and 44 secs
F12: 6 mins and 10 secs.

With the official build, the viewport seems slower overall, no matter what scene I render.
I hope it’ll go into master soon, as it is a big improvement.

Install the latest driver too, as the performance improvements add up.
This build should be particularly helpful for low-end cards, as it greatly reduces memory usage. For example, the viewport speedup on the 280X is “only” 3x (against 4x for the 7770).

@Ricky, no idea, I can’t buy every card on the market, so just try it and tell me if it works :slight_smile:

I’ve got a small one, an HD8670D.

This is an iGPU! Your CPU must be an AMD A10-6xxx or A8-6xxx, which are APUs to be precise. Those APUs do not have an iGPU based on AMD’s GCN architecture, so it is not supported by Cycles.

Any AMD APU matching the Ax-7xxx name pattern has a GCN-based iGPU, which should work.

I have the AMD A10-6700 APU,

and with OpenCL in LuxRender it seems to work well, or at least it doesn’t crash,
and there are GPU options in the user preferences

but with Cycles it still crashes,
so I’m wondering whether it will ever work with this small video card
and at least give a certain gain in speed;
something like 200% would already be very good compared to the CPU!

I’m not asking for it to be as fast as a big GeForce card here! LOL

I’ve started looking for a GeForce 750 Ti,
which is low power and might be faster than the one I have now!
Are there better low-power GeForce cards available now?

thanks
happy cl

@RickyBlender, this user recently bought a GTX 750 Ti:
http://www.blenderartists.org/forum/showthread.php?373976-Graphic-Card

You could ask him to run some tests/benchmarks for you; that way you can compare the speed with your CPU.

Edit:
You could also ask him to do some testing with volumetrics on the GPU, to get a real comparison with the CPU.
http://www.blenderartists.org/forum/showthread.php?375718-Cycles-GPU-CUDA-slow-with-some-materials

For your iGPU, even if it worked, I guess it wouldn’t be any faster than the CPU.
As for a low-budget graphics card, the Asus 260X with 2GB can be found new for 119€ here, and it’s quite efficient at rendering. It should be about 1.2-1.3x faster than the 7770 I used for my tests, so I’d guess about 7 min for Sponza, 2 min for BMW and 1:15 for Barcelona Pavillon (with my build). But transparent shadows are not yet fully optimized, so you will have to wait a bit or use LuxCore.
Whatever you choose, be aware that about 400MB of graphics card memory will be taken up (render buffers, OpenGL scene, kernel, etc…) with OpenCL, and about 800MB with CUDA (up to 1500MB with the experimental kernel, according to some reports). So out of 2000MB, you get about 1200MB for your scene with Nvidia and 1600MB (33% more) with AMD.
Volumes, SSS and hair may be slower than on the CPU, which is very likely on low-end cards like the one you want.
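The time estimates above just divide the HD7770 times with the custom build by the 1.2-1.3x factor. A Python sketch of that arithmetic, assuming the factor applies uniformly to every scene (a rough guess, not a measurement):

```python
def to_seconds(t: str) -> int:
    """Convert an 'm:ss' time string to seconds."""
    seconds = 0
    for part in t.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def fmt(seconds: float) -> str:
    """Format seconds as m:ss, rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"

# HD7770 times with the custom build, from the first post.
HD7770_TIMES = {"Sponza": "8:42", "BMW27": "2:36", "Barcelona Pavillon": "1:34"}
# Assumed speedup of a 260X over the HD7770 (estimate, not measured).
SPEEDUP_RANGE = (1.2, 1.3)

def estimate(hd7770_time: str, speedup: float) -> float:
    """Estimated 260X render time in seconds."""
    return to_seconds(hd7770_time) / speedup

low, high = SPEEDUP_RANGE
for scene, t in HD7770_TIMES.items():
    # The higher speedup gives the shorter (best-case) time.
    print(f"{scene}: {fmt(estimate(t, high))} to {fmt(estimate(t, low))}")
```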
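The memory figures work out as follows. A minimal Python sketch using the approximate overheads quoted above (the 2000MB total and the per-API overheads are the rough numbers from this post, not exact driver values):

```python
# Approximate VRAM used before any scene data is loaded
# (render buffers, OpenGL scene, kernel, etc.), per the post above.
TOTAL_MB = 2000
OVERHEAD_MB = {
    "AMD / OpenCL": 400,
    "Nvidia / CUDA": 800,
    "Nvidia / CUDA (experimental kernel)": 1500,
}

def scene_budget_mb(total_mb: int, overhead_mb: int) -> int:
    """Memory left for the scene itself."""
    return total_mb - overhead_mb

for api, overhead in OVERHEAD_MB.items():
    print(f"{api}: {scene_budget_mb(TOTAL_MB, overhead)} MB for the scene")

amd = scene_budget_mb(TOTAL_MB, OVERHEAD_MB["AMD / OpenCL"])
nvidia = scene_budget_mb(TOTAL_MB, OVERHEAD_MB["Nvidia / CUDA"])
# Relative advantage of the OpenCL scene budget over the CUDA one.
print(f"AMD leaves {amd / nvidia - 1:.0%} more memory for the scene")
```

With the experimental CUDA kernel, the same arithmetic leaves only about 500MB for the scene on a 2GB card.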

I’m looking for a sale on that Ti card;
it would give access to CUDA Python, which might also be useful.

But I’d definitely like CUDA to get full GPU rendering in Cycles,
so I hope to see a good sale soon.

thanks guys
happy bl

I have the AMD A10-6700 APU,
and with OpenCL in LuxRender it seems to work well, or at least it doesn’t crash,
and there are GPU options in the user preferences

As far as I know, LuxRender has been rewritten from scratch. But Lux proves that it is possible to use AMD’s Terascale-2 architecture (your iGPU has this architecture!) for OpenCL rendering. Still, I am pretty sure Cycles won’t support it in the near future!

so I’m wondering whether it will ever work with this small video card
and at least give a certain gain in speed

Basically, you do not have a video card! You have an iGPU, which is part of your CPU.

I’ve started looking for a GeForce 750 Ti,
which is low power and might be faster than the one I have now!
Are there better low-power GeForce cards available now?

I have and use this card. It’s pretty small, so it fits into my mini cube PC, and it consumes a maximum of 70 watts.
It needs 03:25.20 for the BMW27 benchmark.
NVIDIA is planning a GTX 950, which could be a little better than a GTX 750 Ti, but I think you would do well with a GTX 750 Ti for 110-120€ (check what that is in $).
Just make sure you buy one with 2GB GDDR5.

@bliblubli: does your build support NVIDIA cards? I tried to render with my GTX 750 Ti 2GB but got a crash!

@JulianS No, it doesn’t at the moment, because Nvidia’s OpenCL performance is even worse than AMD’s old implementation. Nobody with an Nvidia card will seriously want to use OpenCL at the moment (except maybe for the memory, as CUDA with all GPU features alone eats 2/3 of the 2GB on a 750 Ti).

Hi @bliblubli of https://developer.blender.org?

Cheers, mib
EDIT: I didn’t add these links, I only copy-pasted???

You should make a bug report for this.
And I think CMake might not be there for a long time, if I remember correctly!

happy bl

As everyone mentions in the link I gave, you can’t report that, because it’s random. You do exactly the same thing twice: once it crashes, the second time it doesn’t. It may be due to some wrong optimization parameters (like fast math, -O3 instead of -O2, whatever), not something directly in the code. The only thing you can write in the report is “Blender randomly crashes”. You can do it if you want, mentioning the link I gave to show that many people have the exact same problem on very different machines, with most certainly very different uses and workflows, but it will almost certainly not be accepted as valid. But as I said, try if you want :slight_smile:

As was indicated in the other thread,
even if the bug is random, the devs can still test it.
Take a chance and see what happens!

happy bl

Here it is https://developer.blender.org/T45509

Well, like they say:
keep going and try to find a way to reproduce it, then report again.

Devs are busy and some don’t like it, but it still has to be reported, because someone else might know how to reproduce it.

happy bl

Give us faster transparent shadows and all is gained! Transparent shadows are the most needed!