Tests with GTX 780 vs GTX 580, tile sizes, and why the Benchmark doesn't work.

Ok, I’ve been playing with my brand new dual GTX 780 rig and comparing it with the old dual GTX 580s. And here are some of my conclusions:
The 780 is indeed faster, way faster. For those questioning the benchmarks please read on.

Unmodified benchmark results (tile size 120x67)
GTX 580 (2 cards working in tandem) 00:40.67
GTX 780 (2 cards working in tandem) 00:44.75
All four cards (2x580+2x780) 00:23.87

These numbers are consistent with Tom’s Hardware reviews and make the extra bucks spent on new hardware seem like a waste.

But that is quite literally judging a book by the cover…

Modified tile sizes on the benchmark to 240x540:

GTX 580 (x2): 00:31.50
GTX 780 (x2): 00:23.95
All four cards (2x580+2x780): 00:18.59

My feeling, and feel free to correct me, is that the small tiles are not enough to harness the main feature of these new cards: more CUDA cores. It is clear to me that the advantage of parallel processing is not showing in this ONE-SIZE-FITS-ALL benchmark… The process needs to be rethought, in my humble opinion.
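To put rough numbers on the tile-size difference, here is a minimal sketch that counts how many tiles each setting produces and how many pixels each tile holds. The 960x540 frame size is my assumption (it is not stated above, I only inferred it from the 240x540 tile), and `tile_grid` is just a helper name made up for this example. It only counts tiles; it does not prove why the bigger tiles render faster.

```python
import math

def tile_grid(width, height, tile_w, tile_h):
    """Return (tile count, pixels per full tile) for a frame split into tiles."""
    cols = math.ceil(width / tile_w)   # partial tiles at the edge still count
    rows = math.ceil(height / tile_h)
    return cols * rows, tile_w * tile_h

# ASSUMPTION: benchmark frame is 960x540; adjust to your actual render size.
FRAME_W, FRAME_H = 960, 540

for tile_w, tile_h in [(120, 67), (240, 540)]:
    count, pixels = tile_grid(FRAME_W, FRAME_H, tile_w, tile_h)
    print(f"{tile_w}x{tile_h}: {count} tiles, {pixels} pixels per tile")
```

Under that assumption the small setting splits the frame into many tiles of about 8,000 pixels each, while the large setting uses only a handful of tiles of about 130,000 pixels each, so each tile gives the card far more work to spread across its cores.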

Now, for some of us who are not rendering BMW images for a living and have to meet delivery deadlines, here’s another example of a project I just rendered:

The same 1080x1920 frame in Cycles with 200 samples rendered in tiles of 480x270:

Dual Nvidia Tesla M2050 (rendered on renderstreet.com): 10min 40sec
GTX 580 (x2): 6min 47sec
GTX 780 (x2): 3min 25sec
All four cards (580x2+780x2): 2min 23.27sec

This particular animation was over 200 frames, so do the math and tell me the 780 is not worth the price…
Just for the record I chose the GTX 780s because I could buy two of them for the price of one GTX Titan.
Happy Blending! (and Happier rendering!)
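For anyone who wants to actually do the math, here is a quick sketch using the per-frame times listed above. It assumes exactly 200 frames (the animation was a bit longer) and that every frame takes as long as the one I timed, which will not be exactly true in practice.

```python
def to_seconds(minutes, seconds):
    """Convert a min/sec render time to seconds."""
    return minutes * 60 + seconds

# ASSUMPTION: 200 frames, each taking as long as the measured frame.
FRAMES = 200

per_frame = {
    "Dual Tesla M2050 (RenderStreet)": to_seconds(10, 40),
    "GTX 580 x2": to_seconds(6, 47),
    "GTX 780 x2": to_seconds(3, 25),
    "All four cards": to_seconds(2, 23.27),
}

def total_hours(frame_seconds, frames=FRAMES):
    """Total render time in hours for the whole animation."""
    return frame_seconds * frames / 3600

for name, secs in per_frame.items():
    print(f"{name}: {total_hours(secs):.1f} h for {FRAMES} frames")

speedup = per_frame["GTX 580 x2"] / per_frame["GTX 780 x2"]
print(f"GTX 780 x2 vs GTX 580 x2: {speedup:.2f}x faster per frame")
```

On those assumptions the pair of 580s needs roughly 22.6 hours for the animation and the pair of 780s roughly 11.4 hours, so about half the time.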

Hey Cegaton, great test, good to see these cards compared.

One thing to note regarding the RenderStreet test is that the tiles are always forced to 256x256. That worked best in our setup.

Also, for completeness, a 200-frame animation would render a lot faster on a farm than on any individual machine, since a farm can field multiple machines to render separate frames.

Thanks for testing, I’m already annoyed at the Tomshardware benchmark showing up every so often. It claims to have used 200x200 tile sizes (and they must’ve used them when benchmarking the Titan) but the result clearly isn’t consistent with anything else.

When the BMW benchmark was started, Blender didn’t use explicit tile sizes, but dividers. The current benchmark tile size is just a remnant of that. The Blender default (64x64) is in fact even worse for GPU rendering, and it’s unfortunate that users must be aware of that to use their GPU properly.

Thanks for the benchmark!
I had assumed that when changing the tile size, the improvement would be linear across the GTX models.

So, what should be done with the BMW benchmark?
Maybe we should start the benchmark from scratch with two files, BMW_CPU.blend and BMW_GPU.blend, with the tile sizes optimized by default for CPU and GPU respectively.

So it seems that every graphics card has its own tile size, where its GPU has a performance maximum?
So a benchmark comparison between two different GPUs with the same tile size makes no sense, because even if a higher tile size is used, it is not optimal for each GPU?

So it seems that every graphics card has its own tile size, where its GPU has a performance maximum?

The most important thing is to set it larger than the default. 256x256 seems to be a good rule of thumb. There may be variations with different GPUs, but it probably doesn’t matter that much.

Recently it was mentioned that the fastest rendering on a single GPU would be with 1 tile, but rendered either via the command line or Python. Test it out: enable the stamp, then type bpy.ops.render.render() into the Python console. Blender will go unresponsive for some time while it renders.
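If you would rather script the single-tile setup than click through it, something like the sketch below should work. It assumes a Blender version where the render settings still expose `tile_x` and `tile_y` (true for the 2.6x/2.7x builds discussed here; newer releases may differ). The function names are just mine for illustration.

```python
def use_single_tile(scene):
    """Set the tile size to the full output frame and enable the stamp.

    Works on any object shaped like a Blender scene (scene.render.*).
    Returns the (width, height) used as the tile size.
    """
    r = scene.render
    scale = r.resolution_percentage / 100.0
    width = int(r.resolution_x * scale)
    height = int(r.resolution_y * scale)
    r.tile_x = width      # one tile spans the whole frame
    r.tile_y = height
    r.use_stamp = True    # stamp the render info (e.g. time) onto the image
    return width, height

def render_single_tile():
    """Run inside Blender only: bpy is not available in a plain Python install."""
    import bpy
    use_single_tile(bpy.context.scene)
    bpy.ops.render.render()  # blocks the UI until the render finishes
```

Paste both functions into Blender's Python console or text editor and call `render_single_tile()`. Expect the interface to freeze until the frame is done.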

If you can test a single 580 vs a single 780 using this setup this would be much appreciated.

I would be interested to see this as well.
It’s a huge difference in render time when using the console with 1 tile.

Here’s what I got:


However, I can imagine that with huge scenes and resolutions this method could be very prone to crashing Blender.