SSS Test CPU (i7-5930K) VS GPU (Titan X) Whaaat the.... ???

Hi guys !
I found that my character with SSS was really slow to render on the GPU, so I wanted to run a test with another scene to compare.

So I ran a little test with this great, well-optimized scene from Blend Swap (author: danikreuter):
http://www.blendswap.com/files/images/2013/08/blend_69585/danikreuter_render_medium_8926f9c215bd673758c3580f6e933994b53327f8.jpg

I just hit render without changing anything. The result is scary.

CPU i7-5930K:
22 seconds

Titan X:
1 min 26 s

Is my Titan X sick? Is SSS really that badly optimized on GPU?

Could you try rendering it and tell me if you get the same kind of result?

Thanks,
Seb

EDIT: Maybe this topic should be in the rendering section?

Well, render time for that file on my old GTX 680 is about 29 seconds.
The only thing I changed is the tile size - which you should always adapt to your render device, btw (just let the Auto Tile Size addon do this for you).
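
For reference, the tile size can also be set by hand from the Python console; here's a rough sketch using the 2.7x API (the 256/32 values are just a rule of thumb, not the addon's exact logic):

    import bpy

    scene = bpy.context.scene
    # Rule of thumb: big tiles for GPU rendering, small tiles for CPU rendering
    if scene.cycles.device == 'GPU':
        scene.render.tile_x = 256
        scene.render.tile_y = 256
    else:
        scene.render.tile_x = 32
        scene.render.tile_y = 32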

If you did use a proper tile size and the Titan X still performs that badly, perhaps you forgot to disable SLI?

Hi IkariShinji,
Thank you for your answer :slight_smile:

Yes, Auto Tile Size is enabled by default.
I only have one Titan X, so SLI isn't a factor.

Hmm, I'm wondering what's going on with this card. GPU-Z says the temperature is OK…
And the driver is up to date.

I found a thread:
https://developer.blender.org/T44903

Performance really drops on the Titan X when switching from Supported to Experimental…

It seems that for SSS, the GTX 780 and GTX 970 are way faster :confused:

EDIT
Never mind: Seems I confused the Titan X with the Titan Z…

Hmm, aren't you mixing it up with the Titan Z (which is made up of two GPUs)?
I don't see any SLI parameters in the driver settings.

I did the test on my two machines, with Auto Tile Size enabled:

iMac 2012 (Core i7-3770 @ 3.4 GHz / 32 GB of RAM / Nvidia 680MX 2 GB / OS X 10.11.3)

  • CPU (tiles 30x30): 00:28.64
  • GPU (tiles 180x225): 00:37.03

PC (Core i7-4790K @ 4.0 GHz / 32 GB of RAM / 4x Nvidia GTX 970 4 GB / Xubuntu 14.04)

  • CPU (tiles 30x30): 00:18.97
  • 1 GPU (tiles 180x225): 01:05.34
  • 3 GPU (tiles 180x225): 00:32.72
  • 4 GPU (tiles 180x225): 00:18.85

Yeah, 4 GTX 970s are faster than one Core i7-4790K! :ba: :ba: :ba:

Hm. Not quite sure why an ancient GTX 680 outperforms 3 GTX 970s…?

It’s a crazy story O_o

I think it's a real issue, but it looks to be fixed in the latest nightly build, where they have moved GPU SSS out of Experimental (my understanding is that the Experimental kernel was using more memory and was known to be slower).

With the file at default settings, this is what I got:

Windows 10, 64 GB RAM (16 × 4 GB DIMMs)

2 × Titan X:
  • Current build (Experimental kernel), tiles 180x225: 45.85 sec
  • Nightly build (not Experimental), tiles 180x225: 17.58 sec

2 × Xeon E5-2643 (4 cores / 8 threads each @ 3.30 GHz):
  • Tiles 32x32: 22.27 sec
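
If you're not sure which kernel a file is set to use, you can check it from the Python console (this is the standard Cycles scene property, nothing build-specific):

    import bpy

    # 'EXPERIMENTAL' was needed for GPU SSS before; 'SUPPORTED' is the regular kernel
    print(bpy.context.scene.cycles.feature_set)
    bpy.context.scene.cycles.feature_set = 'SUPPORTED'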

Wow, thanks for this great news :slight_smile:

Just tried the latest daily build.

Titan X before:
1 min 26 s

Titan X now:
32 sec

CPU i7-5930K:
22 sec

So it's clearly better! But still slower than the CPU, haha :ba:

Edit:
By the way, the new Christensen-Burley algorithm looks faster and way more accurate!!!
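
If anyone wants to switch existing materials over without clicking through every node, something like this should work from the Python console (assuming the enum identifier for the new falloff is 'BURLEY'):

    import bpy

    # Flip every SSS node in the file to the Christensen-Burley falloff
    for mat in bpy.data.materials:
        if not mat.use_nodes:
            continue
        for node in mat.node_tree.nodes:
            if node.type == 'SUBSURFACE_SCATTERING':
                node.falloff = 'BURLEY'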

I did the test again with today's builds (Feb 8th), with Auto Tile Size enabled again:

Before: see the numbers in my earlier post above.

After:

iMac 2012 (Core i7-3770 @ 3.4 GHz / 32 GB of RAM / Nvidia 680MX 2 GB / OS X 10.11.3)

  • CPU (tiles 30x30): 00:26.71
  • GPU (tiles 180x225): 00:29.43

PC (Core i7-4790K @ 4.0 GHz / 32 GB of RAM / 4x Nvidia GTX 970 4 GB / Xubuntu 14.04)

  • CPU (tiles 30x30): 00:18.47
  • 1 GPU (tiles 180x225): 00:30.14
  • 3 GPU (tiles 180x225): 00:15.34
  • 4 GPU (tiles 180x225): 00:09.27

That’s better :slight_smile:

Some more fixes and speedups to the Christensen-Burley code were committed today, so tomorrow's buildbot builds should be even a bit faster.

Awesome news, thank you :wink: