Performance Sufferage

Hello,

I just purchased an iMac and love the rendering performance with CUDA. I do have a concern (and question) about the performance of my computer, though. While rendering in CPU mode, the computer operates (while using other apps and web surfing) as if it's not encumbered. Then, while using GPU mode, everything slows down to the point where it is very difficult to use (keystrokes included). I'm under the impression that the opposite should happen when rendering in those modes. Why would my computer's performance suffer while rendering on the GPU and be fine on the CPU?

If it matters, I'm using OS X and Blender 2.72.

Thanks!

It's likely because the GUIs of both Blender and parts of your OS are drawn using the GPU. CUDA rendering will use all of the juice that your GPU can put out, so you get an unresponsive machine.

The most straightforward way of avoiding this is to get a second GPU and do the rendering with just one of them, but iMacs don't really allow you to customize the hardware setup, so you will need to find a way to force a lower priority for CUDA operations.

What Ace Dragon said is probably right on the money, but in addition to that, you likely have a lot less video memory than system memory. One thing you can do to free up some of this memory is found under the Performance rollout in the Render options pane: lower the resolution of the rendering tiles and disable Progressive Refine. I believe this will ensure that only the portion of the render currently being worked on is stored in video memory, freeing up some additional memory for other operations.

In the Performance rollout you can lower the tile resolution by editing the X and Y values under "Tiles." These values set the resolution of each subsection of the render being worked on, so if you're rendering a 1024x1024 image and set the tiles to 128x128, the render will be divided into 64 sections of 128x128 pixels each. When Progressive Refine is off, the renderer will work on each individual tile until it is done, i.e. until the number of samples for that particular tile has reached the amount set in the Sampling rollout, then move on to the next tile. When Progressive Refine is on, the renderer will, I believe, iterate through every tile, rendering one sample per tile, then iterate over all of the tiles again, rendering another sample per tile.
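The tile arithmetic above can be sketched as a quick back-of-the-envelope calculation (plain Python, nothing Blender-specific; this is just the division described in the post):

```python
def tile_count(width, height, tile_x, tile_y):
    """Number of tiles a render is split into. Partial tiles at the
    right/bottom edges still count as a tile, hence ceiling division."""
    cols = -(-width // tile_x)   # ceil(width / tile_x)
    rows = -(-height // tile_y)  # ceil(height / tile_y)
    return cols * rows

print(tile_count(1024, 1024, 128, 128))  # 64, as in the example above
print(tile_count(1920, 1080, 128, 128))  # 135 (15 columns x 9 rows)
```

Note that resolutions that don't divide evenly (like 1080 by 128) leave a row of smaller edge tiles, which is why the count rounds up.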

P.S. My system does not support CUDA, so this is coming from someone who's never used GPU rendering.

P.P.S. When using CPU rendering, the renderer will work on more than one tile at a time if you have a multi-core CPU. The number of tiles being worked on will be equal to the number of CPU cores you have, or to the number you've defined in the "Threads" option under the Performance rollout. By the by, manually setting the number of threads to something higher than the number of CPU cores in your system will not give any increase in performance; if anything, it might degrade performance, because there's some overhead involved in running multiple threads. The option probably exists for when you want to run fewer threads than your number of cores.
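A toy model of why oversubscribing threads doesn't pay off: only as many threads as you have cores can actually run at once, and each extra thread adds a little scheduling cost. The tile cost and per-thread overhead numbers below are invented purely for illustration; real overhead depends on the OS scheduler and workload.

```python
def render_time(tiles, cores, threads, tile_cost=1.0, thread_overhead=0.05):
    """Toy model: tiles are processed by min(threads, cores) truly
    parallel workers; threads beyond the core count only add overhead."""
    workers = min(threads, cores)
    passes = -(-tiles // workers)  # each worker renders its share in sequence
    return passes * tile_cost + thread_overhead * threads

cores = 4
print(render_time(64, cores, threads=4))  # 16.2 -- one thread per core
print(render_time(64, cores, threads=8))  # 16.4 -- no faster, more overhead
print(render_time(64, cores, threads=2))  # 32.1 -- fewer threads than cores
```

Under this model, threads above the core count never reduce the total time, while fewer threads leave cores idle, matching the post's advice.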

As I understand it, running two threads per core can show an increase in performance, but this is rather dependent upon the application. Blender's rendering threads are, I imagine, quite processor-intensive, so one thread per core is probably the best you're going to get performance-wise.

Rendering on the GPU means that you throw all available resources of your card at rendering the image, while having it do something completely different simultaneously: Displaying Blender and running the OS’s user interface. That’s not going to work without the latter suffering heavily.

In addition to that all iMacs use essentially notebook-class graphics cards (GTX 7xx “M”), which are nowhere near as powerful as their desktop counterparts. In fact, I’m not entirely sure that the graphics card in your iMac will show a significant performance gain over rendering on the CPU (depends of course on the CPU you ordered).

@ikariShinji Mike Pan's benchmark renders out at 1:01 on GPU (GTX 780M, 4 GB) and 1:38 on CPU (3.5 GHz i7), so there is enough of a difference to use GPU.

@atr1337 I'm using the Auto Tile Size addon for tile size. It does automatically change the tile dimensions when switching between devices. I've noticed slight differences in render time when manually testing different tile size combinations.

@Ace Dragon As you mentioned, a second card isn't going to be realistic for me. A Google search for CUDA priorities revealed lots of technobabble that is beyond me.

Does anyone know of a site with relevant info on how to prioritize CUDA operations?

The iMacs in general are usually not designed for heavy computing tasks like rendering in most configurations (they have the Mac Pros for that). They're not really designed for 3D work; the intent is that they're used for general multimedia tasks for a few years before being replaced by a more powerful machine (since there's no hardware upgrading with these things either). If your CPU is on par with an i5 or lower, you might have trouble making good use of Cycles for complex scenes until it gets optimized a bit more.

@Perrishnikov: I was rather unaware that there was an addon for automatically sizing the render tiles. Forgive me if I’m wrong, but is it perhaps automatically sizing the tiles so that the render is evenly divided by the number of render threads? Meaning that if you have two render threads then the addon will auto size the render tiles so that the render is divided into two tiles?

If so, turning off that addon, turning off Progressive Refine, and lowering the resolution of the tiles so that there are more tiles than available render threads would probably help you save video memory. Whether saving a little video memory will make a large difference in the performance of other applications, though, I can't say.

My guess is that the addon you're using is about ensuring your render runs at max speed, and matching the number of render tiles to the number of available render threads is one way to do that. But you're not looking to maximize the performance of your render; you want to increase the performance of other tasks running while you render, and in order to do that you'll need to lower the performance of your render so that other applications can make use of what's left over.

Since increasing the performance of tasks running alongside Blender means lowering the performance of your rendering task, you might be better off just using CPU rendering if you plan on multi-tasking, and using GPU rendering only when you won't be using your computer while it renders. When I have a long render ahead, I usually just let my computer render overnight whilst I sleep.

Edit: Oh yeah, check out the specifications for your GPU. If it has more than one core, you may be able to set up Blender so that it will not use all of them: in the Render options pane, under the Performance rollout where you see the "Threads" option, change this from "Auto-detect" to "Fixed" and enter a number that is less than the number of available cores. So if you have two cores, enter 1 and this will free up the second core for other tasks. (Though I'm not certain the Threads setting affects GPU rendering at all; it may only control CPU render threads.)

The Addon can be found here. I’m not quite sure how it determines the sizing.

I’ll play with the settings some more to see if I can get some better performance during rendering.

Thanks for the advice!

As a side note, tile sizes that produce the best performance on one render will not necessarily produce the best performance on a different render.

If you're rendering a scene that is equally complex on the left as it is on the right, then on a dual-core processor, dividing the scene into two tiles widthwise should give the best performance. If the scene is more complex on one side than the other, that same method will not yield the best results, as the thread rendering the less complex side will finish first, leaving the rest of the work to be done by just a single core.

In that case, dividing the render into more tiles will help spread the work more evenly between the two cores. For the most part, one of the cores will probably still finish first; the trick is to get the time difference to be as small as possible, so that the work is divided as evenly as possible among however many cores you have.
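The balancing argument above can be sketched with a small greedy-scheduling simulation (the per-tile costs are invented, just to illustrate the effect of cutting an uneven scene into more tiles):

```python
import heapq

def makespan(tile_costs, workers):
    """Greedy scheduling: each free worker grabs the next tile.
    Returns the finish time of the slowest worker."""
    finish_times = [0.0] * workers
    heapq.heapify(finish_times)
    for cost in tile_costs:
        # hand the tile to whichever worker frees up first
        heapq.heappush(finish_times, heapq.heappop(finish_times) + cost)
    return max(finish_times)

# Scene whose left half is much heavier than its right half:
two_tiles = [8.0, 2.0]              # one big tile per side
many_tiles = [2.0] * 4 + [0.5] * 4  # each side cut into 4 strips
print(makespan(two_tiles, 2))   # 8.0 -- one core idles after 2 units
print(makespan(many_tiles, 2))  # 5.0 -- work shared almost evenly
```

With only two tiles, the core that drew the light side sits idle for most of the render; with eight smaller tiles the same total work finishes in 5 units instead of 8, which is exactly the evening-out effect described above.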

Thanks, I did some research on command-line rendering and I'll be giving that a try later, too. Hopefully that'll help both render performance and overall responsiveness.
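For reference, rendering in the background from the command line skips drawing Blender's interface entirely. A typical invocation looks something like the sketch below; the file paths, output pattern, and frame numbers are placeholders, not anything from this thread.

```shell
# Render frame 1 of a .blend file without opening the Blender GUI
# (-b = background mode, -o = output path, -F = file format, -f = frame).
blender -b /path/to/scene.blend -o /tmp/render_#### -F PNG -f 1

# Or render the file's whole animation frame range:
blender -b /path/to/scene.blend -a
```

Note the order matters: `-f`/`-a` should come after `-o` and `-F`, since Blender processes its arguments left to right.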