MASSIVE Blender GPU problem

This sounds a bit dramatic, but the problem is actually very serious.

Since Blender 2.72, Cycles feeds on GPU RAM and simply can't be satisfied.
As an example, many smaller artists on a low budget can only afford small GPUs with 2 GB of VRAM, and the problem is that Blender alone needs at least 1.8 GB of that to render a frame containing only one triangle.
But that would not be a problem for big cards, right?
Like a 980 Ti with its 6 GB, right?
Wrong: the more stuff you add, the more VRAM it requires, and the growth looks exponential.
I rendered a simple triangle = 2021 MB of VRAM required.
I rendered a more complex scene = 4452 MB of VRAM, even though on the CPU it is about 200 MB.

The problem is, it's not simply X + (amount of memory for the scene),
but instead more like K x (X + amount of memory for the scene).

X = VRAM for the kernels, Blender, and Cycles as a baseline
K = multiplier behind the exponential growth
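
As a rough sanity check, plugging my own numbers from above into that model gives a ballpark for K (back-of-the-envelope only, nothing measured):

```python
# Plugging the reported numbers into the proposed K x (X + scene) model.
# X is taken as the single-triangle footprint (scene data is ~0 there), so this
# is only an illustration of the figures above, not a measured constant.
X = 2021          # MB: kernels + Blender + Cycles baseline (triangle render on GPU)
scene = 200       # MB: scene data, as reported for the CPU render
total = 4452      # MB: VRAM reported for the complex scene on GPU

K = total / float(X + scene)
print("Implied multiplier K = %.2f" % K)   # roughly 2.0 with these figures
```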

Maths aside, it's a serious problem, and after some research there is a fairly large number of people affected, and quite a lot of them can't render on the GPU anymore; they simply get a CUDA_OUT_OF_MEMORY exception.

I do hope there is a fix for it. After deeper research, some Blender coders blame Nvidia for it, whereas Nvidia support blames Blender. But really, it's Blender's fault: Nvidia didn't shake up CUDA, but rather Blender's new features require huge amounts of VRAM even when they are not used.

If you get an out-of-memory exception, please check whether this issue is the cause.

Everybody is waiting for the arrival of the split kernel for CUDA and hoping it solves part of the problem:
https://developer.blender.org/T43310

I do not know about your tests and formulas. In the simple tests I've done, CUDA consumes about 800 MB of extra VRAM compared to what Blender reports for memory usage.

To clarify: currently I can only run tests from 2.73 onward, which is when Blender started supporting cards with the Maxwell architecture (curiously coincident). And looking at the release notes of Blender 2.73, I could not find these new Cycles features that people are talking about, which supposedly make Blender 2.73 use more VRAM compared with 2.72.

By any chance, are you setting the tile size to the size of the entire frame? Try a smaller tile size (e.g. 256x256 or smaller). I'm currently running with a 2 GB GPU and can render scenes containing millions of triangles.
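
If you prefer to set that from the Python console instead of the UI, something like this should do it (a minimal sketch against the 2.7x-era properties; names may differ in other builds):

```python
import bpy

scene = bpy.context.scene
scene.render.engine = 'CYCLES'
scene.cycles.device = 'GPU'     # render on the GPU (CUDA device is picked in User Preferences)

# Smaller tiles keep less data in flight per tile while rendering
scene.render.tile_x = 256
scene.render.tile_y = 256
```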

You are correct that Cycles on the GPU is a RAM hog compared to CPU rendering, and it will always require relatively more. It's just not that bad in my experience.

How did you measure this? Did you really use a GTX 980Ti?

I tried reproducing this with a 3GB GTX780. On GPU-Z I have 1.5GB memory used before even starting Blender. Rendering just a triangle creates excess usage of ~1GB (which admittedly is a pretty large baseline), but rendering 3 million triangles took only 1.5GB, so it doesn’t need exponentially more memory at least for geometry. I didn’t test textures. Also keep in mind that the experimental kernel needs more memory.

Assuming that there aren’t any bugs in Blender’s memory allocation code, the likely explanation is that this is memory required to run the kernel code itself, which is allocated by CUDA. That’s really outside of Blender’s control. Of course NVIDIA can just say the program should be smaller (or split into smaller parts), but that’s easier said than done. Maybe they should send over some engineers to work on that, like AMD did.

With a 6 GB 780 I get a CUDA out-of-memory error at around 4.9 GB of use (peak) when rendering from the command line with 2 cards in the system, in any configuration (1+1 or either individually). Also, as soon as it has peaked, usage drops about 70% when it actually renders. I am forced to render the FG and BG separately as a result, which is not a problem, just a small hassle. I am very guilty, though, of using the experimental kernel plus quite a lot of passes. The geometry is dense (but optimized) and the textures are heavily optimized, but it was still a tug of war to get it rendering.

I must stress that I absolutely love working with Cycles; it's just that any memory-specific enhancements would surely help everyone.

EDIT: Relating to what BeerBaron said, would having a weak primary GPU free up resources on the one you're trying to render with (especially for render slaves)?

Using multiple high resolution displays attached to the card can use a lot of VRAM.

There is one other possible avenue for making the kernel smaller: dynamic feature selection plus run-time compilation can be used to build a smaller, faster, less feature-complete kernel when the scene allows for it. It will add some compilation time each time you change the combination of features, however.
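
On the scene-inspection side it would conceptually look something like this (a rough Python sketch; the -D flag names are made up to illustrate the idea and are not actual Cycles defines):

```python
import bpy

def kernel_defines_for_scene(scene):
    """Sketch of dynamic feature selection: collect compile-time defines so a
    run-time-compiled kernel only includes features the scene actually uses.
    The flag names below are hypothetical, purely for illustration."""
    defines = []

    node_types = {
        node.type
        for mat in bpy.data.materials
        if mat.users and mat.use_nodes
        for node in mat.node_tree.nodes
    }
    if {'VOLUME_SCATTER', 'VOLUME_ABSORPTION'} & node_types:
        defines.append('-DKERNEL_FEATURE_VOLUME')
    if 'SUBSURFACE_SCATTERING' in node_types:
        defines.append('-DKERNEL_FEATURE_SSS')
    if any(ps.settings.type == 'HAIR'
           for ob in scene.objects
           for ps in ob.particle_systems):
        defines.append('-DKERNEL_FEATURE_HAIR')

    return defines

# These defines would then be handed to the run-time compiler to build a
# leaner kernel; the trade-off is a recompile whenever the set changes.
print(kernel_defines_for_scene(bpy.context.scene))
```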

As others have said, part of it is just the nature of GPU rendering (even the Octane devs had to deal with users reporting high memory use as they added advanced shading features, and that's with some of the best developers in that area).

There are indeed plans to implement split-kernel functionality for CUDA as well, so it will take a bit of pressure off as more functionality is added.

Usually it's textures that take up the most VRAM. Once Cycles has out-of-core capabilities, that should help a lot as well.

Each rendering thread needs its own local memory for things like the current ray, path state, shader variables, etc.

Where a CPU typically runs between 4 and 48 threads (one or two threads per physical core), current GPUs have up to 3072 cores (Quadro M6000), and for optimal occupancy you want to run many more threads than you have cores. GPUs easily render with 100 times as many threads as a CPU, requiring 100 times more per-thread memory than CPU rendering.
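
A quick back-of-the-envelope shows the scaling (the per-thread state size is an assumed figure, purely for illustration):

```python
# Per-thread path state scales with the number of resident threads.
# The 2 KB figure is an assumption for illustration, not Cycles' actual state size.
per_thread_state = 2 * 1024       # bytes of ray/path/shader state per thread
cpu_threads = 8                   # e.g. a quad-core CPU with hyper-threading
gpu_threads = 3072 * 16           # e.g. 3072 cores, ~16 resident threads each for occupancy

print("CPU state total: %.2f MB" % (cpu_threads * per_thread_state / 2.0 ** 20))
print("GPU state total: %.2f MB" % (gpu_threads * per_thread_state / 2.0 ** 20))
# ~0.02 MB on the CPU vs ~96 MB on the GPU for the same per-thread footprint
```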

Indeed, Octane has a great out-of-core solution:

It would be a dream for Cycles.
Right now I'm working on a character with UDIM textures, and even with my 12 GB Titan X I got some surprises :confused:

Yes, I did test it with a 980 Ti, lucky me that I have one. At first I was really surprised, and I still can't figure out why the memory grows exponentially even with the supported feature set and no complex materials like volumes or translucency.

The kernels really should be split, because this is just a hard hit for small GPUs. I know that for some reason not everybody has this problem; there are people with the same setups who don't run into it, but GPUs like the 960 or older are struggling for some artists.

The only idea that makes sense is that different iterations of CUDA make a difference, but I changed versions yesterday and it doesn't make a noticeable difference across all the tests, just a few MB, and that could simply be reading errors since Blender doesn't run alone on the system (e.g. desktop and OS).

I guess the best solution, if somebody has this problem, is to wait for OpenGL acceleration and then use OpenGL instead of CUDA. I predict it will be a bit slower, but not too much, and at least the GPU can render then.

However, I don't know how big OpenGL kernels will be, but I reckon they are going to be smaller, since there is much more freedom in OpenGL development than in CUDA, giving the Blender devs more room to work.

It sounds to me like there is something odd with your setup. A single triangle shouldn't be using 1.8 GB of video memory. Do you have a bunch of giant monitors hooked up to your video card? Is there something hinky in your world environment? I think more troubleshooting is necessary before giving up and waiting for the kernel split, or for OpenCL to be more feature complete.

Hi.
This exponential VRAM growth you are seeing is weird. Rename the Blender user configuration folder in your operating system (e.g. 2.76 to 2.76_old if you are using 2.76), and test again:
https://www.blender.org/manual/getting_started/installing_blender/directorylayout.html

By the way, it is OpenCL, not OpenGL. But anyway nVidia users are waiting for Kernel Split for CUDA, not OpenCL.

Hi, I checked with my second GPU (not connected to a display) using nvidia-smi on Linux, and it gives me
510 MB for a single triangle.
If I add 3 cubes it is 511 MB.
Do you still use 2.72 as mentioned in the first post?

Cheers, mib
EDIT: Checked with Blender 2.72, which gives 390 MB for the default cube, so even less.
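
If anyone wants to repeat the measurement, a small polling script like this works on any machine with nvidia-smi in the PATH (standard nvidia-smi query options, nothing Blender-specific):

```python
import subprocess
import time

# Sample per-GPU memory use once a second for a minute while a render runs.
for _ in range(60):
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ]).decode()
    for line in out.strip().splitlines():
        idx, used, total = [v.strip() for v in line.split(",")]
        print("GPU %s: %s / %s MiB" % (idx, used, total))
    time.sleep(1)
```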

Moved from “General Forums > Latest News” to “Support > Technical Support”

Really struggling with a deadline. I have 2x 780 6 GB cards, and neither of them (nor both together) can render this; it runs out at around 3.9 GB. Latest Blender 2.76.


PS. To avoid confusion: in this instance I spawn 2 Blender instances from Python, each with its own dedicated GPU. This is because BVH building and scene loading in general take up to 2 minutes, so I keep the CPU busy while the GPUs are rendering. It should not be the source of the error unless RAM is the issue; it is battle tested.
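
Something along these lines reproduces the setup I mean (a simplified sketch, not my exact script; the file names are placeholders and it assumes the driver honours CUDA_VISIBLE_DEVICES to pin each instance to one card):

```python
import os
import subprocess

# Launch one background Blender render per GPU, each restricted to a single
# card via CUDA_VISIBLE_DEVICES. File names and frame ranges are placeholders.
jobs = []
for gpu_index, blend_file in enumerate(["fg_layer.blend", "bg_layer.blend"]):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_index))
    jobs.append(subprocess.Popen(
        ["blender", "-b", blend_file, "-a"],   # -b: background, -a: render animation
        env=env,
    ))

for job in jobs:
    job.wait()   # block until both renders finish
```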