Loki Render 0.7.0 released!

That would be one way to avoid extra file shuffling across the network, though it seems like the simpler solution is setting up a custom AMI with everything needed already baked in. Then you simply launch as many instances of that AMI as needed.

As I suspected, passing user data to the instance and accessing it from inside is very simple. Amazon’s help page on the topic is here.

In short, just pass the master IP address via the ‘user data’ text box before launching the instance. Once inside the instance, you can then place the IP address in a variable with this command:

$ masterip=$(curl http://169.254.169.254/latest/user-data)

This variable can then be passed to Loki from within a Loki startup script. Assuming all your paths are defined in variables, you could then do this:

$ java -jar $lokiPath $blenderPath $masterip
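Putting that together, a grunt startup script could look roughly like this (the paths below are just placeholders and would need to match wherever Loki and Blender actually live in your AMI):

#!/bin/bash
# sketch of a grunt startup script -- paths are placeholders, adjust to your AMI
lokiPath=/home/ubuntu/loki/loki.jar
blenderPath=/home/ubuntu/blender/blender
# grab the master IP that was passed in via the 'user data' field at launch
masterip=$(curl -s http://169.254.169.254/latest/user-data)
java -jar "$lokiPath" "$blenderPath" "$masterip"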

Need to get to bed now, but very close to getting a ‘works out of the box’ grunt AMI put together. Would still like to read up a bit on Amazon’s suggestions/best practices for prepping an instance for snapshotting and creating an AMI to make sure I’m not missing anything, like here.

Also, it looks like Google Cloud Engine has very competitive pricing, so I’d like to take them for a test drive as well.

Just checked the EC2 pricing page; data transfer from the internet into EC2 appears to be free.

@Hoverkraft, @Samuraidanieru, @zeealpal:
Y’all seem to be confused about my methods. I’m creating a custom AMI. The steps I posted the other day are only for initial setup. I’m dictating my IP address in the setup so I don’t have to pass it though AWS user input. After that is set up, create an AMI of the fully set up grunt node. Then from the AWS dashboard, you can just launch the AMI with as many instances as you want with no user input and they all connect. All the install and setup information is saved in the AMI so it’s not re-downloading loki and blender on each instance as it boots. Even if it was though, there’s no charge for inbound data, only outbound. That being said: There’s zero data charges anywhere except for transferring the images back to the master. If you still qualify for free tier, there’s no data charges anywhere unless you are doing a metric ton of rendering and exceed 15GB of outbound data (png files or whatever you output) in a SINGLE month.

@samuraidanieru: I am noticing one big bug though. When I cancel a job or a grunt crashes, it leaves files in the tmp folder. If I start the job again, or any job after that, it crashes upon trying to save the image into tmp. This is a major issue since the only way to fix it is to manually log into each grunt machine and delete the temp files. Or, in the case of ec2, you have to terminate the instances and re-launch them unless you want to SSH into each instance. Terminate then relaunch is easy enough, but since partial hours are billed at the full hour rate, it would suck to have to do this after only a few minutes with any of the paid instances.
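For now the workaround is basically SSHing into each grunt and wiping the folder by hand, something like this (I don’t know offhand exactly where Loki keeps its tmp folder, so the path is just a stand-in):

$ ssh -i mykey.pem ubuntu@<grunt-ip> 'rm -rf ~/path/to/loki/tmp/*'   # tmp path is a placeholder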

Would it be doable to make the grunts always clear the tmp folder when they go idle? Or have a clear tmp function from the master?

-David

Ok I see. It’s just that, at least with Brenda, I often had the need to supply a new Blender version or some add-ons and setting up a new AMI for that was a hassle. But that is probably a minor concern to most.

Anyway, thanks for the work on supporting AWS; this makes Loki just that much more useful. Not everyone has the hardware to set up a render farm at home.

@samuraidanieru: I’m trying it on Google Cloud Engine as well and I’m having some trouble. I’m not sure if it’s something different between ec2 and GCE, but when I run through the same setup I do on ec2, I’m getting this error:

Exception in thread "main" java.lang.UnsupportedClassVersionError: net/whn/loki/common/Main : Unsupported major.minor version 51.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
Could not find the main class: net.whn.loki.common.Main. Program will exit.

Did I somehow install the wrong version of Java or is Debian being picky about something in Loki’s code?

-David

What version of Debian are you using? Debian tends towards very high stability, which of course is a good thing, but the trade-off is older packages. Loki requires Java 1.7 or later, and you probably have 1.6 on the system.

If you try Ubuntu you’ll get more recent packages.
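A quick way to check which Java you have, and to pull in a newer JRE on Debian or Ubuntu if needed (the exact package name can vary a bit between releases):

$ java -version
$ sudo apt-get install openjdk-7-jre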

Sure. You’re creating a custom AMI that will work for your setup. The reason I mentioned passing the master IP through with user data earlier is because I want to make a generalized AMI that anyone can use.

I really appreciate you reporting this. If you can provide more detail, hopefully we can resolve it. Could you tell me:

  1. What Loki version you are using
  2. In the ‘cancel’ case, do you mean you’re right clicking on a grunt from the master and saying ‘quit after task’ or are you aborting the job from the grunt side?
  3. In the crash case, how often is this happening? Do you have steps to repro?
  4. For the crash, can you send the log? Hard to troubleshoot unless I can see the exception that Java is throwing.

What I can do is add code for the grunt to always wipe the tmp area before starting a task. This will certainly solve the ‘crash due to previous crash’ problem, but like I said above, I’d also like to squash the bug that’s causing the crash in the first place:-)

Yep. Data coming in is free, but they make sure you pay when you take data out. Makes total sense from a business perspective: Provide incentive to draw customers in, and then keep them there.

“‘Come in, come in. Make yourself at home!’ said the spider to the fly.” :slight_smile:

@samuraidanieru: Having the grunts wipe the tmp folder before starting a new task would solve the issue. There is no initial crash really. If I cancel a job, either from the master or by closing the grunt, that’s when files get left in tmp. I don’t think this can be avoided in all cases, since if the process itself crashes for any reason, it can’t wipe the folder anyway because it’s crashed. Having the grunt clear tmp before starting a task sounds like the way to go.

However, it should be noted that doing EXACTLY that would cause issues if master and grunt are running on the same machine with tile rendering enabled. Since the master also uses tmp for the pre-compiled tiles, having the grunt on the same machine clear tmp before each task would be highly counterproductive. Perhaps the master could use a 2nd tmp folder? mtmp? Same possible conflict though: what happens if the master crashes or is shut down while it still has files in mtmp? Should the master clear mtmp at each launch?

As for the issues I was getting on GCE:
I’m using Debian-7-wheezy-v20141108 according to GCE.
Good call on older Java version, I checked and it is 1.6.
So, a note to GCE users: do not install Java with “apt-get install default-jre”. Instead, use:

$ apt-get install openjdk-7-jre

Also for GCE, bzip2 isn’t installed by default so you’ll have to install it before you can extract Blender.

Otherwise the setup instructions are identical from my tests. So in closing, I have it working on GCE now as well as EC2 :slight_smile:
Still working out the kinks in GCE for mass deployment, but I’ve got an image that launches and auto-connects just like in ec2.

-David

Good to know you’re not talking about actual crashes, but rather just aborting. Concerning when to clear tmp, I followed a similar thought process earlier today:-) The solution I settled on is to clean the tmp directory on Loki startup, and only then. That should solve the problem without the master or grunt stepping on each other’s toes! I’ve made that change and it’s in the latest release (0.7.1.c), which I’ve put up on SF.

That’s awesome! I haven’t had time to look into GCE yet. Feel free to share your experience on the wiki;^) Have you done any comparison of relative performance? I’m particularly interested to know how Amazon’s compute nodes compare with Google’s compute nodes, core-to-core.

I finished the public AMI, so now anyone can use it on AWS. The master IP is specified via the ‘user data’ field when the instances are created, so now one can launch a Loki render farm on AWS in just a few minutes:-) No need to login to the instances and set anything up; the instances connect and can work in the farm right away.

For those interested, the AMI is named ‘LokiGrunt-ver1.0_LokiRender071c_blender272b_’ and the AMI id is ‘ami-e20fb995’.

I put up a Howto page on the wiki with steps to use the AMI here.

I haven’t gotten a chance to do any real comparisons yet. The ‘free trial’ of GCE limits you to only 2 CPU cores at a time :/. So, one dual-core node or two single-core nodes. I’ll probably bite the bullet soon and do some real tests; I’m waiting until I have a noteworthy job to test it on. Since I still get micro EC2 nodes free, and up to 20 of them, it doesn’t make sense for me to pay for GCE at this point. However, the minute I have to run a job that requires more than 20 processing cores (or just more cores per machine) I’ll definitely be testing it out, since GCE is about 20% cheaper than EC2. The jury is still out on all that though, since we currently don’t have any data on actual compute power for the price.

-David

I took a quick look at GCE and am under the impression that, if you consider spot instances, AWS is (for now) much cheaper to render on than GCE?

Speaking of which, an important question: What does Loki do when a render node shuts down while rendering a frame?

And a question or possible feature request: is it possible to review and edit the command that Loki issues to the grunts? I often like to submit some python script to blender, e.g. “blender -P script.py -s 10 -e 15”.

@Hoverkraft: That would be correct. Though for any project with a priority other than “Meh, it’s done when it’s done”, I would recommend setting your bid equal, or close, to the same price as a dedicated instance. EC2 spot doesn’t charge for hours in which your instance gets terminated; i.e., if an instance runs for 58 minutes and then the price exceeds your max bid, causing the instance to terminate, you don’t pay for those 58 minutes (from what I can tell). That way you’re leveraging the lower costs of the spot market, potentially getting ‘free’ render time if instances get terminated, and your hourly cost per instance never exceeds dedicated prices.
Also, what sorts of scripts are you wanting to push, and what for? I’ve never done this before.
Since blender is already an argument for Loki, you can’t pass it additional arguments as far as I know. Perhaps putting them in quotes would work: java -jar loki.jar "./Blender/blender -P script.py -s 10 -e 15" 00.000.00.00? If that works, you could set up a variable to be fed through user input, but the script would have to already be part of the AMI unless you had it linked up to S3 and were pulling scripts from there. Sounds like a lot of hassle.
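Just to illustrate what I mean, something along these lines (completely untested, and Loki may well choke on the extra arguments):

$ blenderCmd="./Blender/blender -P script.py -s 10 -e 15"   # script.py would have to already exist on the grunt
$ java -jar loki.jar "$blenderCmd" 00.000.00.00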

If a grunt terminates mid-frame, Loki just gives it to the next available grunt. It doesn’t cause any issues from what I can tell.

-David

Wait on this, still testing, might not be the case (an arguments box would still be nice regardless).

@samuraidanieru. Found another bug for you, a bit of a show-stopper for any farm bigger than a couple of machines. Loki doesn’t appear to tell the grunts which render engine to use: Blender Internal or Cycles? It just uses whichever engine is set as default for Blender on each machine. This needs to somehow be set in the job settings. Would there be a way to add a text box in the job settings to feed specific arguments to Blender via the grunts? E.g., if I started using a new addon that wasn’t enabled by default, or feeding python scripts like @Hoverkraft was asking about.

-David

I’m coming to this from a brenda mindset where you have complete control over the command that is run on the node; you could even execute some other program instead of Blender. But I am willing to forfeit some flexibility for usability.

The python snippets I call are kind of pre-flight scripts that make sure everything is set correctly for final rendering (DOF enabled, no border, resolution, render engine, samples count, render layers, you name it). So this is pretty essential for me. Plus, on your own farm, you can inject some code to enable GPU rendering on the nodes, which doesn’t “stick” by default.

Though there might be a way to auto-run a script on scene load instead of passing it from the command line. One would have to disable Blender’s Python security check first.

Thanks for the pointers on the spot market. Cloud rendering with Cycles is THE main Blender USP for me, nothing commercial can compete. Dirt cheap, hassle free 300 node renderfarm with state of the art pathtracing quality? Sign me up!

It’s easy to compare core counts, but without actual tests, it’s just speculation about which provides a better compute-to-price ratio. From what I’ve heard, performance can vary quite a bit, even with the same type of virtual machine. This makes sense since all these virtual machines are on real hardware that’s being shared with other virtual machines. When I can, I’ll try to run some comparative tests.

Like David said, it’s no problem. If Loki loses a grunt that is busy with a task, it assumes the task is lost, resets the task status to ‘ready’, and passes it out to another grunt as soon as one is available. I added this behavior in 0.7.1 specifically so that Loki can handle an environment like Amazon with Spot instances, where an instance can just be terminated at any time without warning.

Actually, the rendering configuration is taken from the blend file. You can verify this by taking a blend file, saving one version set to Blender Internal and one set to Cycles, then passing both through Loki and looking at the output.

Aside from Loki, this makes sense from Blender’s perspective. It would be pretty annoying for users if they specify render engine ‘x’ in their blend file, save it, and then when they render from the command line, it chooses render engine ‘y’, ignoring their config. It is possible to explicitly specify the render engine on the command line, but if it’s not specified, it goes with the last saved state in the blend file.

Anything that can be specified from within the blend file, I’m not going to bother implementing externally in Loki. I think this makes sense from a user-flow perspective: working inside Blender is where the user spends most of their time, so it seems natural that they set up everything there the way they want it, save the blend file, and then just use Loki for distributed rendering. Where passing arguments in Loki could make sense would be cases where something can’t be specified from within Blender.

I haven’t looked into Brenda, but it sounds like a very flexible tool that inevitably brings complexity along with the flexibility. This is nothing against Brenda; it’s just the trade-off that always has to be decided depending on what you want.
I created the Loki project with the mantra of keeping it dead easy and simple to use, and specifically built around Blender. I think it’s for the most part succeeding in that, but like you say, it lacks some flexibility. As it stands right now, there isn’t a way to pass in arguments; it’s mostly counting on everything being defined in the blend file.

I’m trying to understand your work-flow here. So are you rendering with the blender command line so that you can pass your python script before rendering? Is it possible/convenient to pass this python script from within the UI?

I’ll look into the possibility of having a ‘specify python script’ option in the job, but first would like to hear if it’s possible to specify from within Blender.

See, the point is precisely that I don’t want to open Blender to set everything inside, by hand, because that leaves just too much margin for error. I can issue -P draft_settings.py and dependably the resolution is halved, samples are reduced, DOF is turned off, etc. Then I do -P final_settings.py and it sets everything to the highest quality. I can continue to work on the scene, and several iterations later I can still use exactly the same settings; I don’t have to think about what settings the scene was saved with. No errors or oversights because I missed that one switch or had some layer disabled. No multiple versions saved of the same scene for rendering. I can also chain these python snippets to apply some general settings like resolution and then, for a specific scene, override from branched to path tracing because some shader clears up much faster that way.

I believe that for more elaborate projects this kind of workflow is important. Aside from the CUDA-enable example I mentioned, another instance where I used python was to cache particles on scene open; the cache was fast to generate but used a lot of disk space to upload, so I wanted to generate it server-side. I’ve got multiple scenes, layers, naming conventions, render borders, passes. But if you want to gear Loki towards simpler one-shot renders, that is of course alright too; just wanted to give some input. Personally I see the ability to modify the command that gets executed on the nodes as elementary to the function of a render farm.

Edit: Concerning running python from within Blender, it is possible, but you have to pass the -y switch from the command line to turn off the trusted-source check. And you have to open the file, paste the code, register it to auto-run, and save the file, but this kind of open-make-settings-save tango is what I am trying to avoid in the first place, so it’s no solution for me.
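To illustrate, the two variants look roughly like this (scene name, script names and frame range are just examples):

# pre-flight script passed on the command line, the way I do it with brenda:
blender -b scene.blend -P final_settings.py -s 1 -e 250 -a
# script registered to auto-run inside the .blend, needs -y to skip the trusted-source check:
blender -b scene.blend -y -s 1 -e 250 -a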

A “specify python” option would already be great; a more advanced method would be:

brenda uses something called a frame template, which looks like this:

blender -b *.blend -F PNG -o $OUTDIR/frame_###### -s $START -e $END -j $STEP -t 0 -a
Somewhere Loki has to formulate the command line that eventually gets executed in a similar fashion. If one could modify this, it would be all that’s needed and would give the utmost flexibility.