Background
My firm is being called on more and more to provide 3D visualizations of our designs. Sometimes it's in the form of a static image. Other clients want an animated walk-through. In the past, we've been able to use our internal equipment to produce this work. But the general-purpose computers we have can take hours to create one frame. Animations consist of thirty such frames each second, which could mean days or weeks of processing for a two-minute video clip.
Luckily, modern design software allows you to harness multiple networked computers and put them to work on a single job. Let's say you have ten machines. One of these will function as a "manager", and split your image into multiple chunks. Or in the case of an animation, it will cut it into individual frames. The manager then sends one chunk at a time to each of the nine other computers. These will render the parts they've been assigned, then return the finished work to the manager, which re-assembles the bits. The manager then sends the next nine parts, and the process continues until the job is done. The more nodes you have, the more chunks you can process at once. The more powerful each machine is, the quicker it can process them.
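To make the hand-off concrete, here is a minimal Python sketch of the round-by-round distribution described above. It is only a simulation under simplifying assumptions: the function and node names are made up for illustration, every node finishes its frame before the next round starts, and no real rendering or networking takes place. The scheduling inside actual network-rendering software may well differ.

```python
from collections import deque


def distribute_frames(frames, node_names):
    """Hand out one frame per node per round until no frames are left.

    Returns a list of rounds; each round maps a node name to the frame
    it was assigned in that round.
    """
    pending = deque(frames)
    rounds = []
    while pending:
        batch = {}
        for node in node_names:
            if not pending:
                break  # fewer frames than nodes in the final round
            batch[node] = pending.popleft()
        rounds.append(batch)
    return rounds


# Nine render nodes (the tenth machine is the manager), twenty frames.
nodes = [f"node{i}" for i in range(1, 10)]
rounds = distribute_frames(range(1, 21), nodes)

for number, batch in enumerate(rounds, start=1):
    print(f"round {number}: {len(batch)} frames")
```

With nine nodes and twenty frames, the job takes three rounds: nine frames, nine frames, then the last two. Adding nodes shortens the list of rounds, which is the whole point of a farm.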
There are plenty of rendering farms on the internet, and they're relatively easy to use. Just upload your scene, maps, materials, and whatever else to their FTP site, fill out a form, buy some time, and let 'er rip. I've used a service like that (RenderTitan) before, and was generally quite pleased with the results. The price, however, was another matter.
You see, most of these outfits have you pre-pay by the gigahertz-hour. That was a new term for me, though it's not hard to understand. A gigahertz-hour is one hour of processing on a single 1 GHz core. Running a 2 GHz machine for an hour uses two gigahertz-hours. Running a quad-core 2.5 GHz machine for an hour uses ten. Easy, right?
This seems like a decent way to measure processing time, but it makes budgeting somewhat complicated. You can, of course, estimate by running a few test frames on a machine, then multiplying that figure by the total number of frames. For example, if it takes three hours to render ten frames on a dual-core 3 GHz computer, you're left with an average of 1.8 gigahertz-hours per frame. If the total animation consists of one hundred frames, you should budget for about 180 gigahertz-hours. But some frames are more complex than others, and take longer to render. Others won't take as long. That's where things get difficult. As I mentioned in the last paragraph, you have to purchase the time in advance. If you run out of "credits" before the job is complete, it will stop. You'll then have to purchase more time, and submit the remainder of the job. And you certainly don't want to pay for more time than you need!
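The estimate above is easy to script. This is a rough Python sketch of the same arithmetic, not anything the rendering services provide. The function names are my own, and the `margin` parameter is an assumption I've added to show how you might pad the budget for frames that render slower than the test batch.

```python
def ghz_hours(cores, ghz, hours):
    """Gigahertz-hours consumed by a machine: cores x clock speed x hours."""
    return cores * ghz * hours


def estimate_budget(test_frames, test_hours, cores, ghz, total_frames, margin=0.0):
    """Scale a test render up to the full job.

    margin is a hypothetical safety factor (0.25 means 25% extra) for
    frames that turn out more complex than the test frames.
    Returns (gigahertz-hours per frame, total gigahertz-hours to budget).
    """
    per_frame = ghz_hours(cores, ghz, test_hours) / test_frames
    return per_frame, per_frame * total_frames * (1 + margin)


# The example from the text: ten frames in three hours on a dual-core 3 GHz box.
per_frame, budget = estimate_budget(10, 3, 2, 3.0, 100)
print(f"{per_frame:.1f} GHz-hours per frame")  # 1.8 GHz-hours per frame
print(f"{budget:.0f} GHz-hours total")         # 180 GHz-hours total
```

Passing `margin=0.25` would raise the budget to 225 gigahertz-hours. How much padding is sensible depends on how much your frames vary, which is exactly the part that's hard to know in advance.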
There are other drawbacks to using an off-the-shelf service. It may not support the plugins or special content you use in your model. If something goes wrong, it's difficult (if not impossible) to see if the problem is machine- or file-specific. As a stereotypical control-freak IT guy, I don't like giving up that control. There had to be a better, cheaper way.
Amazon to the rescue
I had heard about Amazon's Elastic Compute Cloud (EC2) previously, but didn't understand what it was or how I could make it work for us. It's embarrassing how long it took for me to grasp the concept of virtualization. I eventually started playing with XenServer internally, which is really what made cloud computing click for me.
EC2 lets you choose an operating system and computer class in one of Amazon's data centers, creating what they call an "instance". The processing power of a given class is measured in "Compute Units," which Amazon defines as "the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor." Prices range from $0.02 an hour for a t1.micro instance (613 MB RAM, 2 CU) running Linux, to $2.97 an hour for a cc2.8xlarge (60.5 GB RAM, 88 CU) running Windows. There are several classes between these two extremes. Once you've created an instance, you're free to install whatever software you need. You can then create snapshots of your custom configuration, which are saved on Amazon's servers and can be started, stopped, or deleted on demand. You're only billed (at the end of the month) for the time your instances are actually running, plus fees for certain bandwidth use, storage, IP addresses, and monitoring.
You're limited by default to twenty on-demand instances at a time. However, Amazon also offers what they call "spot" instances. You can request up to 100 of these. They make use of otherwise unused capacity, and prices fluctuate according to demand. For example, the current spot price for a t1.micro instance running Linux is $0.006 per hour - less than a third of the $0.02 on-demand price. If you bid $0.007, and there are resources available, your instance will launch. This is a tremendous way to save money.
The downside, of course, is that you can be outbid. Let's say you have a spot instance running at the price above, and someone else is willing to pay $0.008 per hour. If there are no other instances available, yours will be shut down in favor of the new highest bidder. So while spot instances are a good way to add capacity cheaply, you don't want to rely on them exclusively.
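Here is a toy Python model of that bidding rule, just to show why spot nodes can vanish mid-job. It assumes the simplest possible behavior: the instance keeps running as long as the going price stays at or below your bid, and is shut down the first hour the price rises above it. The hourly prices are invented for the example, and Amazon's real pricing mechanics are more involved than this.

```python
def hours_running(bid, hourly_spot_prices):
    """Count the hours an instance survives before the price exceeds the bid."""
    hours = 0
    for price in hourly_spot_prices:
        if price > bid:
            break  # outbid: the instance is shut down
        hours += 1
    return hours


# Hypothetical hourly spot prices for one afternoon.
prices = [0.006, 0.006, 0.007, 0.008, 0.006]

print(hours_running(0.007, prices))  # 3
print(hours_running(0.010, prices))  # 5
```

A higher bid buys more staying power, but the frame being rendered when a node disappears still has to be redone elsewhere, which is why I wouldn't build a farm out of spot instances alone.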
Another key feature is Amazon's Virtual Private Cloud service. This allows you to create a private 10.0.0.0 network space for your instances, and assign static addresses from that subnet. It's also completely free.
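If you want to plan addresses before launching anything, Python's standard `ipaddress` module can help. The sketch below assumes a 10.0.0.0/24 subnet and a manager address of 10.0.0.10; both are illustrative choices, not values from my setup. It also assumes, as far as I know, that Amazon reserves the first four addresses and the last address of each VPC subnet, so check the current VPC documentation before relying on that.

```python
import ipaddress

# Assumed subnet size; the text only says the space starts at 10.0.0.0.
subnet = ipaddress.ip_network("10.0.0.0/24")

# Hypothetical static address for the render manager.
manager_ip = ipaddress.ip_address("10.0.0.10")

# hosts() skips the network and broadcast addresses (.0 and .255).
# Dropping three more leaves out .1, .2 and .3, which are assumed reserved.
available = list(subnet.hosts())[3:]

print(manager_ip in subnet)  # True
print(available[0])          # 10.0.0.4
print(len(available))        # 251
```

Picking the manager's address from the middle of the range keeps it clear of the reserved block and easy to remember when you point the nodes at it.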
Here we go
Once I understood it, EC2 seemed like the perfect solution. I could pick the hardware I need, install & configure the software we use, and run the farm cheaper than using someone else's.
My first step was creating a base image. I chose an m1.large instance (7.5 GB RAM, 4 CU) running 64-bit Windows Server 2008 R2, in a VPC. On this machine I installed Google Chrome (of course), then downloaded the installers for 3ds Max Design 2011, its Service Pack 2, and Archvision Dashboard. I also changed the time zone to match my own, disabled Windows Firewall, and set a new administrator password. I then took a snapshot for future use.
After capturing the image, I restarted the instance and began configuring it as the manager. Because it's accessed by all the other machines, it needed a static IP address, which the VPC allows. I then installed 3ds Max in trial mode, which gives me thirty days of free use. Next was the service pack, followed by the Archvision software. I created a shared folder for the RPC and other content to live in. The 32- and 64-bit versions of the RPC plugins were installed and configured, and I set the Archvision Content Manager to look in the shared folder.
With the manager ready to go, it was time to create an image for the nodes. I went back to my base image (the one with 3ds downloaded but not installed) and created a c1.xlarge instance (7.5 GB RAM, 20 CU) in my VPC. I mapped a drive to the shared folder on the manager, installed the software, and pointed the RPC plugins to the static-addressed Content Manager.
Now, my goal with the nodes was to make them completely hands-off. I don't want to have to log on to twenty or more computers before they'll start rendering. Backburner, Autodesk's network rendering component, can run in the background as a service, so I installed it to run using the Administrator account. I then fired up the Manager component. The node registered itself with the manager, and I sent through a quick job. Success!
However, things got a little more complicated when I launched a second instance. When Backburner on the new machine registered with the Manager, it seemed to replace the existing node. After some investigation, I found a file (backburner.xml) that contains the settings used to identify the machine to the manager. Because I'm using a single image, the files on all the instances start out identical. I had to find a way to automatically change the file on each instance to reflect that machine's MAC address and name.
My solution was to write a PowerShell script that runs at log on. It stops the Backburner service, grabs the computer name and MAC address, and polls a web address for the instance's ID. It then changes those settings in the backburner.xml file and restarts the service. The script went in the Startup folder, and I set Windows to log on to the Administrator account at boot time.
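My actual script is PowerShell, but the core idea translates to any language. The Python sketch below is an illustration only and makes several assumptions: the element names `ServerName` and `ServerMAC` and the sample XML are placeholders rather than the verified layout of backburner.xml, the "web address" is taken to be the standard EC2 instance metadata endpoint, and stopping and restarting the Windows service is left out entirely.

```python
import socket
import urllib.request
import uuid
import xml.etree.ElementTree as ET

# EC2 instance metadata endpoint; only reachable from inside an instance.
METADATA_URL = "http://169.254.169.254/latest/meta-data/instance-id"


def local_name():
    """Computer name as the operating system reports it."""
    return socket.gethostname()


def local_mac():
    """MAC address as 12 hex digits (uuid.getnode may fall back to a random value)."""
    return f"{uuid.getnode():012X}"


def fetch_instance_id(timeout=2):
    """Ask the metadata service for this instance's ID."""
    with urllib.request.urlopen(METADATA_URL, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


def update_identity(xml_text, server_name, server_mac):
    """Return the settings XML with the name and MAC elements replaced.

    The tag names are hypothetical; adjust them to match the real file.
    """
    root = ET.fromstring(xml_text)
    for tag, value in (("ServerName", server_name), ("ServerMAC", server_mac)):
        element = root.find(f".//{tag}")
        if element is None:
            raise ValueError(f"no <{tag}> element found")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up stand-in for the settings file, with made-up identity values.
sample = (
    "<AppDetails><ServerSettings>"
    "<ServerName>OLD-NAME</ServerName><ServerMAC>000000000000</ServerMAC>"
    "</ServerSettings></AppDetails>"
)
updated = update_identity(sample, "i-0123abcd", "0A1B2C3D4E5F")
print(updated)
```

On a real node you would read the file from disk, pass in `local_name()` or `fetch_instance_id()` along with `local_mac()`, write the result back, and only then start the service again. Using the instance ID as the name has the advantage of being unique across every machine launched from the same image.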
Wrap it up
Launching nodes could not be more hands-off. All I do is select the image and launch as many instances as I need. They automatically log on to the Administrator account, change the necessary Backburner settings, and launch the service. On the manager, I can see new nodes come online and identify themselves. Once they do, each grabs a frame and goes to work.
The manager, on the other hand, needs to be created each time. "Why," you ask, "would that be?" Well, remember that I'm using the trial version of 3ds Max Design. That means I get thirty days' use before I have to purchase the software. By starting from scratch each time I need a manager, I gain another month of "free" rendering. It's a bit of a hassle to re-install everything, but you can't beat the price.
Speaking of price, it compares quite favorably. Remember RenderTitan? They charge $0.125 per gigahertz-hour. In EC2, I'm using hardware that Amazon rates as equivalent to 20 GHz and that costs $1.16 per hour, which works out to $0.058 per gigahertz-hour. That's less than half the price of the commercial service. Not too shabby!
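For anyone who wants to run the same comparison with other instance types, here is the arithmetic as a short Python sketch. The function name is my own, and treating one Compute Unit as 1 GHz is the same simplification I made above.

```python
def price_per_ghz_hour(hourly_price, rated_ghz):
    """Hourly price divided by the machine's rated capacity in GHz."""
    return hourly_price / rated_ghz


RENDER_SERVICE_RATE = 0.125  # commercial service, dollars per gigahertz-hour

ec2_rate = price_per_ghz_hour(1.16, 20)  # c1.xlarge, rated at 20 CU
ratio = ec2_rate / RENDER_SERVICE_RATE

print(f"${ec2_rate:.3f} per GHz-hour")     # $0.058 per GHz-hour
print(f"{ratio:.0%} of the service rate")  # 46% of the service rate
```

Keep in mind that this leaves out the manager instance, storage, and bandwidth, so the real gap is a little narrower than the raw numbers suggest.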
The downside is that if I only use on-demand instances, I'm limited to nineteen rendering nodes, plus a manager. But I can get additional nodes, when they're available, using spot instances. The base spot price for a c1.xlarge instance is $0.45. By bidding closer to the on-demand price, I can maximize my chances of getting that extra horsepower. I've yet to reach thirty running instances this way, but it could happen over the holiday weekend.
Cool article, we currently render on EC2 with 3ds using 100+ instances. Works well, crashes from time to time but overall pretty amazing!
Did you have any difficulty getting Amazon to boost your instance limit? Was it raised permanently? I never heard back from them about my request.
Thanks for posting the above article. I am having the exact same issues with the backburner.xml file. Care to share your PowerShell script? I am most of the way with a vbs but the MAC address is causing me grief.
It took me quite a while to figure out a way to extract the MAC address, too. I just posted the script and XML file here: http://tech-goodness.blogspot.com/2012/09/scripting-backburnerxml-file-for-amazon.html.
I am attempting this right now but having a really hard time with it. As someone who rarely touches anything related to IT, I find it very difficult to set up the connection between the instance and my office network. I wish there were a step-by-step guide that explains everything in layman's terms.