linuxwebcluster.com

EC2 render speed comparison

Just an update on how the speed tests are going with the Amazon EC2 render farm. There does not seem to be any speedup from going 64 bit for my test renders, possibly due to the particulars of the job I am running. The "extra large" 64 bit EC2 instance I ran, with 4 dual cores and 8 GB of memory, finished the render exactly 4 times faster than the 32 bit instance with 1 dual core. That is what four times the cores should deliver, with nothing extra from 64 bit.
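To make the arithmetic concrete, here is a minimal sketch of the cost comparison. The 20 cents per hour for the dual core instance is the figure quoted elsewhere on this page. The extra large price and the one hour baseline are assumptions chosen only for illustration, and partial-hour billing is ignored.

```python
# Cost of one render job on two instance sizes.
# ASSUMPTIONS: XL_PRICE_PER_HOUR and BASELINE_HOURS are illustrative values,
# not figures from the test. The 4x speed ratio is the measured one.
MEDIUM_PRICE_PER_HOUR = 0.20  # USD, 32 bit instance with 1 dual core
XL_PRICE_PER_HOUR = 0.80      # USD, assumed to be 4x the medium price
SPEEDUP = 4.0                 # extra large finished 4 times faster
BASELINE_HOURS = 1.0          # hypothetical job length on the medium instance


def job_cost(price_per_hour, hours):
    """Cost in USD of running one instance for the given number of hours."""
    return price_per_hour * hours


medium_cost = job_cost(MEDIUM_PRICE_PER_HOUR, BASELINE_HOURS)
xl_cost = job_cost(XL_PRICE_PER_HOUR, BASELINE_HOURS / SPEEDUP)

print(f"medium:      {BASELINE_HOURS:.2f} h, ${medium_cost:.2f}")
print(f"extra large: {BASELINE_HOURS / SPEEDUP:.2f} h, ${xl_cost:.2f}")
```

Under those assumptions the bill is the same and only the wall clock time changes.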

So core for core, the price and time to completion were identical either way. Now I am moving on to testing the EC2 cloud versus my own render nodes, mano a mano. And the results are in!

Several tests showed the same result, so I will not bore everyone with every last detail. Local renders were performed on a DL380 with 2 GB of memory and dual Xeon 2.8 GHz CPUs. They are hyperthreaded, so they show up in Linux as 4 CPUs. On average, frames rendered fastest when processed 4 at a time, finishing in about 40 seconds per frame.

In EC2 I used a single "high-cpu medium" instance for comparison. It claims to provide "virtual" dual Xeon 2.3 GHz CPUs and 1.7 GB of memory. It rendered the same images in about 65 seconds each, on average. So the remote renders took about 1.6 times as long as local, about 90% of which I attribute to CPU power. Judging by these results, calling it a virtual dual Xeon 2.0 GHz would be more accurate. Of course, the wonderful thing about EC2 is that I can provision as many of them as I want! The only thing you really need to know is how to baseline them, and now you have something for comparison. At 20 cents per hour per instance, it's a bargain if you want to burst your render farm capacity with Amazon's EC2.
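For anyone who wants to redo the baseline with their own numbers, the comparison boils down to a few lines. This is a rough sketch using the averages quoted above. It assumes the instance is kept busy for the whole hour and ignores how partial hours are billed.

```python
# Baseline comparison from the averages above.
LOCAL_SECONDS_PER_FRAME = 40.0  # DL380, frames processed 4 at a time
EC2_SECONDS_PER_FRAME = 65.0    # single "high-cpu medium" instance
EC2_PRICE_PER_HOUR = 0.20       # USD per instance hour

# How many times longer the remote render takes per frame.
slowdown = EC2_SECONDS_PER_FRAME / LOCAL_SECONDS_PER_FRAME

# Frames one instance can finish in an hour, and what each frame costs.
ec2_frames_per_hour = 3600.0 / EC2_SECONDS_PER_FRAME
ec2_cost_per_frame = EC2_PRICE_PER_HOUR / ec2_frames_per_hour

print(f"EC2 takes {slowdown:.2f}x as long per frame")
print(f"EC2 renders about {ec2_frames_per_hour:.0f} frames per hour")
print(f"EC2 cost per frame: ${ec2_cost_per_frame:.4f}")
```

Swap in your own seconds per frame to see where a given instance type lands against your local nodes.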

 

Amazon EC2 Cloud Render Farm Integration

On request from a prospective customer, I looked into the idea of integrating RenderFarmer with cloud computing technology. Of course we are already compatible with a PXE booted VMware virtual machine image, but cloud computing is all about external resources that you can offload work to. Amazon EC2 is the leader in the "cloud for rent" category, so I did some homework to figure out how this might work. The idea is simple: if you want on-demand capacity in your render farm, just boot up some cloud based "virtual render nodes" and they'll join your farm and go to work alongside whatever local nodes you might have.

At first, the integration does not look straightforward. In RenderFarmer, everything gets tied into a cluster shared root filesystem by PXE booting into an image that goes out over the local network and links to whatever it needs. The nodes are diskless. But the latency between the cloud and us is far too great for that kind of live network interaction to work. So no PXE. However, the whole point of PXE / shared root is that it scales fast. You can add nodes in a few minutes with no sysadmin skills or effort. Likewise, Amazon's EC2 scales fast. They have their own methods for doing this, so let's take advantage of them and not worry about PXE!

A "server" in EC2 is stored as an image, called an AMI (Amazon Machine Image). There are lots of basic images publicly available. You can choose one and then launch as many instances of it as you like. So our first challenge was to build our own AMI that had whatever cluster magic we would have gotten via PXE boot. After a while, a 32 bit Fedora Core 8 render node emerged that had all the required ingredients. 

Now the basic AMI instance was running, but rendering on it was incredibly slow. It turned out the caching did not work well enough; all the render engine files had to be local to the instance. I rebuilt the AMI to include the Maya render engine and tested again. OK, rendering is much better now!

But we still need some level of tighter integration with the render farm here in the office. For instance, the output directory for the images would ideally be the same for all nodes, whether local or remote. And there were some other remote directories that the DrQueue render queue manager needed in order for jobs to propagate information. There are several ways to do this. We could take the easy way and use NFS or even rsync at certain intervals, but this would take away a big advantage of RenderFarmer. By using the cluster filesystem, we retain the ability to add more servers later for read and write striping and fault tolerance. This gives us the power to increase our I/O to almost any level that we need, and to do so transparently. When servers are added in the back end, "underneath" the clients, all the clients get the I/O benefits without any configuration changes. Trying to add this on later to an NFS / rsync system would be painful or impossible. The cluster filesystem is more difficult to use, but it's worth it.

After many hours of script fu, we had a cloud based render node that processed jobs in real time. The question now is how fast the different types of instances you can deploy from the cloud really are. There are three types of instances we are interested in. The first is a "small" that has one CPU. Amazon says it is equivalent to a 1 GHz Xeon or Opteron and it costs 10 cents (0.10 USD) per hour to run. The second is a "medium" instance that has dual Xeons and costs 20 cents per hour.

The first test was to render 100 frames of GiantStormAnim.mb, a sample scene that comes with Maya. I clicked two or three buttons and deployed 10 "small" nodes, and started the job when they were done booting. 41 minutes and 20 seconds later, it was done! Next I deployed 5 dual CPU "medium" instances and ran it again. It was about 30% faster, completing in just 28 minutes and 25 seconds. Good to know that the second setup has more CPU horsepower, despite having the same number of cores and the same $1 per hour cost (10 cents * 10 nodes or 20 cents * 5 nodes).
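Here is the same comparison worked out from the two wall clock times, as a quick sketch. The per-job cost line assumes the $1 per hour is charged pro rata, which is a simplification and not how the bill necessarily arrives.

```python
# Two runs of the same 100 frame job, both costing $1 per hour in total.
FRAMES = 100
FARM_PRICE_PER_HOUR = 1.00  # USD: 10 x $0.10 small, or 5 x $0.20 medium


def to_seconds(minutes, seconds):
    """Convert a minutes:seconds wall clock time to seconds."""
    return minutes * 60 + seconds


small_seconds = to_seconds(41, 20)   # 10 "small" instances
medium_seconds = to_seconds(28, 25)  # 5 "medium" instances

# Fraction of wall clock time saved by the medium setup.
time_saved = 1 - medium_seconds / small_seconds

# How many times more frames per hour the medium setup delivers.
throughput_gain = small_seconds / medium_seconds

# Pro rata cost of each run (assumption: no rounding up to full hours).
small_cost = FARM_PRICE_PER_HOUR * small_seconds / 3600
medium_cost = FARM_PRICE_PER_HOUR * medium_seconds / 3600

print(f"time saved: {time_saved:.1%}")
print(f"throughput: {throughput_gain:.2f}x")
print(f"per frame:  {small_seconds / FRAMES:.1f} s vs {medium_seconds / FRAMES:.1f} s")
print(f"job cost:   ${small_cost:.2f} vs ${medium_cost:.2f}")
```

The "30% faster" figure is the time saved. Measured as frames per hour, the medium setup comes out further ahead than that.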

Next up is the Extra Large High CPU instance, which has 8 dual Xeons for a whopping 16 cores per instance. But I can't try it until I create a 64 bit OS image... That's ok, we came a long way so far and I'm very happy with the results. In effect, it is possible to deploy a temporary virtual render farm OF ANY SIZE that can be integrated 100% with RenderFarmer, just as if the nodes were local. The nodes can be all virtual, or the virtual nodes can just supplement any existing local nodes. From a user perspective (when launching or checking on jobs, say) one cannot tell that some nodes are local and some aren't. And when you aren't using your virtual nodes, you aren't paying for them! Also, just like with local PXE booting nodes, it only takes moments to add nodes on the fly and have them join into running jobs.

We are ready to help our customers get their own AMI images ready and working, and anyone interested in beta testing this exciting new technology should contact us as soon as possible.  Any customers with a support agreement will receive free assistance and integration! Of course, if you are not yet a customer but have questions about this technology, we are happy to assist you in determining if Amazon EC2 cloud integration will work for you. 

 

Blogs and linux render farm gurus

As I was surfing around doing my competitive homework, I came across a nice write up on how to build a serviceable render farm out of Butterfly Net Render and Linux. Now typically I would not spend my time discussing the competition, but in this case it deserves a special mention!

The blog post is here, and my thanks to Lonnie for putting in the effort. Without the sharing of knowledge we Linux folk would still be staring at blue screens of death on a regular basis. In the same spirit, my thanks to Barry in his post on the same topic. He used Lonnie's write up as a guide and put his own configuration steps online for aspiring render farmers. Thanks, guys - you are good ambassadors of open source.

Anyway, I enjoyed the post because it illustrates pretty well the entire point of RapidScale Clusters (that's us). Building a Linux render farm can be SIMPLE. It can be EASY. It can happen without teams of Linux gurus and snippets from seventeen different howtos. That's the whole idea behind RenderFarmer. Fast, shared root, PXE booting diskless node Linux render farms are the way to go when ease of use and scalability are key requirements. There are GUIs for everyday management tasks like submitting jobs and adding and removing nodes. I daresay it's really, really easy. No, it is not free. But it is very affordable, and if rendering is a serious issue for you or your organization, it is hard to imagine that it's not worth the small price for a subscription to try it out. Or for that matter, to download a 90 day trial and just put the thing to work. If it doesn't save you twice the purchase price in time and effort, don't buy it!

But bringing a node from bare metal (that is, no OS, no render software, nothing) to rendering in 120 seconds is going to save you a ton of time in node configuration and maintenance. Just imagine never having to install anything on any of your nodes... That's what we do here. That's why when I read posts about how a linux guru tweaked and prodded a BNR render farm into existence on linux, against its will and better judgement, I feel compelled to hit the blogosphere myself and evangelize: It doesn't have to be so hard! Linux really IS the best platform for render farms. And some companies knew that from the beginning!

 

 

RenderFarmer IP address configuration

A customer installed RenderFarmer on their server and tried to use eth1 as the interface to communicate with render nodes... But only the use of eth0 is supported. So how do we change it? I'll post the support ticket here for anyone with a similar issue:

Customer:    Hello, I did a vanilla CentOS 5.2 install, installed all updates, and ran the RenderFarmer installer. eth0 receives an IP from [our University's] DHCP servers. After running your installer, the virtual interface eth0:0 was created with 10.1.1.1. However, our render farm nodes are connected via a switch and cable connected to eth1. We took a look at our logs and the render farm master is serving DHCP addresses to the University network on eth0. What we need to be able to do is get 10.1.1.1 over to eth1 so the render farm nodes are isolated and we're no longer attempting to assign DHCP addresses to machines on the University side of the network (eth0).

Support:   Basically we just need to switch the IPs. The cluster will use eth0, period. So if you switch cables and make eth1 DHCP from the University network, and make eth0 10.1.1.1, you will be all set. You should then edit /cluster/mon/mon_common and /cluster/bin/lvs-init and set a new virtual IP there, something in the 10.1.1.x range. It doesn't matter which, since you guys may never use that functionality anyway. Just remember that if you assign 10.1.1.250, say, you don't also assign that to a render node.

You will also have to do 'service ipvsadm stop' first. That will get rid of eth0:0 until next boot or until the service is manually restarted. Until the IP is down, you won't be able to assign it to another interface. Also, the settings for DrQueue will need to be updated. If you look in /etc/profile you'll see a line like:
DRQUEUE_MASTER=OLD_IP
and again in /etc/init.d/drq-slave, which will have a similar line.

Other places the IP will need to be changed:

/cluster/cluster_nodes
/etc/glusterfs/gluster-server.vol (change the IP range to 10.1.1.x)
/etc/glusterfs/gluster-client.vol
/etc/hosts
/etc/dhcpd.conf
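With that many files involved it is easy to miss one. The sketch below is not part of RenderFarmer; it is a hypothetical helper (the names lines_with_ip and scan are made up for this example) that lists every line in the files named in the ticket that still mentions a given address. It only reads files and never edits them. The 10.1.1.1 in the last lines is the address from the ticket, so substitute whichever IP is being retired.

```python
import re
from pathlib import Path

# Files named in the support ticket above.
CONFIG_FILES = [
    "/cluster/mon/mon_common",
    "/cluster/bin/lvs-init",
    "/etc/profile",
    "/etc/init.d/drq-slave",
    "/cluster/cluster_nodes",
    "/etc/glusterfs/gluster-server.vol",
    "/etc/glusterfs/gluster-client.vol",
    "/etc/hosts",
    "/etc/dhcpd.conf",
]


def lines_with_ip(text, ip):
    """Return (line_number, line) pairs where `ip` appears as a whole address."""
    # Reject matches embedded in a longer address such as 10.1.1.10 or 110.1.1.1.
    pattern = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\d|\.\d)")
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


def scan(paths, ip):
    """Map each readable file to the lines in it that still mention `ip`."""
    hits = {}
    for name in paths:
        try:
            text = Path(name).read_text(errors="replace")
        except OSError:
            continue  # missing or unreadable on this machine
        found = lines_with_ip(text, ip)
        if found:
            hits[name] = found
    return hits


if __name__ == "__main__":
    # Example address only; pass the IP you are moving away from.
    for name, found in scan(CONFIG_FILES, "10.1.1.1").items():
        for number, line in found:
            print(f"{name}:{number}: {line.strip()}")
```

Run it as root before and after the change. An empty result after editing suggests nothing in these particular files still points at the old address.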

 

samba server share

As a follow up to the smbclient quick and dirty post... Here's smb server "quick and dirty". Sometimes I have some files I download on my Windows laptop, maybe a tarball or rpm or something like that. Instead of going through the hassle of rsync over cygwin or some other convoluted way to get files from Windows to Linux, the easy way is to turn on samba on the Linux box.

First make a directory to hold the files:
mkdir /usr/src/files

Then open up the permissions so a windows "guest" can write to this directory through samba:
chown nobody /usr/src/files
chmod 777 /usr/src/files

Usually samba is installed by default, but if not, use your package manager like so (on RH / CentOS):
yum install samba

All that's left now is the configuration file, found at /etc/samba/smb.conf. Of the many, many possible options, very few are required to get a simple file share. Following is a minimal smb.conf that will share the directory we created earlier:
#======================= Global Settings =====================================
[global]
        security = share
#============================ Share Definitions ==============================

[fileshare]
        comment = simple network file share
        path = /usr/src/files
        public = yes
        read only = no

Et voila (or "viola" as so many bloggers seem fond of... ;-) Run 'service smb restart' and you can now access the share from your Windows machine by clicking Start, Run, and entering \\server-ip\fileshare.
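One caveat for newer systems: Samba 4.0 and later no longer accept "security = share". A commonly used equivalent, offered here as an untested sketch and not as part of the original recipe, maps unknown users to the guest account instead:

```
[global]
        security = user
        map to guest = Bad User

[fileshare]
        comment = simple network file share
        path = /usr/src/files
        guest ok = yes
        read only = no
```

Here "guest ok" is a synonym for "public". Either way, running 'testparm' before restarting the service will flag syntax problems in smb.conf.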
 



Copyright 2010    RapidScale Clusters, LLC