Universities announce virtual clusters project called SnowFlock

analysis
Sep 15, 2008 | 3 mins

A new university research project called SnowFlock was announced on the Xen dev list. SnowFlock lets you clone Xen VMs into dozens of identical replicas running on different hosts.

A new project called SnowFlock was recently announced on the XenSource dev mailing list. SnowFlock's binaries and source code are now available to the general public under the GNU General Public License (GPL).

Andres Lagar-Cavilla, a member of the SnowFlock project team, said that SnowFlock lets you clone Xen VMs into dozens of identical replicas running on different hosts. He added that SnowFlock can do this in less than a second and with very low runtime overhead.

“With SnowFlock you can, for example, perform parallel computations on the fly by scaling ‘instantaneously’ your computing footprint in a shared cluster,” said Lagar-Cavilla. “SnowFlock is a research prototype, hence the 0.1 major-minor.”

An overview of the project states:

SnowFlock is our prototype implementation of the Impromptu Cluster (IC) abstraction. In an IC, an application encapsulated inside a virtual machine (VM) is swiftly forked into multiple copies that execute on different physical hosts, and then disappear when the computation ends. ICs simplify the development of parallel applications and reduce management burden by enabling the instantiation of new stateful computing elements: workers that need no setup time because they have a memory of the application state achieved up to the point of forking. This approach combines the benefits of cluster-based parallelism with those of running inside a VM.
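To make the fork-and-join idea in that overview concrete, here is a minimal sketch in plain Python. It is only an in-process analogy, not SnowFlock's API: the names `fork_workers`, `run_worker`, and `parallel_sum_of_squares` are hypothetical, and a `deepcopy` of a dictionary stands in for cloning a whole VM onto another host.

```python
import copy


def fork_workers(state, n):
    """Return n independent copies of the parent state, each tagged with an id.

    Hypothetical stand-in for VM cloning: every copy starts with everything
    the parent had already computed, so no per-worker setup is needed.
    """
    clones = []
    for worker_id in range(n):
        clone = copy.deepcopy(state)
        clone["worker_id"] = worker_id
        clones.append(clone)
    return clones


def run_worker(clone, n):
    """Each clone processes every n-th item, starting at its own id."""
    my_items = clone["items"][clone["worker_id"]::n]
    return sum(x * x for x in my_items)


def parallel_sum_of_squares(items, n):
    """Fork n clones, let each compute a partial result, then join."""
    parent_state = {"items": list(items)}
    clones = fork_workers(parent_state, n)
    partials = [run_worker(clone, n) for clone in clones]
    # The clones go out of scope here, mirroring replicas that
    # disappear once the computation ends.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(range(10), 4))
```

The point of the sketch is that workers inherit the parent's state at the moment of forking, so the only coordination left is splitting the work and merging the partial results.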

SnowFlock provides swift parallel VM cloning that makes it possible for Internet applications to deliver near-interactive performance for resource-intensive highly-parallelizable tasks. SnowFlock makes use of four key techniques: VM descriptors (condensed VM images that allow for sub-second suspension of a running VM and resumption of a number of replicas); a memory-on-demand subsystem that lazily populates the VM’s memory image during runtime; a set of avoidance heuristics that minimize the amount of VM memory state to be fetched on demand; and a multicast distribution system for commodity Ethernet networking hardware that makes the overhead of instantiating multiple VMs similar to that of instantiating a single one.
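The memory-on-demand and avoidance ideas can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not SnowFlock's implementation: `LazyMemory` and `PAGE_SIZE` are hypothetical names, a Python list stands in for the parent VM's memory image, and the single heuristic shown (skip the fetch when a write replaces an entire page) is an assumed example of the kind of avoidance the overview mentions.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


class LazyMemory:
    """Toy clone memory: pages are copied from the parent only when touched."""

    def __init__(self, parent_pages):
        self._parent = parent_pages  # stand-in for the parent's memory image
        self._local = {}             # pages this clone has materialized
        self.fetches = 0             # how many pages were pulled on demand

    def _fetch(self, page_no):
        # Pull the page from the parent the first time it is needed.
        if page_no not in self._local:
            self._local[page_no] = bytearray(self._parent[page_no])
            self.fetches += 1
        return self._local[page_no]

    def read(self, page_no, offset, length):
        return bytes(self._fetch(page_no)[offset:offset + length])

    def write(self, page_no, offset, data):
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("write crosses a page boundary")
        if offset == 0 and len(data) == PAGE_SIZE:
            # Assumed avoidance heuristic: the old contents are about to be
            # replaced entirely, so fetching them would be wasted work.
            self._local[page_no] = bytearray(data)
            return
        page = self._fetch(page_no)
        page[offset:offset + len(data)] = data


if __name__ == "__main__":
    parent = [bytes([i]) * PAGE_SIZE for i in range(4)]
    mem = LazyMemory(parent)
    mem.read(2, 0, 3)                      # first touch: one fetch
    mem.write(3, 0, b"\xff" * PAGE_SIZE)   # full overwrite: no fetch
    print(mem.fetches)
```

In this model a clone that touches only a few pages pays only for those pages, which is the intuition behind populating the memory image lazily rather than copying it up front.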

SnowFlock is the joint work of researchers at two universities: the University of Toronto's Department of Computer Science and Carnegie Mellon University. The team presented the project earlier this year at Xen Summit North America 2008.

If you want to dig deeper into the project, the researchers' technical report gives a more complete description of SnowFlock's implementation and functionality.

You can find more information and download the release at https://compbio.cs.toronto.edu/snowflock.