Serdar Yegulalp
Senior Writer

Flocker bundles Docker containers and data for easy transport

News analysis
Jun 17, 2015

ClusterHQ's Flocker provides a solution for a long-standing Docker problem: keeping containers and their back-end data together while in transit

As Docker containers come into wider use, their shortcomings also become clearer. How, for instance, do you migrate a running container along with its data to another server, and preserve its data in the process? Typically, you don’t.

ClusterHQ, a startup founded in part by core contributors to the Python Twisted network engine, has a proposed solution. Flocker, an open source (Apache-licensed) data volume manager for Dockerized applications that is now in its 1.0 release, allows volumes of data (aka datasets) to be associated with containers and moved with them.

Keeping it all together

Flocker bundles containers and datasets, ensuring they move together whenever a Dockerized application is shuttled between hosts on a given cluster. The one limitation is that storage for the data has to be provided by a shared storage back end accessible to all the nodes in the cluster.
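In Flocker 1.0, that pairing of application and dataset is expressed declaratively. The sketch below is illustrative, not taken from the article; the application name, image, and node address are assumptions, but the two-file shape (an application definition plus a deployment map) and the `flocker-deploy` command follow ClusterHQ's published conventions.

```yaml
# application.yml: declares a container and the dataset mounted into it.
"version": 1
"applications":
  "mysql-demo":                  # illustrative name
    "image": "mysql:5.6"
    "volume":
      "mountpoint": "/var/lib/mysql"

# deployment.yml: maps applications to cluster nodes. Moving "mysql-demo"
# under a different node's address and rerunning the deploy command
# migrates the container together with its dataset.
"version": 1
"nodes":
  "203.0.113.10":
    - "mysql-demo"

# Applied with, e.g.:
#   flocker-deploy <control-node> deployment.yml application.yml
```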

Only a few types of storage back ends, mostly cloud-oriented, are supported right now: Amazon EBS, Rackspace Cloud Block Storage, and EMC ScaleIO. ZFS-based storage is also supported, albeit only via a back end that’s currently experimental.

“Anything you’d use VMware vMotion for,” said Mark Davis, CEO of ClusterHQ, “are the same reasons you might want to move a container around. And if a container has data in it, you need something like Flocker.”

That said, one vaunted feature of vMotion — live migration of running apps — isn’t quite there yet in Flocker. Its migrations are “minimal downtime,” rather than zero downtime, meaning there is a small window of unavailability during the migration process. Luke Marsden, CTO and co-founder of ClusterHQ, stated in a phone call that the downtime “depends on the speed with which the back end can have a volume detached from one VM and attached to another VM. But we’re very interested in minimizing that downtime.”

ClusterHQ already has experimental features in the works to speed up the process by way of volume snapshots, although the back end needs to support snapshots for it to be viable.

Docker’s missing pieces

Docker has traditionally handled persistent data by way of data volumes, but they come with their own limitations. Manually copying data between containers still isn't simple (an issue reportedly addressed in Docker 1.7), but the bigger obstacle remains the poor state of management for data shared by Docker containers running in different locations.
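The limitation is easiest to see in the data-volume-container pattern Docker users relied on at the time. This is a sketch under assumed names ("datastore" is illustrative); the commands themselves are standard Docker CLI.

```shell
# Create a container whose only job is to own a data volume:
docker create -v /data --name datastore busybox

# Other containers on the same host can mount that volume:
docker run --rm --volumes-from datastore busybox \
    sh -c 'echo hello > /data/greeting'
docker run --rm --volumes-from datastore busybox cat /data/greeting

# The volume lives on this one host's disk. If the consuming container is
# rescheduled onto another machine, nothing moves the data with it -- which
# is the gap Flocker aims to fill.
```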

One current proposal for Docker involves making a new type of storage available to containers, with third parties providing drivers for their own storage types. If such a feature were implemented, it wouldn't be difficult for ClusterHQ to rework its support through its dataset back-end plug-in architecture — and keep a step ahead of whatever functionality rolls into Docker's own core over time.
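At the command line, such a plugin model might look something like the following. This is a hypothetical invocation, not part of the proposal text: the `--volume-driver` flag, the driver name "flocker," and the volume name are all assumptions made for illustration.

```shell
# Hypothetical: ask a third-party volume driver, rather than the local
# disk, to provide the named volume backing the container's data directory.
docker run -v mysql-data:/var/lib/mysql \
    --volume-driver=flocker \
    mysql:5.6
```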
