So-called stretched clusters promise the utopia of disaster avoidance and disaster recovery through virtualization, but the technology is still new. Here's how to deal with the challenges.

Because virtualization offers such cost savings and agility, many organizations are going whole hog and virtualizing mission-critical systems they wouldn't have dreamed of virtualizing a little while ago. Who would have thought, for example, that we'd see enterprises deploy Oracle virtually rather than on physical hardware?

It's no surprise there's increasing interest in wringing every last drop of disaster recovery and disaster avoidance out of virtualization infrastructures. Most commonly, this manifests in a replicated, dual-site architecture where virtualization resources can be quickly failed over to a secondary data center in the event that catastrophic failure strikes the primary site.

For some organizations, however, this isn't enough. They want the capability to seamlessly migrate workloads from one site to another in order to utilize computing resources at both sites. Moreover, they want the infrastructure to heal itself automatically in the event of a site or hardware failure, just as single-site virtualization clusters are able to do. Hence the new buzz phrase "stretched cluster."

The stretched cluster has its challenges. But advances in the capabilities of virtualization stacks and the underlying storage gear have made stretched clusters an increasingly attainable and attractive option.

The challenges
There are three main challenges to implementing a stretched cluster:

- You must have serious connectivity between data centers.
- You need to be able to support live migration of virtual machines from one site to another (using VMware vMotion, for example).
- You need synchronously mirrored storage at both sites, incurring bandwidth and latency requirements that may be difficult to deliver.
Typically, you'll see stretched cluster implementations built on multiple gigabit-class metro Ethernet deployments or dark fiber pushing 10Gbps Ethernet and/or 8Gbps Fibre Channel. Whatever type of connectivity is deployed, it needs to deliver no more than 10ms of latency for VM migration and no more than 5ms of latency for storage replication, but you'll probably want to shoot for much less. Right off the bat, some organizations will simply lack the connectivity they need to implement this kind of design. Nonetheless, even in the sticks of northern New England, I've seen relatively cheap, fiber-based WAN alternatives appearing in places I would never have expected only a couple of years ago.

Provided you have the connectivity, the second requirement is a storage platform that can support a stretched cluster implementation. While synchronous site-to-site replication has been around for a long time, most synchronous replication implementations are active/passive configurations: One side is specifically designated as production, while the other is failover only. In such deployments, manual intervention is usually required to "promote" the inactive failover array to become active and offer up the storage at the recovery site. That's great for a hot/warm site configuration, but it doesn't fulfill the goal of reacting to a failure automatically.

Only a few storage platforms allow both sides of a synchronously replicated storage infrastructure to be active at the same time (or, more accurately, to appear as a single storage infrastructure). EMC's VPLEX, HP's LeftHand P4000, and NetApp MetroCluster are a few good examples, though there are others. It's very difficult to gracefully handle the large number of failure scenarios that could occur without opening the storage environment to the real danger inherent in split-brain scenarios, where the two sites lose their synchronization.
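The latency budgets above lend themselves to a simple pre-flight check. The sketch below is purely illustrative, assuming hypothetical round-trip measurements you'd gather with a network tool; the link names and sample numbers are not from any real deployment, only the 10ms and 5ms thresholds come from the requirements discussed above.

```python
# Pre-flight check of intersite latency against stretched cluster needs.
# Thresholds follow the article: <= 10 ms for VM migration traffic,
# <= 5 ms for synchronous storage replication.

MAX_MIGRATION_LATENCY_MS = 10.0   # VM migration (e.g., vMotion) budget
MAX_REPLICATION_LATENCY_MS = 5.0  # synchronous storage replication budget

def link_qualifies(measured_rtts_ms, budget_ms):
    """A link qualifies only if its worst observed round-trip time
    stays within the latency budget."""
    return max(measured_rtts_ms) <= budget_ms

# Hypothetical sample RTT measurements (ms) across each intersite link
vm_link_rtts = [3.1, 4.7, 6.2]       # VM migration / inter-VM traffic link
storage_link_rtts = [2.0, 2.4, 5.8]  # storage replication link

print(link_qualifies(vm_link_rtts, MAX_MIGRATION_LATENCY_MS))        # True
print(link_qualifies(storage_link_rtts, MAX_REPLICATION_LATENCY_MS)) # False: 5.8 ms > 5 ms
```

Judging by the worst observed sample rather than the average is deliberate: synchronous replication stalls on every write that exceeds the budget, so tail latency matters more than the mean.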
Herein lies the third and largest challenge of implementing a stretched cluster, one that virtualization vendors and storage vendors are still grappling with: the wide range of split-brain and suboptimal workload placement scenarios that can result when you try to engineer geographically diverse, active-active storage infrastructures.

Imagine you've constructed a stretched virtualization cluster consisting of six virtualization hosts and a synchronously replicated storage infrastructure at each of two sites. The sites are attached via two low-latency fiber links: one to support inter-VM traffic and virtual machine migration, and the other to provide the bandwidth for the two storage infrastructures to replicate. In that scenario, the real failure of an entire site could be handled fairly gracefully. The remaining site's storage and virtualization infrastructure could easily tell that the failed site isn't responding and automatically restart the virtual machines lost in the failure, maybe resulting in only a few minutes of downtime.

It sounds great, but it's not that simple. For example, what happens if the first site didn't fail and one or both of the intersite links fail instead? If the intersite storage replication link is lost, the two storage arrays could both assume that the other has failed and become active at the same time, a true nightmare scenario in which the two replicas start to diverge. Likewise, failure of the intersite link used for virtual machine traffic might result in the two halves of the virtualization cluster assuming the other half is down and attempting to restart VMs the other site is still actively running.

Some storage vendors have attempted to deal with this by requiring the implementation of a software stack that runs at a third site to aid the two active storage systems in determining whether the other site has truly failed or merely lost connectivity.
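The third-site tiebreaker idea can be sketched in a few lines. This is a hypothetical model of the decision rule, not any vendor's actual implementation: each site checks whether it can still reach the witness, and a preconfigured preferred site (discussed below) breaks the tie when both sites are alive but isolated from each other.

```python
# Hypothetical third-site "witness" decision rule for a two-site
# active-active storage cluster. Site names, the preferred-site default,
# and the rule itself are illustrative assumptions.

def resolve_partition(a_sees_witness, b_sees_witness, preferred="A"):
    """Return the set of sites allowed to stay active after the
    intersite links fail."""
    if a_sees_witness and b_sees_witness:
        # Both sites are alive but isolated from each other: a link
        # failure, not a site failure. The preferred site wins the tie
        # so the replicas cannot diverge.
        return {preferred}
    if a_sees_witness:
        return {"A"}  # Site B has truly failed or is fully isolated
    if b_sees_witness:
        return {"B"}  # Site A has truly failed or is fully isolated
    # Neither site can prove the other is down: both freeze rather than
    # risk split-brain divergence.
    return set()

print(resolve_partition(True, True))    # {'A'}
print(resolve_partition(False, True))   # {'B'}
print(resolve_partition(False, False))  # set()
```

Note the last case: a design like this trades availability for consistency. If the witness itself becomes unreachable from both sites, nothing serves I/O until an administrator intervenes, which is exactly the manual-recovery scenario described next.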
Even with those measures, however, some scenarios persist in which site partitioning can occur. To prevent the split-brain nightmare from playing out, one side of the storage cluster is generally defined as the primary in the event of a loss of site-to-site communication, either on a per-array or per-volume basis. While this does mean there are scenarios in which the storage infrastructure will not recover automatically, it is necessary to avoid the corruption of data that could result if such measures were not taken.

These split-brain scenarios are the worst challenge for virtualization and storage vendors alike, but they're not the only ones. Though the two storage infrastructures will appear as a unified array to hosts in some stretched cluster storage implementations, only one storage array is responsible for accepting I/O for a given volume. This means that I/O generated by a virtual machine running on a host at the other site must first cross to the owning site to be written; only then can it be synchronously replicated back to the site where the VM is actually running. The trouble here is that hosts (virtualization hypervisors in this case) don't have a good way of knowing whether the storage they have visibility to is actually local to them.

VMware and others are working diligently to fix this by introducing APIs that allow the underlying storage to give the virtualization infrastructure the information it needs to avoid these kinds of suboptimal virtual machine placement scenarios. However, there is still work to be done, a theme that persists throughout stretched cluster deployments.

Looking to the future
While many organizations are looking to stretched cluster technology to give them the flexibility to dynamically balance workloads across multiple sites while simultaneously being able to recover from a catastrophic site failure, the underlying technology is still new and, in some cases, incomplete. That said, tremendous strides are being made.
I have no doubt that within the next two to three years, we'll see advances that result in the stretched cluster being the de facto way of implementing multisite failover.

This article, "Avoid disaster with stretched clusters," originally appeared at InfoWorld.com. Read more of Matt Prigge's Information Overload blog at InfoWorld.com.