Matt Prigge
Contributing Editor

Fake a stretched cluster for fun and profit

analysis
Apr 30, 2012
7 mins

True stretched clusters are out of reach for many, but you can use the concepts to avoid downtime -- even amid major upgrades

Last week, I wrote about the huge benefits and associated challenges of implementing “stretched” virtualization clusters. These offer tangible benefits over traditional hot/warm site implementations in that they allow workloads to be seamlessly migrated from one site to the other. In some cases, they even allow site fail-over to take place automatically.

But the expense of clustered storage, not to mention the complexity involved, puts true stretched clusters out of reach for many organizations that could benefit from them. Even if you don’t have the right kind of back-end storage to implement a true stretched cluster, you may still be able to reap some of the flexibility benefits they bring to the table, easing the heavy lifting of complex upgrades and stringent uptime requirements.

Recently, I found myself faced with the prospect of upgrading a sizable dual-site virtualization infrastructure. Though each upgrade taken individually wasn’t that complicated, a strict zero-downtime requirement meant that careful planning would be required. In the end, the solution leveraged many of the same concepts present in stretched clusters, illustrating that you can enjoy some of the benefits without having all of the pieces in place.

The challenge

The pre-upgrade infrastructure consisted of two fiber-attached sites, each with its own synchronously replicated Fibre Channel storage array and blade-based virtualization cluster. During the course of the upgrade, new virtualization blades would be installed at the production site, the older production virtualization blades would be migrated to the recovery site, and the recovery-site virtualization blades would be retired. Along the way, the hypervisor (VMware vSphere in this case) would also undergo a major version upgrade, along with all of its dependent components: vCenter, Site Recovery Manager, View, and so on.

At the same time, a substantial data center networking upgrade was under way that would see 10Gbps Ethernet supplant the multiple 1Gbps Ethernet links used for top-of-rack to end-of-row connectivity and for cross-site links. As part of this upgrade, the network interconnects in both blade chassis would be replaced with 10Gbps-capable interconnects, necessitating a substantial reconfiguration of the blades themselves.

All of this had to be done with absolutely no production impact on the virtualization environment.

This is a tall order, to be sure, but one that was made possible by the availability of high-bandwidth intersite connectivity and hypervisor features such as vMotion and Storage vMotion.

The solution

Though there are a number of ways this work could have been accomplished under slightly different circumstances, the simultaneous need to implement new blade hardware, retain the existing blade chassis, and substantially change the configuration of the chassis interconnects left only one real course of action: migrating the entire production virtualization implementation to the recovery site and back again.

In general, I think blade chassis are just about the greatest thing to ever happen to server hardware. They make it extremely easy to manage a large number of servers, are much more power/HVAC efficient, and fit a lot of computing punch into a relatively small package. However, they have a tendency to be difficult to upgrade in place — especially when large changes need to be made to the network and storage interconnects.

In this case, each blade chassis was equipped with four Gigabit Ethernet switches and two Fibre Channel switches, matching each blade’s four Gigabit Ethernet ports and dual-port Fibre Channel HBA. The upgrade would see all six interconnects replaced with a pair of converged interconnects leveraging the blades’ built-in 10Gbps CNAs for both network and storage connectivity. That kind of change can’t be made without substantially disrupting every blade in the chassis, so it was necessary to empty each chassis entirely before upgrading its interconnects.

Given the substantial blade reconfiguration required, the need to upgrade the hypervisor, and the fact that the existing production virtualization hosts eventually had to end up in the recovery data center, the easiest approach was to move those blades to the recovery data center, upgrading them and the recovery-side blade chassis along the way.

After the VMware management components (namely vCenter, but also other applications that depended upon it) were upgraded, blades were removed from the production environment one at a time, physically reconfigured, installed in the newly reconfigured recovery site blade chassis, and upgraded to the newest version of the hypervisor. At that point, a commensurate chunk of the virtualization infrastructure was migrated (via intersite vMotion) to the recovery site — essentially stretching the cluster between two sites. This process was iteratively repeated for each production virtualization blade until they had all been migrated, along with their workloads, to the recovery site.
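The iterative drain-and-migrate loop described above can be sketched as a toy simulation. All host and VM names here are hypothetical, and the real work (maintenance mode, vMotion, hypervisor reinstalls) would of course be driven through vCenter rather than a Python function; this only illustrates the sequencing of moving one blade at a time.

```python
def migrate_cluster(production_hosts, vms_per_host):
    """Toy model of the one-blade-at-a-time site migration.

    For each production blade: its VMs are first evacuated to the
    remaining production hosts (so nothing goes down), the blade is
    physically moved, reconfigured, and reinstalled in the recovery-site
    chassis with the new hypervisor, and then a commensurate chunk of
    VMs is vMotioned across the intersite link onto it.
    """
    recovery_hosts = []
    migrated_vms = []
    for host in list(production_hosts):
        # 1. Evacuate and pull the blade from the production chassis.
        vms = vms_per_host.pop(host)
        production_hosts.remove(host)
        # 2. Reconfigure the blade and join it to the recovery site.
        recovery_hosts.append(host)
        # 3. vMotion this blade's share of the workload across sites.
        migrated_vms.extend(vms)
    return recovery_hosts, migrated_vms
```

The loop ends with every blade and every workload at the recovery site and the production chassis empty, which is exactly the state needed before its interconnects could be swapped.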

At this point, the entire production environment was running at the recovery site while the primary storage it was accessing remained at the production site. Because the infrastructure had been designed with sufficient intersite bandwidth to make this possible (a pair of 8Gbps FC links in this case) and the arrangement was relatively temporary, no effort was made to migrate the storage to the recovery-site primary storage. It’s worth noting, however, that had it been desired, Storage vMotion could easily have transitioned the back-end storage from the primary-site SAN to the recovery-site SAN, effectively “walking” a VM from one site to the other: first with vMotion to move the compute load, then with Storage vMotion for the storage.
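The two-step “walk” can be sketched as follows. Again, the host and datastore names are hypothetical and this is a toy model of the ordering, not an implementation; a real version would issue the corresponding relocate tasks through the vCenter API (e.g. via pyVmomi) or PowerCLI.

```python
def walk_vm(vm, dest_host, dest_datastore):
    """Toy model of "walking" a live VM between sites: compute first,
    storage second, with the VM running throughout."""
    steps = []
    # Step 1: vMotion -- the running VM's memory and CPU state move to a
    # recovery-site host while its disks stay on the production SAN.
    vm["host"] = dest_host
    steps.append(("vmotion", dest_host))
    # Step 2: Storage vMotion -- the disks are then copied live from the
    # production datastore to the recovery-site datastore.
    vm["datastore"] = dest_datastore
    steps.append(("storage-vmotion", dest_datastore))
    return steps
```

The ordering matters: moving compute first keeps the disruptive cross-site storage copy optional, which is why no storage migration was actually performed in this case.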

After the production-site blade chassis was upgraded with new interconnects, the new virtualization blades were installed and the process was reversed, this time without moving any physical hardware. All of the VMs were migrated back to the production-site virtualization hosts via vMotion, and the older recovery-site virtualization hosts were removed from the cluster one by one until only the new blades remained. The older recovery-site blades were then used to re-form the recovery-site cluster that had existed previously, and Site Recovery Manager was reconfigured.

In retrospect

In a relatively short amount of time, literally every piece of networking, interconnect, and blade hardware was upgraded and physically reconfigured; workloads were migrated from one site to another; and the data center network infrastructure was upgraded, all without impacting access to a single VM for more than a second or two at a time. Pretty cool.

While the cluster was in fact stretched across two sites for a short period of time, it’s important to realize this doesn’t constitute a true “stretched cluster.” If, at any point during the migration work, the production site primary storage or the intersite connectivity had failed for some reason, a relatively quick, but still very manual storage fail-over to the recovery site primary storage platform would have been necessary. This is the real difference between a “true” stretched cluster operating on top of clustered, geographically diverse storage and a cluster that just happens to exist at two sites, but is still dependent on storage at a single site.

It’s also worth noting that since the recovery site blade chassis had to be reconfigured to accept the new interconnects, there was a short period of time where the organization’s site fail-over SLA wouldn’t have been met at all. This was deemed an acceptable risk due to the amount of engineering resources available at a moment’s notice should the worst have happened, but isn’t a risk to be undertaken lightly.

Though this migration didn’t result in the creation of a true stretched cluster, it did leverage many of the same capabilities that a stretched cluster can grant and is a great illustration of why I’m so excited to see stretched cluster tech become commonplace. What live virtual machine migration and subsequent live storage migration did for individual clusters of virtual machines and their hosts, stretched clusters will do for entire multisite data center infrastructures.

This article, “Fake a stretched cluster for fun and profit,” originally appeared at InfoWorld.com. Read more of Matt Prigge’s Information Overload blog and follow the latest developments in storage at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.