Paul Venezia
Senior Contributing Editor

Virtualization roulette: One 10G switch is never enough

Analysis
May 28, 2013 · 6 mins

The old joke 'If you can't afford two Porsches, you can't afford one' might apply to 10G switches, but there's a poor man's alternative

Last week I hit on the theme that just because you can do something doesn’t mean you should. That was in the context of “saving” time and money by shoehorning a bunch of VMs onto two physical hosts that can technically handle the load, when it’s a far better idea to spread things out and run three or four physical hosts instead. Feel free to adjust those numbers up to scale; it’s the same concept on an eight-host cluster that should actually be 12.

Other aspects of consolidation and conservation come into play with virtualization clusters, and switching is certainly one of them. With the push to 10G well under way, it’s enticing to string up a cluster of hosts on a 10G switch and reap the rewards of faster migrations, faster IP storage, and faster interaction between VMs on separate hosts. However, I see a fair number of single-switch 10G deployments in production, and I’m not so sure that’s a great idea.


10G is still expensive, certainly more so than aggregated 1G connections, but that shouldn’t result in a scenario where the loss of a 10G switch spells doom for the virtualized infrastructure. While that may seem obvious, it happens more often than you might expect.

For the larger shops, bite the bullet and buy redundant switches, configured as separate entities or as a stack. Depending on the vendor and its software support, a stack may be less redundant than two separate switches because a software oops in one switch can affect the switching operation of the other in certain circumstances. In a server-centric 10G deployment, there’s little to be gained in a stack anyway, unless you need aggregated 10G pipes from your hosts.

A better solution is to run an active/passive failover link configuration with the two legs run to different switches. Ideally, the active links of each host would be balanced across the two switches so that the overall switching load is shared. With bonded 10G uplinks, or 40G between the switches, you should be good to go, able to survive the loss of either switch at any time. If you need 20G bonded from each host (or a pair of 20G links across four 10G interfaces on the host), a stack, or a modular switch with redundant everything, may make more sense. This is not the place to cut corners.
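The article stays vendor- and hypervisor-neutral, so purely as an illustration: on a Linux/KVM host, the active/passive legs map to an active-backup bond, with the preferred slave alternated between hosts so the active load spreads across both switches. A minimal sketch against the kernel bonding driver’s sysfs interface, where eth0 and eth1 are placeholder names for the 10G interfaces cabled to switch A and switch B:

#!/usr/bin/env python3
"""Sketch: active/passive NIC failover on a Linux/KVM host (run as root).

Assumes the bonding module is loaded (modprobe bonding) and that eth0
cables to switch A and eth1 to switch B; both names are placeholders
for your 10G interfaces.
"""
from pathlib import Path

BOND = Path("/sys/class/net/bond0/bonding")

def write(knob: str, value: str) -> None:
    (BOND / knob).write_text(value)

# Create bond0 if the kernel doesn't already have it.
masters = Path("/sys/class/net/bonding_masters")
if "bond0" not in masters.read_text().split():
    masters.write_text("+bond0")

write("mode", "active-backup")  # one live leg, one hot standby
write("miimon", "100")          # poll link state every 100 ms

# Slaves must be down before they can be enslaved
# (ip link set eth0 down; ip link set eth1 down).
write("slaves", "+eth0")        # leg to switch A
write("slaves", "+eth1")        # leg to switch B

# Alternate the primary between hosts (eth0 on odd hosts, eth1 on
# even) so active traffic is balanced across the two switches.
write("primary", "eth0")
# Finally: ip link set bond0 up, then address it as usual.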

However, some deployments may have a hard time justifying multiple 10G switches for a relatively low overall port count. If all I need is 16 10G switch ports, buying two 24-port 10G switches seems like serious overkill to a nontechie looking over the budget. It’s a fact of life that important items get cut from projects based on raw numbers like these, lacking context or understanding. Sometimes all we can do is figure out a way to live with the result until the issue can be properly addressed later on or in a new budget cycle.

Making do with a single switch

One way to do this is to make the best of a bad situation with a single 10G switch. A small-to-medium data center switching infrastructure may have few or no 10G ports at all; really, the only 10G necessary is for the virtualization hosts. This leads many shops to pick up 24-port 10G switches to run just those links.

If you do this, it’s a very wise idea to spec your hosts with not only four 10G ports, but also at least two (and ideally four) 1G copper ports. It’s unlikely that you’ll need more than 10G per front-end and storage network, but you want those links to be redundant. You can set up active/passive failover across both 10G ports linked to the storage network, say, or create an aggregate, though you probably won’t get much use out of it. All of these links will terminate at the single 10G switch, which means you are protected only against isolated problems: a 10G port frying on the switch or the host, or a cable going bad or coming unplugged. The redundancy will be of no use if the switch itself hiccups.
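With everything terminating at one switch, it’s worth confirming which leg is actually carrying traffic before trusting the failover. On a Linux host built like the earlier sketch, the kernel reports this through the same sysfs tree (bond0 and the slave names remain placeholders):

#!/usr/bin/env python3
"""Sketch: report which slave of an active-backup bond is live."""
from pathlib import Path

def bond_status(bond: str = "bond0") -> None:
    base = Path(f"/sys/class/net/{bond}/bonding")
    active = (base / "active_slave").read_text().strip()
    for slave in (base / "slaves").read_text().split():
        state = Path(f"/sys/class/net/{slave}/operstate").read_text().strip()
        role = "ACTIVE" if slave == active else "standby"
        print(f"{slave}: link={state} role={role}")

if __name__ == "__main__":
    bond_status()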

Use protection

While you want to protect against those problems, you must also protect against a switch failure, and with only the one 10G switch, that becomes a challenge. This is where those 1G links come into play. Though there’s no bandwidth or performance need to push production traffic over 1G links on a virtualization host with 10G interfaces, you can configure most hypervisors to treat the 1G interfaces as standbys. These should be connected to the main data center network directly, or through a dedicated 1G switch, and if possible bonded to aggregate both 1G paths together.
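Again, the article doesn’t single out a hypervisor, but as one concrete illustration on a Linux/KVM host, this “10G active, 1G dormant” arrangement can be expressed as a single active-backup bond that prefers the 10G leg whenever it has link. The interface names are placeholders (eth10g for the 10G port, eth1g0/eth1g1 for the copper pair), and note the trade-off: active-backup keeps only one leg live at a time, so this simpler variant forgoes aggregating the 1G pair.

#!/usr/bin/env python3
"""Sketch: 10G active, 1G standby, as one active-backup bond (run as root).

Assumes bond0 already exists (see the earlier sketch) and that all
three placeholder interfaces are down before being enslaved.
"""
from pathlib import Path

BOND = Path("/sys/class/net/bond0/bonding")

def write(knob: str, value: str) -> None:
    (BOND / knob).write_text(value)

write("mode", "active-backup")
write("miimon", "100")
for nic in ("eth10g", "eth1g0", "eth1g1"):  # 10G leg, then 1G fallbacks
    write("slaves", f"+{nic}")

write("primary", "eth10g")           # normal traffic rides the 10G leg
write("primary_reselect", "always")  # return to 10G as soon as its link is back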

When configured properly, all traffic during normal operation will be routed through the 10G links and the 10G switch, but a switch failure there will cause the traffic to be re-routed through the 1G links that are otherwise dormant. Naturally, it’s imperative that the storage be configured to allow connections across its own set of links to that network, because we need to maintain the storage paths in the event of that failure.
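How you verify those storage paths depends on the array, but for iSCSI (one common flavor of the IP storage mentioned earlier), a quick sanity check is to run SendTargets discovery against the array’s portal on each network and confirm both answer. The portal addresses below are hypothetical:

#!/usr/bin/env python3
"""Sketch: confirm an iSCSI array answers on both networks.

The portal IPs are hypothetical; substitute your array's interface on
the 10G storage network and its interface on the 1G fallback network.
Requires open-iscsi (iscsiadm) on the host.
"""
import subprocess

PORTALS = ["10.10.10.50", "192.168.1.50"]  # 10G storage net, 1G fallback

for portal in PORTALS:
    # SendTargets discovery succeeds only if the portal is reachable
    # and the array presents targets on that interface.
    result = subprocess.run(
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
        capture_output=True, text=True,
    )
    print(f"{portal}: {'OK' if result.returncode == 0 else 'FAILED'}")
    if result.returncode != 0:
        print(result.stderr.strip())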

Should that one 10G switch fail, the performance of the cluster will necessarily drop substantially, but the bits will still pass, and the virtualized infrastructure will still be accessible. This is the goal, and it shouldn’t add much to the budget, especially if existing 1G switching can be utilized.

This may seem basic, but there are many who think that the raw bandwidth provided by 10G switching obviates the need for 1G connections. In a fully redundant 10G implementation they may be right, although I would still tend to err on the side of caution and configure backup management networks on a separate 1G network spanning all cluster members.

Switching failures are generally rare, and I have my share of switches with five-plus years of uptime. However, that’s never a guarantee. When we’re putting all of our eggs in the virtualization basket, we really do need to ensure that the basket is as strong as we can make it.
