Matt Prigge
Contributing Editor

Build a bulletproof network for IP storage

analysis
May 7, 2012 | 10 min read

Correctly configuring a network for IP storage isn't quite as simple as it seems. Here's how to do it right

Virtualization continues to make huge inroads, thanks to the obvious flexibility and reliability benefits. And getting the most out of virtualization almost always requires some kind of shared storage. Otherwise, features such as live virtual machine migration and automated virtualization host failure recovery simply aren’t available.

In many businesses, particularly large and medium-sized ones, most shared-storage implementations end up being IP storage — sometimes NFS, but generally iSCSI. IP-based storage is an excellent fit because it employs the same networking hardware and concepts that network admins are already familiar with. And it’s easy to get up and running.

But just because it’s easy to fire up doesn’t mean it’s easy to do it right. Though less expensive than Fibre Channel, IP-based storage can actually be more complicated to configure optimally than FC storage. It’s not as simple as punching in a few IP addresses. Constructing a reliable, high-performance network to support an IP storage infrastructure requires careful attention to a variety of different factors.

Building a bulletproof network

Though a virtualization infrastructure attached to IP-based storage rides over the same kind of networking hardware you might already have in your server room, it relies on that hardware to a much greater extent than stand-alone physical servers do.

For example, if you encounter a run-of-the-mill networking failure in a physical server environment — say, a disconnected cable or even a failed switch — your users will be disconnected from their applications, but the server itself isn’t likely to be significantly disrupted. Once you’ve tracked down the loose cable or thrown a spare switch into the rack, everything should come back to life without much trouble.

But when a virtualization infrastructure loses access to shared storage, things go south in a hurry. The servers lose access to their own storage, almost as if you had simultaneously yanked all the hard drives out of a physical server. While this sort of network event would almost never result in actual data loss, time and effort would be required to get everything reconnected and start up all the virtual machines again.

Because of this, most IP storage implementations use redundant storage switching and fully redundant cabling, so even the complete failure of a switch won’t result in storage disruption. Though some designs dedicate those switches solely to storage traffic for simplicity of implementation, many organizations will also use them to offer redundancy to the rest of the network, dual-attaching all desktop aggregation switches, virtualization hosts, and storage to both “core” switches. Given that fairly capable managed gigabit Ethernet switches are available cheaply, buying two instead of one isn’t a huge added cost for the peace of mind it brings.

Redundant switching, however, involves a number of network architecture concepts that many small-business network admins may not have encountered before.

VLANs

When implementing IP storage, it’s important to separate the storage devices and interfaces that communicate with them from the rest of the network. This prevents problems that might impact the rest of the network (broadcast storms, for instance) from impacting server access to shared storage. It also prevents the network at large from “seeing” the storage, adding an extra layer of security. This can be accomplished by dedicating completely isolated switches to storage and simply not attaching them to the rest of the network. Or you can use VLANs (virtual LANs) to accomplish the same effect.

Using the same switches for both “front end” traffic (virtual machines, other switches, desktops, and so on) and “back end” traffic (storage) allows you to spend fewer dollars on switching resources and gain redundancy benefits for the network as a whole. Given that most fixed-configuration switches perform all of their switching and forwarding duties in hardware and are capable of wire-rate throughput on all of their ports, you generally won’t face any danger in overworking a switch by running both kinds of traffic at the same time. However, it does require the added complexity of deploying VLANs.

In a basic sense, VLANs are a means to create multiple virtual switches within a single piece of switching hardware. A switch that’s capable of implementing VLANs (just about any managed switch) will ship with all ports configured to be in the default VLAN (VLAN 1). After you create a new VLAN on the switch and attach a collection of switch ports to that VLAN, they will function as if completely isolated from the other ports — as if devices plugged into those ports are actually on a different switch.

Let’s say you have a pair of 24-port managed gigabit switches to which you’d like to attach an IP SAN, three virtualization hosts, and a collection of existing switches and other network devices. You might leave ports 1 to 16 in the default VLAN 1, then configure ports 17 to 22 to be in VLAN 2. The SAN’s storage interfaces and a pair of interfaces from each virtualization host would be evenly split across the VLAN 2 ports on both switches, while everything else would be split over the ports in VLAN 1, essentially giving you the same performance and security you’d have if you bought four switches instead of two.
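As a rough sketch, using Cisco IOS-style syntax (the exact commands vary by switch vendor, and the port numbering here is purely illustrative), the VLAN assignment on each switch might look like this:

```
! VLAN 1 exists by default; create the storage VLAN
vlan 2
 name IP-STORAGE

! Put ports 17 through 22 into the storage VLAN as untagged access ports
interface range GigabitEthernet0/17 - 22
 switchport mode access
 switchport access vlan 2
```

Ports 1 to 16 need no changes, since they remain in the default VLAN 1.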

Cross-connections

Next comes an important piece of the puzzle: configuring the cross-connection between the two switches. Since you’ll be splitting all your devices across the two switches, you need to provide a means for devices on one switch to talk to devices on the other. That’s usually as simple as attaching the two switches to each other, but VLANs complicate things a bit. If you simply run a cable from port 24 on the first switch to port 24 on the second, your VLAN 1 devices will be able to talk to each other across that link, but the VLAN 2 devices won’t. To allow them to talk to each other, you need to configure that port on each switch to be VLAN-tagged for the VLANs you want to carry across the link.

VLAN tagging, sometimes called “trunking,” uses the 802.1q standard to allow traffic from multiple VLANs to pass over a single physical link. It does this by inserting a four-byte VLAN tag into each frame that’s not part of the default VLAN as it passes from one switch to the other. The second switch recognizes the tag, strips it off, and forwards the traffic into the VLAN the tag indicates. This spares you from having to dedicate a separate physical link for every VLAN that exists on both switches.
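Continuing the sketch in Cisco IOS-style syntax (some vendors call this an 802.1q “tagged” port rather than a trunk, and the commands will differ), the cross-connect port on each switch could be configured along these lines:

```
! Make port 24 an 802.1q trunk carrying both VLANs to the other switch
interface GigabitEthernet0/24
 switchport mode trunk
 switchport trunk allowed vlan 1,2
```

With this on both switches, VLAN 1 and VLAN 2 traffic each cross the link, and each arrives in the correct VLAN on the far side.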

Another best practice in these kinds of implementations is to ensure you have sufficient bandwidth between the two switches; this allows a virtualization host interface attached to the first switch to speak to a SAN interface attached to the second without running into a bottleneck, for example. Most IP SAN vendors recommend that you provide at least as many cross-switch links as you have interfaces in your SAN.

For example, if you have a SAN equipped with two active gigabit Ethernet interfaces, you’d want at least two gigabit ports attaching the switches to each other. This is done by creating a team out of two ports (variously called a “trunk,” “port channel,” or “link aggregation group” by different networking vendors). When teamed together, traffic between the two switches is divided across the two links, allowing roughly twice the cross-switch throughput of a single link.
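A sketch of such a team, again in illustrative Cisco IOS-style syntax where the feature is called a port channel (port numbers are assumptions, and other vendors expose this differently):

```
! Bundle ports 23 and 24 into one logical 2Gbps inter-switch link
interface range GigabitEthernet0/23 - 24
 channel-group 1 mode active   ! negotiate the bundle with LACP

! The logical interface carries the 802.1q-tagged VLANs
interface Port-channel1
 switchport mode trunk
 switchport trunk allowed vlan 1,2
```

The same bundle must be configured on both switches, or the link may not come up at all.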

Spanning tree protocol

Now that you’re operating a pair of core switches for your network and have your virtualization and storage resources evenly split across them, you’ll have the redundancy you’ll need to survive a switch failure and still have the entire virtualization environment keep ticking. However, that leaves out the rest of the network. If you provide only a single connection from each workstation switch to the core switches, that switch will be isolated if the attached core switch fails.

You can avoid that eventuality by attaching each workstation switch to both of your new core switches. However, doing so is not as easy as running a second connection to the second core switch and plugging it in. To make that connection properly, you have to consider the effects of STP (Spanning Tree Protocol) and configure things ahead of time. Otherwise, you risk taking down your whole network with a network loop, or ending up with poorly performing IP storage.

In essence, STP is designed to prevent network loops from occurring. Without STP, if you were to create a loop by plugging three switches into each other (switch A to B, B to C, and C to A), any broadcast packet would race around that ring endlessly — quickly resulting in network saturation and a very bad day at the office. To avoid this, STP identifies and disables network links that would cause a loop. If enabled, STP will do a good job of preventing network loops without any attention from the network admin.

But be aware that STP may do so in a way you might not want. In the example above, if switches A and B are your core switches, attached with a 2Gbps link, attaching a workstation switch (switch C) to both A and B could result in STP disabling the link between A and B to avoid creating a loop, thereby forcing all cross-switch traffic between A and B to flow through switch C. You definitely don’t want that.

The devil in the details here is how STP decides which link to block when a loop is detected. The first thing STP does (continuously, as the network topology is modified) is elect a so-called root bridge, a switch that will act as the “root” of the network. After that, each nonroot switch on the network evaluates all of its available routes to the root bridge and uses the most direct path possible. If switch C were elected the root bridge, the suboptimal configuration above might result.

Avoiding this requires setting appropriate root bridge priorities on switches A and B to ensure that, as long as either of them is active, one of them will be elected the root bridge. Switches generally ship with a default bridge priority of 32768, so it’s simply a matter of setting switches A and B to something lower (lower values mean higher preference). Switch A might be set to 4096 and switch B to 8192, ensuring one of them will always be the root. It’s also important to note that many switches implement PVST (Per-VLAN Spanning Tree), which means you need to set these priorities for each VLAN you’ve created.
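In Cisco IOS-style syntax (illustrative only; other vendors expose the same knob under different commands, and priorities are typically restricted to multiples of 4096), the per-VLAN priorities from the example might be set like this:

```
! On switch A: lowest priority wins the root bridge election
spanning-tree vlan 1,2 priority 4096
```

```
! On switch B: next in line if switch A fails
spanning-tree vlan 1,2 priority 8192
```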

Putting it all together

IP storage is very easy to set up and run — in large part because it relies on the same type of networking gear that even the smallest businesses already have and use. But building a fully redundant, high-performance IP storage infrastructure requires greater attention to how that general-purpose networking gear is configured and deployed, often requiring admins to learn new tricks along the way.

Many organizations will rush headlong into deploying virtualization and IP storage without considering these factors — bad idea. Careful configuration is the difference between a solid, fault-tolerant infrastructure and one that is prone to failure or performs poorly. IP storage provides a great opportunity to get the most out of virtualization. To enjoy those benefits, do it right the first time.

This article, “Build a bulletproof network for IP storage,” originally appeared at InfoWorld.com. Read more of Matt Prigge’s Information Overload blog and follow the latest developments in storage at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.