Matt Prigge
Contributing Editor

When WAN optimization really boosts network performance

analysis
Feb 4, 2013 | 8 mins

As more and more data moves across WANs and into the cloud, making the most of your WAN connections is key

Last week, I explained the growing importance of understanding TCP basics, specifically TCP windowing. As companies deal with ever-larger amounts of data that need to move across the WAN, whether between premises or between on-premises systems and the cloud, it becomes increasingly important to optimize the basics to achieve the best performance. To ease this process, a variety of networking vendors sell so-called WAN accelerators. However, they’re no panacea: they suit only certain environments.

The truth is that these vendors and their customers are often plagued by a general lack of understanding about what the devices can and can’t do. You might even assume WAN accelerators to be the snake oil of the Internet age — surely you can’t get more than 10Mbps of throughput on a 10Mbps circuit, right? On the other hand, you often hear the notion that WAN accelerators are techno-magic devices that can make anything faster. As is usually the case, neither extreme turns out to be true.

WAN accelerators are great in some situations and at best unhelpful in others. Furthermore, not all WAN acceleration devices and software stacks do the same things. Read on to understand when WAN accelerators really deliver faster WAN networking.

WAN acceleration basics

The first thing to understand is that most WAN acceleration devices are deployed in pairs, one at each of the sites between which the bandwidth is to be optimized. You need a device at each site because WAN acceleration devices optimize traffic in a way that renders such traffic unintelligible to the original destination. It’s like encryption, where you need a companion decryptor at the other end to make the encrypted data intelligible. Thus, a WAN accelerator must be present at the receiving end to return the network traffic to its original form.

When you’re trying to accelerate network traffic between two sites, you’ll have these WAN accelerators at each gateway. But you can also implement WAN acceleration on users’ PCs, to accelerate the traffic from the office to wherever they are. Likewise, for acceleration between your corporate network and the cloud, you’d have a WAN accelerator deployed in the cloud service as a virtual appliance or use the native WAN acceleration support offered by some cloud service providers.

In these scenarios, you need a WAN accelerator on each end of the connection. That means WAN acceleration will do little to nothing to speed up general Internet access. It’s meant for speeding up specific network connections.

Protocol optimization

One of the most basic features a WAN accelerator can implement is TCP optimization. WAN accelerators can overcome inappropriate endpoint TCP configurations by terminating the TCP connections crossing them locally at each end and using separate, optimally configured connections in between instead.

Optimizations typically include using very large TCP windows through window scaling, appropriately implemented selective acknowledgements, and optimized congestion-avoidance response mechanisms.

Selective acknowledgements let the destination station acknowledge portions of a TCP window without having received all of it. When you’re using very large TCP windows, selective acknowledgement prevents the entire window from having to be resent if there is packet loss. It can also prevent the entire window from being “filled,” so the sending station can transmit data constantly — diminishing the impact of link latency.
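To make the difference concrete, here is a toy model of one window with a single lost segment. It is a sketch under stated assumptions, not how any particular TCP stack is written: real TCP tracks byte ranges rather than whole segment numbers, and the function name and the "resend everything from the first gap" fallback are simplifications chosen for illustration.

```python
def segments_to_resend(sent, received, selective_ack):
    """Return the segment numbers a toy sender would retransmit.

    sent: ordered list of segment numbers in the current window.
    received: set of segment numbers that reached the receiver.
    selective_ack: if True, the receiver reports exactly what arrived;
        if False, only a cumulative ACK is available, so this simplified
        sender resends everything from the first gap onward.
    """
    if selective_ack:
        return [s for s in sent if s not in received]
    for i, s in enumerate(sent):
        if s not in received:
            return sent[i:]
    return []


window = list(range(1, 11))   # ten segments in flight
arrived = set(window) - {3}   # segment 3 was lost on the WAN

print(segments_to_resend(window, arrived, selective_ack=True))
print(segments_to_resend(window, arrived, selective_ack=False))
```

With selective acknowledgement only segment 3 is resent. Without it, this simplified sender resends segments 3 through 10, and that cost grows with the size of the window.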

Optimizing congestion-avoidance mechanisms can include much more aggressive responses to packet loss, and window rescaling can decrease the impact of small amounts of congestion-related packet loss.

What’s key is that these optimizations can be used regardless of whether the endpoints on either side of the connection are configured to implement them. Because the devices are deployed in pairs, they will negotiate those optimizations themselves. Thus, that Windows Server 2003 machine with the default 64KB maximum window size I used in my example last week could send and receive data as if it were configured for a much larger window size, provided a WAN accelerator sat at each end of the connection.
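The reason window size matters comes down to simple arithmetic: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is capped at roughly window size divided by round-trip time. The sketch below works through that ceiling for a 64KB window. The 50 ms round-trip time and the 100Mbps link are assumed figures chosen for illustration, and the function names are hypothetical.

```python
def max_tcp_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on throughput when one window is in flight per round trip."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


def window_needed_bytes(link_mbps, rtt_seconds):
    """Bandwidth-delay product: the window required to keep the link full."""
    return link_mbps * 1_000_000 * rtt_seconds / 8


rtt = 0.050  # assumed 50 ms round trip across the WAN

# A 64KB window tops out at about 10.49 Mbps at this latency,
# no matter how fast the circuit is.
print(round(max_tcp_throughput_mbps(64 * 1024, rtt), 2))

# Filling an assumed 100Mbps circuit at the same latency needs
# a window of about 625,000 bytes.
print(round(window_needed_bytes(100, rtt)))
```

That gap between the window an endpoint offers and the window the link needs is what a pair of accelerators papers over by running their own large-window connection in the middle.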

These optimizations can work extremely well when the connection between the appliances is highly reliable and where packet loss is seldom seen. But if the connection is frequently unreliable, the optimizations can result in strange endpoint and application behavior. In many cases, from the sending endpoint’s perspective, data appears to have already been received by the receiving endpoint. But in reality, the local acceleration appliance has sent acknowledgements to the sending endpoint to convince it to transmit more data. If packet loss occurs between the appliances and results in a connection timeout, an application could assume that the receiving endpoint has collected the data — even when it hasn’t. You may not be able to use WAN acceleration for applications sensitive to such packet-loss scenarios.
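The failure mode described above can be seen in a toy simulation. This is a sketch only: `ToyAccelerator` is a hypothetical stand-in for the sending-side appliance, it counts bytes instead of handling real TCP segments, and it does not reflect any vendor's implementation.

```python
class ToyAccelerator:
    """Hypothetical sending-side appliance that ACKs data before it crosses the WAN."""

    def __init__(self):
        self.acked_to_sender = 0  # bytes the sending endpoint believes were delivered
        self.delivered = 0        # bytes that actually reached the far endpoint
        self.buffer = []          # data accepted locally but not yet sent across

    def receive_from_sender(self, chunk):
        # The appliance acknowledges immediately so the sender keeps transmitting.
        self.buffer.append(chunk)
        self.acked_to_sender += len(chunk)

    def forward(self, wan_up):
        # If the WAN is down, data stays queued; a timeout would discard it.
        if not wan_up:
            return
        while self.buffer:
            self.delivered += len(self.buffer.pop(0))

    def unconfirmed_bytes(self):
        return self.acked_to_sender - self.delivered


appliance = ToyAccelerator()
appliance.receive_from_sender(b"x" * 1000)
appliance.receive_from_sender(b"x" * 1000)
appliance.forward(wan_up=True)     # first 2,000 bytes cross the WAN
appliance.receive_from_sender(b"x" * 1000)
appliance.forward(wan_up=False)    # link fails before the last chunk is sent

print(appliance.acked_to_sender, appliance.delivered, appliance.unconfirmed_bytes())
```

The sender has been told 3,000 bytes arrived while only 2,000 did. An application that treats a transport-level acknowledgement as proof of delivery would be wrong about the last 1,000 bytes.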

Protocol translation

When simply optimizing TCP isn’t enough, some WAN accelerators will translate TCP data flows into a different protocol between the appliance pairs. As in the simple connection optimization examples, the appliances will terminate TCP connections locally at each site, but instead of speaking optimized TCP between each other, they translate the data flow into a connectionless protocol such as UDP.

This isn’t a trivial process, because TCP is designed to detect and recover from packet loss, whereas connectionless protocols like UDP are not. Vendors using this approach must design their own proprietary means to recognize that data has been lost in transit and needs to be re-sent. The payoff is that vendors who figure out how to do so can manage to squeeze better performance from WAN links using this approach in those scenarios where TCP optimization is problematic.
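Vendor mechanisms here are proprietary, so the following is only a generic illustration of the idea: number every datagram, have the receiver report the gaps, and resend just those. The function names are hypothetical, and the "network" is simulated with a set of sequence numbers to drop rather than real UDP sockets.

```python
def missing_sequence_numbers(received_seqs, highest_sent):
    """Return the sequence numbers from 0..highest_sent that never arrived."""
    seen = set(received_seqs)
    return [seq for seq in range(highest_sent + 1) if seq not in seen]


def deliver_reliably(datagrams, lost_on_first_try):
    """Simulate one send pass plus one repair pass over a lossy link.

    datagrams: list of bytes payloads, indexed by sequence number.
    lost_on_first_try: set of sequence numbers dropped on the first pass.
    Returns the reassembled data and the list of sequence numbers resent.
    """
    received = {}
    for seq, payload in enumerate(datagrams):
        if seq not in lost_on_first_try:
            received[seq] = payload

    # The receiving side reports the gaps; the sending side resends only those.
    resent = missing_sequence_numbers(received.keys(), len(datagrams) - 1)
    for seq in resent:
        received[seq] = datagrams[seq]

    data = b"".join(received[seq] for seq in range(len(datagrams)))
    return data, resent


payloads = [b"alpha-", b"bravo-", b"charlie-", b"delta"]
data, resent = deliver_reliably(payloads, lost_on_first_try={1, 3})
print(data, resent)
```

A real implementation also has to handle lost repair requests, reordering, timers, and flow control, which is where most of the engineering effort goes.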

Network deduplication

In situations where large amounts of similar or inefficiently packaged data are shipped across a WAN, it can be beneficial to deploy WAN acceleration devices that deduplicate the data stream. In practice, this requires appliances with (usually large) amounts of fast storage, and it poses a substantially larger software engineering challenge. However, when it’s done right and is combined with a data flow that can benefit from it, the results can be spectacular.

As packets cross the appliance on the sending side of the connection, the appliance calculates a hash of the data, then stores that hash locally. It then compresses the data payload and sends it across the WAN to its partner on the receiving side. That appliance stores the entire data payload and the hash in its cache; from there, it’s sent on to the receiving endpoint. If the sending appliance sees a packet with the same payload hash in the future, it simply sends the hash (and not the payload) to the receiving side appliance. Finally, that appliance copies the matching payload from its cache and sends it to the receiving endpoint.
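The exchange described above can be sketched in a few lines. This is a minimal illustration, not a vendor design: the class names are hypothetical, SHA-256 and zlib are my own choices of hash and compressor (the article does not name either), and a real appliance would also need cache eviction and a way to recover when the two caches fall out of sync.

```python
import hashlib
import zlib


class SendingAppliance:
    def __init__(self):
        self.known_hashes = set()  # hashes of payloads already sent across

    def encode(self, payload):
        digest = hashlib.sha256(payload).digest()
        if digest in self.known_hashes:
            return ("ref", digest)  # repeat payload: send only the hash
        self.known_hashes.add(digest)
        return ("data", digest, zlib.compress(payload))


class ReceivingAppliance:
    def __init__(self):
        self.cache = {}  # hash -> full payload

    def decode(self, message):
        if message[0] == "ref":
            return self.cache[message[1]]  # rebuild from the local cache
        _, digest, compressed = message
        payload = zlib.decompress(compressed)
        self.cache[digest] = payload
        return payload


sender, receiver = SendingAppliance(), ReceivingAppliance()
block = b"replicated volume data " * 100

first = sender.encode(block)    # full (compressed) payload crosses the WAN
second = sender.encode(block)   # only a 32-byte hash crosses the WAN

print(first[0], second[0], len(second[1]))
print(receiver.decode(first) == block, receiver.decode(second) == block)
```

The second transfer of the same 2,300-byte block costs 32 bytes on the wire in this sketch, which is where the large multiples on repetitive traffic come from.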

In cases where data hasn’t been seen before, this approach won’t yield big results. However, if similar data is sent repeatedly, it can have a huge impact on WAN utilization.

Does it actually work?

Nobody likes to hear the answer as to whether WAN optimization works in the real world, but it’s the truth: It depends.

The variable you have to carefully consider is what kind of traffic you’re trying to optimize. If the traffic is real-time data such as VoIP or remote display protocols like RDP and ICA, traffic optimization and deduplication aren’t likely to have any real effect. However, if the data flow is bulk data — especially repetitive bulk data — a deduplicating WAN accelerator can magnify the observed throughput of a WAN circuit many times over.

A particularly good example of this is inefficient IP-based SAN replication. Dell’s EqualLogic SANs are a case where WAN acceleration can help: I’ve seen a good WAN accelerator handle what would have been a constant 100Mbps of replication traffic with less than 10Mbps of actual WAN bandwidth (a multiple that makes even pricey WAN accelerators worth the investment).
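As a quick back-of-the-envelope check on those figures, the helper functions below (hypothetical names, using the 100Mbps and 10Mbps numbers from the example above as an upper bound on WAN usage) show what that multiple implies about how much data the appliances kept off the wire.

```python
def reduction_factor(offered_mbps, wan_mbps):
    """How many times more traffic the endpoints see than the WAN carries."""
    return offered_mbps / wan_mbps


def fraction_removed(offered_mbps, wan_mbps):
    """Share of the offered traffic that never had to cross the WAN."""
    return 1 - wan_mbps / offered_mbps


print(reduction_factor(100, 10))            # at least a tenfold multiple
print(round(fraction_removed(100, 10), 2))  # at least 0.9 of the bytes removed
```

Because the observed WAN usage was under 10Mbps, both results are lower bounds for that deployment.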

However, many other SAN platforms have implemented the bulk of the TCP optimizations, compression, and deduplication that most WAN accelerators bring to the table. Adding a WAN accelerator for these platforms won’t yield much of a benefit, because that functionality is already built into the SAN platforms.

Putting it all together

In cases where you have bulky or repetitive (or ideally both) data flows, WAN accelerators can provide a huge benefit and save you a lot of money. But where you don’t have the right kinds of data flows or the appliance you choose isn’t optimized to handle that particular kind of data, you may not derive much benefit at all.

The key to evaluating a WAN accelerator is to actually evaluate it on your network with your data. Most vendors will allow you to try an appliance (virtual or physical) before you buy so that you can accurately gauge its value to you. Don’t be afraid to take advantage of that policy!

This article, “When WAN optimization really boosts network performance,” originally appeared at InfoWorld.com. Read more of Matt Prigge’s Information Overload blog and follow the latest developments in storage at InfoWorld.com.