Matt Prigge
Contributing Editor

Simulate access to the cloud before you commit

analysis
Feb 25, 2013 | 9 mins

If you can't get a direct connection to the cloud, estimating the user experience before you migrate is a critical task

A few months ago, I wrote about the benefits of getting a direct connection to the cloud. If you’re considering heavy usage of public cloud services (especially IaaS and hybrid IaaS), having a direct connection onto your cloud service provider’s network can make an enormous difference to the user experience. This is especially true for latency-sensitive applications, such as server-based computing (Terminal Services and Citrix VDI), and bandwidth-hungry use cases, such as data replication and cloud backup.

While you might be able to justify the cost — or even show a savings — of having a dedicated, high-bandwidth connection to a cloud service provider when running a large infrastructure, smaller businesses are much less likely to be able to make that math work. Instead, like the bulk of cloud users, most small businesses are forced to use commodity Internet access to reach resources they’ve decided to move into the cloud.

This may not be so bad if you have access to cheap, high-quality bandwidth or if your use case isn’t particularly latency- or bandwidth-sensitive (Web-based SaaS applications like Salesforce.com might fall into this category). If you’re not so lucky, it’s difficult to know ahead of time how your user experience will be impacted. The surest way to know is to try it, and thankfully, many cloud providers will give you a trial run of their services so that you can do so. Unfortunately, if you want an accurate answer, “trying it” could mean migrating entire on-premise applications into the cloud — a time-consuming process.

If you find yourself in that boat, there are still relatively accurate ways to get an idea of what to expect if you move on-premise services to the cloud. The two-part process involves first measuring the end-to-end quality of the connectivity you have into the cloud provider you expect to use, then simulating those conditions on your own network — without moving anything anywhere. (I’ll cover the second part in this column next week.)

Defining quality

A wide variety of factors affect the performance of traffic crossing the open Internet. These include geographical distance, logical network distance, traffic congestion, and bandwidth bottlenecks. These factors, in addition to outages and the limitations of your own last-mile Internet circuit, conspire to define the four major quality characteristics of any network connection: throughput, latency, packet loss and delay, and jitter.

Throughput. Throughput is a fairly straightforward metric that simply indicates how much data you can move from point A to point B in a given period of time. Depending on your use case, you may be more interested in how much raw data you can move from your premise to the cloud or vice versa, but in most instances, you’ll want to know both the premise-to-cloud and cloud-to-premise throughput figures. For example, in a cloud backup scenario, you’re primarily interested in your upstream (premise-to-cloud) throughput, but you’ll also want to know your downstream (cloud-to-premise) throughput in case you need to restore something.

Latency. If you’ve ever pinged anything, you’ve tested latency. Latency is the amount of time it takes a packet to reach its destination and for the reply to make it back. This round-trip time is often abbreviated RTT and is sometimes referred to as “delay.” With a properly optimized TCP stack, latency should have a relatively small effect on overall throughput.
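To see why latency matters less for bulk transfers than you might expect, consider the rough ceiling a TCP window places on throughput: window size divided by round-trip time. A quick back-of-the-envelope calculation (the 256KB window and 50ms RTT here are illustrative assumptions, not measurements from this article):

```shell
# Rough TCP throughput ceiling = window size / round-trip time.
# The 256KB window and 50ms RTT are illustrative assumptions.
WINDOW_BYTES=$((256 * 1024))
RTT_MS=50
# bits per second = bytes * 8 * (1000 / RTT in milliseconds)
CEILING_BPS=$((WINDOW_BYTES * 8 * 1000 / RTT_MS))
echo "${CEILING_BPS} bps (about $((CEILING_BPS / 1000000)) Mbps)"
# prints: 41943040 bps (about 41 Mbps)
```

With a reasonably large window, even a 50ms round trip leaves plenty of headroom for most Internet links; it's when windows are small or RTTs climb into the hundreds of milliseconds that latency starts to cap throughput.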

But not all use cases are solely interested in how much data can be moved. Citrix ICA, Teradici PCoIP, and Microsoft RDP (all remote display presentation protocols used in server-based computing applications) are very latency-sensitive. Because these protocols allow you to access a desktop environment remotely, simple things like typing a letter on your keyboard and seeing the resulting character on the remote session’s screen are affected by latency. Very poor latency can render these applications challenging at best and entirely unusable at worst.

Packet loss and delay. Because the Internet is ever changing, neither throughput nor latency is a stable, known figure. In the case of throughput, transient congestion or packet loss can cause short-term restrictions that come and go without much explanation. In the case of latency, congestion and routing changes can substantially affect how long your packets are in flight before they reach their destination.

Jitter. The variation in latency over a period of time has a name of its own: “jitter.” Jitter is the statistical dispersion of the latency results you see over a period of time, generally expressed as the standard deviation of a set of round-trip times. Low jitter indicates a stable amount of latency, a condition far preferable to the alternative. Very high jitter can result in inconsistent user experiences and packet reordering, which in turn can place a substantial penalty on throughput.
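If you have a list of round-trip times, computing jitter as a standard deviation takes only a few lines of awk. A minimal sketch (the five RTT samples are made up for illustration):

```shell
# Jitter = population standard deviation of RTT samples (in milliseconds).
# The sample values below are illustrative, not a real trace.
printf '%s\n' 41.8 49.3 39.0 82.5 38.3 | awk '
  { sum += $1; sumsq += $1 * $1; n++ }
  END {
    mean = sum / n
    jitter = sqrt(sumsq / n - mean * mean)
    printf "mean RTT: %.1f ms, jitter: %.1f ms\n", mean, jitter
  }'
# prints: mean RTT: 50.2 ms, jitter: 16.6 ms
```

Note how a single 82.5ms outlier in an otherwise tight ~40ms sample drags the jitter figure way up; that's exactly the kind of instability this metric is meant to expose.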

Testing quality

Before you can simulate the effects of these qualities on your own network, you need to see what the conditions on the Internet look like.

In most cases, it’s easiest to test the latency and jitter, as these don’t require special facilities at your cloud provider (they need only respond to ICMP packets). This is easily done using the ubiquitous ping tool. Ping sends an ICMP echo-request packet to a given address; the host on the far side (if it’s not configured to filter ICMP echo requests) then sends an echo-reply packet back. Ping simply measures the amount of time it took for the reply to arrive.

Here’s an example ping test:

PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=50 time=41.871 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=50 time=49.321 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=50 time=39.073 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=50 time=82.567 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=50 time=38.342 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=50 time=38.667 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=50 time=39.308 ms
64 bytes from 8.8.8.8: icmp_seq=7 ttl=50 time=38.235 ms
64 bytes from 8.8.8.8: icmp_seq=8 ttl=50 time=36.713 ms
64 bytes from 8.8.8.8: icmp_seq=9 ttl=50 time=82.776 ms
--- 8.8.8.8 ping statistics ---
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 36.713/48.687/82.776/17.312 ms

In this ping result, the average latency is 48.7 milliseconds, and the jitter (standard deviation) in the results is 17.3 milliseconds. I saw no packet loss in this run. The target of this ping is one of Google’s public DNS servers; in your case, you want to measure the latency to a server on your cloud provider’s network, and you want to do so for much longer than 10 seconds — an entire business day might be wise.
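For a run that long, you'll want to capture ping's output to a file and pull the summary numbers out afterward rather than eyeball them. A minimal sketch, assuming the standard BSD/Linux ping summary format shown above:

```shell
# Extract packet loss, average latency, and jitter (stddev) from a ping log.
# The two lines below mimic ping's end-of-run summary for illustration;
# a real log would come from something like: ping <host> > ping_log.txt
cat > ping_log.txt <<'EOF'
10 packets transmitted, 10 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 36.713/48.687/82.776/17.312 ms
EOF
awk -F'[ /=]+' '
  /packet loss/ { loss = $(NF-2) }            # e.g. "0.0%"
  /round-trip/  { avg = $(NF-3); jit = $(NF-1) }
  END { printf "loss=%s avg=%sms jitter=%sms\n", loss, avg, jit }' ping_log.txt
# prints: loss=0.0% avg=48.687ms jitter=17.312ms
```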

Testing throughput is a bit more involved because it requires that you have two cooperating pieces of software running on both ends of the connection to accurately determine the maximum throughput. You have a variety of options. There are publicly available bandwidth testers on the Internet (Speedtest.net is one of the most popular). However, these tools really test only your last-mile connection: DSL, cable, fiber, T1s, whatever you have to your premise. As a potential cloud user, you’re interested in throughput from your premise to the cloud — not to a random bandwidth-testing server.

Often, the only way to really test this full network path is to get access to a server on the cloud provider’s network and install test software on it. Fortunately, this is fairly easy and cheap — sometimes free. With Amazon Web Services, for example, firing up a simple (and free, if you have a new account) t1.micro instance is quite effortless. Once you have access to the console of the instance, you need only download a bandwidth-testing tool, poke a hole in Amazon’s firewall for it (using a security group), then get the same testing software installed at your premises.
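As a sketch of what punching that hole might look like from the command line (the security-group name and source address here are placeholders, and this assumes the modern AWS CLI; you can accomplish the same thing through the EC2 web console):

```shell
# Allow inbound TCP on iperf's default port (5001) from your premise only.
# "iperf-test" and 203.0.113.10 are placeholders; substitute your own
# security-group name and your premise's public IP address.
aws ec2 authorize-security-group-ingress \
    --group-name iperf-test \
    --protocol tcp --port 5001 \
    --cidr 203.0.113.10/32
```

Restricting the rule to your own public IP keeps the test port from being exposed to the entire Internet while you run your measurements.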

One well-known testing tool is iperf and its graphical front-end companion jperf. To test a CentOS Linux server in the cloud with iperf and jperf, I run yum install iperf, which downloads and installs the iperf package. Next, I start iperf using the command line iperf -s -w 256K, which starts iperf as a server and tells it to use a 256K window size. iperf listens on port 5001 by default, but you can change this using the -p switch.

On a Windows workstation at my premises, I download a recent jperf package, which includes both the Java-based jperf front end and a Windows-compatible iperf command-line build. (Searching for “jperf windows” will find you a few precompiled versions, or you can get the source from SourceForge and compile it yourself.) Then I fire up jperf and tell it to run a multithreaded bandwidth test against my Amazon instance, as you can see in the screenshot below:

[Screenshot: jperf displaying the results of a bandwidth test against the Amazon instance]

According to this short test, I’m getting a little less than 900Kbps of sustained throughput, which is pretty bad. You could easily use the command-line version of iperf and get these same results, but I like using jperf because it makes some throughput anomalies more obvious. In this case, I got about 2.5Mbps for the first second or two of the connection followed by fairly consistent sub-950Kbps throughput. This tells me I’m likely rate-limited — not terribly surprising, given that my “premise” for this test is a hotel room.
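To put a number like 900Kbps in perspective, it helps to translate it into transfer time for a realistic payload. A quick estimate for a hypothetical 100GB cloud backup at that rate (protocol overhead will make the real figure worse, so treat this as a best case):

```shell
# Best-case time to push 100GB through a 900Kbit/s pipe.
# 100GB is a hypothetical backup size chosen for illustration.
SIZE_BITS=$((100 * 1024 * 1024 * 1024 * 8))   # 100GB in bits
RATE_BPS=900000                               # 900Kbit/s, as iperf reports it
SECS=$((SIZE_BITS / RATE_BPS))
echo "$((SECS / 3600)) hours (about $((SECS / 86400)) days)"
# prints: 265 hours (about 11 days)
```

Numbers like that are exactly why you want to measure before you migrate: a seed backup that takes a week and a half to upload changes the whole project plan.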

As with ping, if this were a real test, I’d run it for much longer. I’d also use iperf’s dual or trade-off test modes, or run the test in reverse from the cloud server to my premise, to accurately measure my upstream bandwidth.
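With the command-line client, that longer bidirectional run might look something like this (the hostname is a placeholder for your instance's address, and these are iperf 2.x flags):

```shell
# Run a 10-minute downstream test, then repeat it in reverse (trade-off
# mode, -r) to capture upstream throughput in the same pass.
# ec2-host.example.com is a placeholder for your cloud instance.
iperf -c ec2-host.example.com -w 256K -t 600 -r
```

The -d flag runs both directions simultaneously instead of sequentially, which is closer to what a busy replication link actually experiences.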

What’s next? Now that I have my throughput (900Kbps), packet loss (0.0 percent), latency (48.7ms), and jitter (17.3ms) measured for the connection to my potential cloud home, I can move to simulating these conditions on my own premise network without actually moving anything anywhere.

Tune in next week for that!

This article, “Simulate access to the cloud before you commit,” originally appeared at InfoWorld.com. Read more of Matt Prigge’s Information Overload blog and follow the latest developments in storage at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.