High-bandwidth replication over long distances, whether to a hot site or the cloud, requires a solid grasp of TCP to steer clear of bottlenecks.

Following Hurricane Sandy, let’s say you’ve been asked to set up replication to a disaster recovery site. Your company has chosen to back up its core operations in Boston with space in a colocation center in Chicago, about a thousand miles away. You’ve done the math and determined that you’ll need a 500Mbps circuit to handle the amount of data necessary to replicate and maintain your recovery-point SLAs.

As you get your Chicago site and connectivity lit up, you decide to test the connection. First, a ping shows a round-trip time of 25ms, which isn’t horrible for such a long link (at least 11ms of which is simple light-lag). Next, you decide to make sure you’re getting the bandwidth you’re paying for. You fire up your laptop and FTP a large file to a Windows Server 2003 management server on the other side of the link. As soon as the transfer finishes, you know something’s wrong: your massive 500Mbps link is pushing about 21Mbps.

Do you know what’s wrong with this picture? If not, keep reading, because this problem has probably affected you before without your realizing it. If you decide to move to the cloud or implement this kind of replication, it’s likely to strike again.

First, understand that the answer is related to Transmission Control Protocol (TCP), one of the two main transport protocols that most applications use to communicate over the Internet. (The other is User Datagram Protocol, or UDP.) What matters here is that TCP has built-in congestion and packet-loss detection capabilities, whereas UDP does not.
That detection makes TCP a great choice when you need to transfer data in a reliable, ordered fashion; UDP is a good choice when you have very small amounts of data to send and you don’t care precisely what order it’s received in or whether some is lost in transit (or have other application-layer means of dealing with these events). TCP is used for protocols like HTTP, FTP, most kinds of IP-based SAN replication, and Windows file sharing (SMB), while UDP is commonly used for DNS, VoIP, and some remote-display protocols like PCoIP.

TCP’s reliability introduces throughput limitations

TCP ensures that no data is lost by building a stateful connection from the client to the server. Whenever data is sent from one to the other, the receiving station acknowledges that it has been received. This allows TCP to detect that a packet has been lost, ensuring that the sending side knows to resend it. This is great from a reliability standpoint, but it presents a potential performance problem: If the sending station has to wait for the receiving station to acknowledge every packet it sends, performance can drop dramatically. In my Boston-Chicago example, the laptop would have to wait 25ms every time it sent a packet with a 1,460-byte payload, resulting in a throughput of only about 0.47Mbps.

TCP windowing is the answer

Fortunately, TCP has a way to work around this problem. Instead of sending an acknowledgement every time a packet is received, the receiving station sends an acknowledgement for each collection of packets whose sizes add up to a limit called the TCP window. If the TCP window were set at 64KB, the sending station could send up to 64KB worth of packets without receiving an acknowledgement from the receiving station, greatly reducing the slowdown caused by packet acknowledgements. The size of the TCP window is variable, which is the key to TCP’s ability to deal with congestion on the open Internet.
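To see where those numbers come from, here’s the arithmetic as a small Python sketch (mine, not from the original article; it assumes the hypothetical 25ms Boston-Chicago link):

```python
# Throughput ceiling when a sender must wait one round trip per
# acknowledgement: bytes in flight * 8 bits, spread over one RTT.
RTT = 0.025  # round-trip time of the example link, in seconds

def ceiling_mbps(bytes_per_rtt: int, rtt: float = RTT) -> float:
    """Maximum throughput if only bytes_per_rtt can be sent per round trip."""
    return bytes_per_rtt * 8 / rtt / 1e6

print(f"{ceiling_mbps(1460):.2f} Mbps")    # one 1,460-byte packet per RTT: ~0.47
print(f"{ceiling_mbps(65_535):.1f} Mbps")  # a full 64KB window per RTT: ~21.0
```

The second figure is exactly the mystery throughput from the FTP test: with a 64KB window and 25ms of latency, 21Mbps is all you can get.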
When two network stations start a conversation, the window starts very small (perhaps the size of a single packet, though usually larger). Each time a window’s worth of data is sent successfully, the window size is doubled. This process continues until either the maximum window size is reached or packets are lost. If just a couple of packets are lost, the window size is halved, then increased linearly until loss is detected again; this is called congestion-avoidance mode. If a lot of packets are lost in a row, the whole process restarts.

In the Boston-Chicago example, what limited the throughput to 21Mbps? It was the fact that most Windows systems have a default maximum TCP window size of 64KB. If the sending station (your laptop, in the example) has to spend 25ms waiting for an acknowledgement after sending every 65,535 bytes’ worth of data, 21Mbps is the maximum throughput it can achieve, regardless of how large the circuit is or whether it’s congested. Here’s the formula:

[ TCP Window in Bytes ] * 8 / [ Latency in Seconds ] = [ Maximum Throughput in Bits per Second ]

To combat this, RFC 1323 defines a method of applying a binary multiplier to the originally 16-bit TCP window size so that TCP windows can be scaled up to 1GB, providing a large enough window to easily saturate a 10Gbps Ethernet connection with a single TCP session. However, this feature, called TCP window scaling, is not always turned on, though it is usually supported on modern operating systems and networking gear. In the case of Windows Server 2003, you need to manually modify a registry key to raise the maximum TCP window size and thus get any benefit from window scaling. To fully utilize a 500Mbps link with 25ms of latency, you need a maximum TCP window size of around 1,562,500 bytes (about 24 times the Windows Server 2003 default).
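RFC 1323 scaling works by left-shifting the advertised 16-bit window by a scale factor negotiated at connection setup. This sketch (my own illustration, not from the article) shows how large a shift the 1,562,500-byte target requires:

```python
# RFC 1323 window scaling: the 16-bit advertised window is multiplied by
# 2**scale, where scale (0 through 14) is agreed on during the handshake.
MAX_UNSCALED = 65_535   # largest window a 16-bit field can express
TARGET = 1_562_500      # window needed for 500Mbps at 25ms RTT (from the text)

scale = 0
while (MAX_UNSCALED << scale) < TARGET:
    scale += 1

print(scale)                   # 5: the smallest shift that covers the target
print(MAX_UNSCALED << scale)   # 2,097,120 bytes expressible at that scale
print(MAX_UNSCALED << 14)      # 1,073,725,440 bytes: the ~1GB protocol maximum
```

The last line is where the article’s “up to 1GB” figure comes from: 65,535 bytes shifted by the maximum scale factor of 14.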
Here’s the formula:

[ Latency in Seconds ] * [ Desired Throughput in Bits per Second ] / 8 = [ Ideal TCP Window in Bytes ]

Ideally, the window should be rounded to a multiple of the maximum segment size, typically 1,460 bytes on Ethernet.

Why didn’t you already know this?

Most IT pros who aren’t networking experts don’t know the inner workings of TCP. That’s usually not a big deal: when you’re working on a LAN with submillisecond latencies or on a WAN with bandwidths of less than 20Mbps, you rarely benefit from changing the default windowing settings. However, it is worth noting that the default 65,535-byte window on most Windows OSes limits throughput to 524Mbps even at a latency of just 1ms. That means TCP windowing can affect throughput even on gigabit LANs, perhaps most visibly in high-throughput, single-connection applications such as backups.

In an age when more and more applications are being moved to the cloud and WAN connections sport ever larger bandwidths thanks to fiber, basics such as TCP windowing can make an enormous difference in how well your systems work. Thus, they are suddenly important to IT generalists and networking specialists alike. If you’re not all that familiar with the inner workings of TCP/IP, or don’t know your way around a protocol analyzer, make a point to learn. You’ll at least find it interesting, and chances are it will be valuable to you sooner rather than later.

This article, “All IT pros need to understand TCP windowing,” originally appeared at InfoWorld.com. Read more of Matt Prigge’s Information Overload blog at InfoWorld.com.
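As a closing sanity check, the sizing formula above can be turned into a few lines of Python (my own sketch; the 1,460-byte MSS is the usual Ethernet value, and the round-up-to-MSS step is the rounding the text recommends):

```python
# Ideal TCP window: latency (seconds) * desired throughput (bits/second) / 8,
# rounded up to a whole number of maximum-size segments.
MSS = 1460  # typical Ethernet maximum segment size, in bytes

def ideal_window_bytes(latency_s: float, throughput_bps: float) -> int:
    raw = latency_s * throughput_bps / 8
    return -(-int(raw) // MSS) * MSS  # ceiling-round to a multiple of the MSS

# The 500Mbps, 25ms Boston-Chicago link from the example:
print(ideal_window_bytes(0.025, 500e6))  # 1,563,660 (1,562,500 rounded to MSS)

# And the flip side: a default 65,535-byte window at just 1ms of LAN latency
# caps a single connection near 524Mbps.
print(65_535 * 8 / 0.001 / 1e6)          # 524.28 (Mbps)
```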