When network problems seem to defy the laws of physics, you'd better know your way around the dark region below layer 3

Long, long ago, before I could tell the difference between a /20 and a /30, a mentor sat me down and asked me what I knew about Ethernet and networking in general. Back then, I wasn’t familiar with much besides configuring an IP address, subnet mask, and gateway in a server or desktop. The rest was still magic to me.

He then spent the next 30 minutes revolutionizing my life with a succinct and accurate description of exactly what happens when an Ethernet frame is constructed, dropped on the wire, transmitted, and received. He discussed the SYN, SYN-ACK, ACK handshake, RST packets, collisions (this was in the dark days of 10BASE2), and duplex settings. We moved on to IP, VLSM, ports, and sockets: the whole shebang. I retained maybe 20 percent of what he said, but immediately developed a thirst to learn the rest. I didn’t know it at the time, but my future skills at designing and building large networks started right there, that day.

Over the intervening years, I’ve done the same with a few younger folks who showed an interest in and aptitude for networking. But it seems to me that more and more network people skip the lower levels of networking knowledge and rely on their understanding of layer 3 alone. The ability to accurately calculate IPv4 subnet masks might be the limit of their abilities; what actually happens on the wire is a big gray area. In many cases, this also includes the supporting players of IP, such as ARP.
When presented with packet trace output in Wireshark, they’re lost.

The truth of the matter is that you can be very successful and build functional networks without ever knowing what ARP is or why GARP (gratuitous ARP) even exists. An understanding of basic TCP ports, NAT, and IP subnetting goes a long way in the IT world these days. Those skills are generally enough for you to construct viable firewall rules, spot an invalid subnet mask setting that’s causing problems, and so forth. But when a problem falls outside that scope, you don’t have the tools to dig deeper.

The bottom line is that you need to be able to read and dissect packet traces if you want to consider yourself a bona fide network troubleshooter. Consider this bizarre networking problem related to VMware that I dealt with recently: When a Windows or Linux host was placed on a VLAN, communication with other hosts was fine. When an ESXi host was placed on the same network, there were problems communicating at layer 2. Even a Linux VM running on the same ESXi host had no problems and displayed an accurate ARP table, yet the ARP table on the ESXi box itself had the wrong MAC addresses for certain (but not all) hosts on the same segment. It was quite the head scratcher and specifically prevented the ESXi host from using port binding. When port binding was disabled, communications with other hosts functioned, but the ARP table was still wrong, which was perplexing.

After making doubly sure that other hosts had no L2 communication issues and held valid ARP tables, I concluded that the problem lay with ESXi, since I could not reproduce it on Windows or Linux. I shot a few packet traces and sent them off to VMware, but I didn’t have time to look at them right away. On a subsequent call with VMware networking gurus, we pored over the traces and found that for each ARP who-has from a host on that subnet, there was an ARP is-at reply from the actual referred host.
However, close on its heels was a second is-at reply, sent from the VRRP virtual MAC address, claiming that the IP address in question was actually at the MAC address assigned to the VRRP virtual IP itself.

On its face, it looked like the L3 switches were claiming every IP on that subnet for themselves. Naturally, this could cause problems, and it was causing problems with ESXi. But remember: Linux and Windows hosts on the same segment had the right MAC addresses in their ARP tables; only the ESXi boxes had the VRRP MAC address in their tables for other hosts on the segment. This caused me to wonder why on earth the VRRP spec would include a provision for proxy ARP, which immediately led me to realize that it doesn’t, but that proxy ARP itself is designed to do exactly that. A brief check of the bowels of the L3 switch showed that proxy ARP was enabled for that VLAN and only that VLAN. (Full disclosure: I’d enabled it for some tests a few weeks ago since that lab segment is rarely used. In my advancing years, I’d simply forgotten I’d done it.)

The reason that only ESXi exhibited this problem is that while Linux, Windows, and most other OSes place the first ARP is-at reply into their ARP tables, ESXi chooses the last response. Since the proxy ARP reply is artificially generated, it’s usually a few milliseconds behind the actual ARP is-at response from the host itself.

The moral of this story is that had I jumped on the packet traces right away, the fix would have been apparent much sooner. Another moral is that VMware should really change the ARP table population code in ESXi to behave like other modern OSes and discard all but the first ARP reply.
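
The difference between the two cache-population policies can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not ESXi's or any other OS's actual code: the IP and MAC addresses are hypothetical examples, the function names are invented for illustration, and real ARP caches also deal with timeouts and entry states that are ignored here.

```python
from typing import Dict, Iterable, Set, Tuple

# One ARP is-at reply, reduced to (claimed IP, claimed MAC).
ArpReply = Tuple[str, str]

# Hypothetical replies to a single who-has, in arrival order:
# the real host answers first, the proxy ARP answer trails behind it.
REPLIES = [
    ("10.0.0.25", "00:50:56:aa:bb:cc"),  # assumed MAC of the real host
    ("10.0.0.25", "00:00:5e:00:01:01"),  # assumed VRRP virtual MAC (proxy ARP)
]


def first_reply_wins(replies: Iterable[ArpReply]) -> Dict[str, str]:
    """Keep the first reply seen for each IP and ignore later ones."""
    table: Dict[str, str] = {}
    for ip, mac in replies:
        table.setdefault(ip, mac)
    return table


def last_reply_wins(replies: Iterable[ArpReply]) -> Dict[str, str]:
    """Let every later reply overwrite the entry for its IP."""
    table: Dict[str, str] = {}
    for ip, mac in replies:
        table[ip] = mac
    return table


def conflicting_ips(replies: Iterable[ArpReply]) -> Dict[str, Set[str]]:
    """Return the IPs that more than one MAC address claimed."""
    seen: Dict[str, Set[str]] = {}
    for ip, mac in replies:
        seen.setdefault(ip, set()).add(mac)
    return {ip: macs for ip, macs in seen.items() if len(macs) > 1}


if __name__ == "__main__":
    print("first reply wins:", first_reply_wins(REPLIES))
    print("last reply wins: ", last_reply_wins(REPLIES))
    print("conflicts:       ", conflicting_ips(REPLIES))
```

Fed the same two replies, the first policy records the host's real MAC while the second records the VRRP virtual MAC, which mirrors what the traces showed. A check like `conflicting_ips` is also the kind of thing a quick pass over a capture would surface: one IP answered by two different MAC addresses.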
After all, keeping only the first reply provides some protection against ARP cache poisoning, and the first response is usually the closest and most accurate.

If you’re reading this and haven’t spent much time digging into what’s actually happening on the wire, maybe it’s time to download Wireshark, capture some traffic, and go through it to familiarize yourself with what goes on in the depths of an Ethernet network. The time you spend doing so will be repaid with interest down the line. Basic networking has become so “easy” that most people view anything deeper as a dark art, though it holds the key to solving myriad networking problems. The few times that I’ve forgone pulling tcpdumps when troubleshooting network problems have been the times that the answer would have been right in front of my face.

This story, “The lost art of reading packet traces,” was originally published at InfoWorld.com. Read more of Paul Venezia’s The Deep End blog at InfoWorld.com.