Matt Prigge, Contributing Editor

Your fateful decision: NFS or iSCSI?

Analysis | May 29, 2012 | 6 min read

The two predominant IP storage protocols have wildly different strengths and weaknesses. Is one really better than the other?

Over the last several weeks, I’ve delved into forgotten aspects of building an IP storage network and how to best leverage it with both NFS and iSCSI — the two dominant IP storage protocols used in virtualization. Throughout that time, I’ve received a bunch of queries from readers, all united by one question: Which is better, NFS or iSCSI?

As with many hotly debated IT subjects, the choice between any two popular competing technologies is less about which is better overall and more about which is best for solving the challenge at hand. NFS and iSCSI are no different. Both have strengths and weaknesses depending on the situation. But the future of storage — which will make geographically diverse storage clustering a reality — may significantly factor into your choice of protocol.

File vs. block

As I mentioned in last week’s post, NFS and iSCSI couldn’t be much more different, either in their implementation or history. NFS was developed by Sun Microsystems in the early 1980s as a general-purpose file sharing protocol that allowed network clients to read and write files on a server across a network. iSCSI came along much later, in the early 2000s, as an IP-based alternative to Fibre Channel; like Fibre Channel, it encapsulates block-level SCSI commands and ships them across the network.

The key difference is where the file system is implemented and managed. In file-level implementations such as NFS, the server or storage array hosts the file system, and clients read and write files into that file system. In block-level implementations such as iSCSI and Fibre Channel, the storage array offers up a collection of blocks to the client, which then formats that raw storage with whatever file system it decides to use.
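The distinction can be sketched in a few lines of Python. The class names and layout here are purely illustrative (no real array exposes this API); the point is only what each side can and cannot see:

```python
class BlockArray:
    """Block-level (iSCSI/Fibre Channel): the array hands out numbered
    blocks and has no idea what the client's file system stores in them."""
    def __init__(self, num_blocks):
        self.capacity = num_blocks
        self.blocks = {}          # block number -> opaque bytes

    def write_block(self, lba, data):
        self.blocks[lba] = data   # just bytes: no file names, no structure


class FileArray:
    """File-level (NFS): the array hosts the file system itself, so it
    sees every file -- and thus every VM disk -- by name."""
    def __init__(self):
        self.files = {}           # path -> file contents

    def write_file(self, path, data):
        self.files[path] = data   # the array knows exactly what this is


block = BlockArray(num_blocks=1024)
block.write_block(0, b"...vmdk header...")   # array sees only "block 0"

nfs = FileArray()
nfs.write_file("/vmfs/vm1/vm1.vmdk", b"...vmdk header...")
print(sorted(nfs.files))   # the file-level array can enumerate VM files
```

The block array could hold the very same virtual machine, but to it the VM is just an anonymous run of blocks; the file array can name it, and therefore act on it.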

Though this distinction has many ramifications, perhaps the most important is that in block-level protocols such as iSCSI (and Fibre Channel), the storage array generally isn’t aware of what it is storing. All it knows is that it has allocated a collection of blocks and which iSCSI client(s) might have access to them. Conversely, in file-based protocols such as NFS, the storage array has full visibility to all of the application data stored on it — whether that’s general file sharing data or the files that might make up a collection of virtual machines.

From a practical perspective, this array-side knowledge of the stored data makes NFS-based deployments easier to manage: the storage array can track actual storage usage for thin provisioning, take snapshots or backups of individual virtual machines, and even deduplicate primary storage data on the array side.

But recent SCSI T10 enhancements, as implemented in VMware’s VAAI (vSphere APIs for Array Integration), add similar functionality to block-based storage. Support for the UNMAP SCSI primitive lets the virtualization stack free unused blocks so the array can reclaim them, while array-side copy offload accelerates tasks such as virtual machine cloning. In a sense, some of the intelligent hypervisor-to-array integration already possible on file-level systems using NFS is being grafted onto block-level implementations through extensions to the SCSI protocol.
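A toy model shows what UNMAP buys a thinly provisioned LUN. This is a deliberate simplification — real arrays track allocation at extent granularity and expose none of these names — but the accounting logic is the idea:

```python
class ThinLUN:
    """Simplified thin-provisioned LUN: blocks consume real capacity
    only once written, and UNMAP returns them to the free pool."""
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.allocated = set()        # blocks actually backed by storage

    def write(self, lba):
        self.allocated.add(lba)       # thin provisioning: allocate on write

    def unmap(self, lbas):
        self.allocated -= set(lbas)   # hypervisor says "no longer in use";
                                      # the array reclaims the blocks

    def used(self):
        return len(self.allocated)


lun = ThinLUN(capacity_blocks=1000)
for lba in range(100):
    lun.write(lba)          # guest writes 100 blocks
lun.unmap(range(50))        # guest deletes files; hypervisor issues UNMAP
print(lun.used())           # → 50 blocks still consumed
```

Without UNMAP, the array would go on backing all 100 blocks forever, because it has no way to know the guest's file system stopped using half of them.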

Yet I’m certain that’s not where the file-vs.-block story will end. As much as adding those SCSI primitives allows SCSI-based storage protocols to perform some of the same tricks as NFS, in other situations a file-level protocol still has an edge — stretched clusters being a good example. In these kinds of synchronously replicated, geographically diverse storage implementations, giving the storage layer the ability to treat each virtual machine as a separate storage resource — to be moved and failed over individually rather than as huge opaque groups — will be vitally important and potentially very difficult to accomplish using block-based storage protocols.

On the network

NFS and iSCSI are also significantly different from a networking perspective. With NFS, additional throughput and redundancy are achieved primarily through network-based link aggregation and careful attention to balancing storage connections over multiple array-side IP address aliases to ensure the load balancing is effective. iSCSI, on the other hand, has built-in multipathing capabilities and, when used with vendors that provide support for it, can supply more advanced load-balancing algorithms that distribute storage traffic intelligently over many server- and array-side storage paths.
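The simplest multipathing policy an iSCSI stack applies is per-I/O round-robin with failover. The path names and failover handling below are invented for illustration, not any vendor's implementation:

```python
from itertools import cycle

class RoundRobinMultipath:
    """Illustrative round-robin path selector: each I/O goes out the
    next healthy path, and a failed path drops out of the rotation."""
    def __init__(self, paths):
        self.paths = list(paths)
        self._next = cycle(self.paths)

    def fail_path(self, path):
        self.paths.remove(path)        # dead path leaves the rotation
        self._next = cycle(self.paths)

    def pick(self):
        return next(self._next)        # path for the next I/O


mp = RoundRobinMultipath(["vmhba1:T0", "vmhba2:T0"])
print([mp.pick() for _ in range(4)])   # alternates across both paths
mp.fail_path("vmhba2:T0")
print(mp.pick())                       # all I/O now on the surviving path
```

NFS in this era gets no equivalent: a given NFS mount rides one TCP connection over whatever single link the aggregation hash assigns it, which is why array-side IP aliases are needed to spread the load at all.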

In both cases, the use of 10Gbps Ethernet can lessen the importance of multipathing for storage performance reasons for the vast majority of organizations for whom throughputs approaching 1GBps are simply unthinkable (at least today). However, iSCSI retains an edge over NFS in this area — especially when aggregating multiple 1Gbps Ethernet links.

From a network security standpoint, iSCSI also has an edge. In addition to the source-IP-based security restrictions that both NFS and iSCSI support, iSCSI has built-in support for bidirectional CHAP (Challenge-Handshake Authentication Protocol), which prevents unauthorized servers from attaching to storage resources and lets servers validate the authenticity of the storage array they’re connecting to.
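The CHAP exchange itself is simple: per RFC 1994, the response is an MD5 digest over a one-octet identifier, the shared secret, and the challenge, so the secret itself never crosses the wire. In bidirectional CHAP, initiator and target each challenge the other. A minimal sketch:

```python
import hashlib
import os

def chap_response(identifier: int, secret: bytes, challenge: bytes) -> bytes:
    """CHAP response per RFC 1994: MD5(identifier || secret || challenge)."""
    return hashlib.md5(bytes([identifier]) + secret + challenge).digest()

secret = b"shared-iscsi-secret"     # provisioned on both initiator and target
challenge = os.urandom(16)          # target sends a fresh random challenge
resp = chap_response(0x01, secret, challenge)

# The target recomputes the digest with its own copy of the secret:
assert resp == chap_response(0x01, secret, challenge)

# An impostor without the secret cannot produce a matching response:
assert resp != chap_response(0x01, b"wrong-secret", challenge)
```

Because the challenge is fresh each time, a captured response can't simply be replayed against a later login attempt.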

One common misconception about modern NFS implementations is that they are UDP/IP-based. This often springs from the fact that NFS version 2 was — except for a few custom implementations — entirely UDP-based. While UDP is a relatively low-latency IP transport, it also lacks the security and delivery-assurance benefits that the stateful connection tracking present in TCP/IP offers. Starting in NFS version 3, TCP became a supported transport. This is what most NFS-based storage arrays and hypervisors, such as VMware vSphere, use today, putting NFS on par with the TCP/IP-based iSCSI.

Looking toward the future

Today, iSCSI would seem to be the clear winner — at least from a networking perspective, because it delivers better multipathing support and a higher degree of end-to-end security. Yet NFS retains a significant advantage when it’s leveraged properly on the array side, because it gives the array visibility into what the virtualization stack is doing with its storage and can intelligently participate in accelerating, snapshotting, and deduping that storage. It may be that those array-based intelligence benefits, combined with the multipathing and security improvements coming to NFS client implementations when NFS 4.1 arrives, end up tipping the scales in NFS’s favor over the long run.

Don’t allow yourself to be convinced that either one of these protocols flat out beats the other. The fact that they have differences is good, because you have twice the chance of finding an IP-based solution that nails your specific needs. Remember: The final chapters of these protocols’ history have yet to be written.

This article, “Your fateful decision: NFS or iSCSI?,” originally appeared at InfoWorld.com. Read more of Matt Prigge’s Information Overload blog and follow the latest developments in storage at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.