In the Grid community, there’s a popular expression that “access to the data is as important as access to the compute resources.” And no Globus Toolkit subcomponent is more central to Grid data access issues than the Globus implementation of GridFTP, the Grid extension of the File Transfer Protocol (FTP). Globus’ GridFTP is a high-performance data transfer protocol and software suite optimized for the full gamut of data access issues: from bulk file transfer, to the nitty-gritty details of getting data out of complex storage systems within virtual organizations on the Grid, to pretty much every data requirement in between.

The FTP protocol actually originated with the ARPAnet community all the way back in 1971. FTP has seen many specification twists and turns through the years. By 1973, a number of early requests for comments (RFCs) revising the FTP specification had already been issued. The version that perhaps signaled the maturity of the protocol arrived in 1985, when Jon Postel and Joyce Reynolds (of ISI) authored RFC 959. RFC 959 set out the objectives of FTP: “1) to promote sharing of files (computer programs and/or data), 2) to encourage indirect or implicit (via programs) use of remote computers, 3) to shield a user from variations in file storage systems among hosts, and 4) to transfer data reliably and efficiently.” FTP became a pervasive protocol with the arrival of the commercial Internet. But as Grid computing usage accelerated in e-Science in the late 1990s, new challenges arose for Grid users who needed to access different storage systems across virtual organizations.
Storage systems had become increasingly customized to serve specific user needs, and the FTP protocol in its existing form was unable to reconcile this explosion of incompatible systems for accessing data.

So in 2001, the Global Grid Forum (GGF) and the Globus Alliance authored the GridFTP protocol, which better navigates different types of storage systems, adds a number of compelling parallel and striped data transfer capabilities, and includes various new instrumentation and TCP buffer features. Today, by default, the Globus implementation of GridFTP will work on any storage device that uses a POSIX file system for storage and TCP/IP for the network.

“It doesn’t matter whether you’re running RAID or not, EXT3 versus XFS, PVFS or GPFS,” said Bill Allcock, technology coordinator at Argonne National Laboratory and one of the authors of GridFTP, both the protocol (developed in the GGF) and the Globus implementation. “We work fine on all of those. The one caveat is that certain configuration parameters can have a much larger impact on some of those than the others. For instance, GPFS wants big reads; they want large sequential reads. Whereas PVFS wants you to match whatever the stride size is. But Globus GridFTP will work on all of them, just out of the box, regardless of system type.”

Today, Globus GridFTP is in pervasive use in the e-Science Grid community. The high energy physics community in particular has been a huge user from the start. A notable recent use was by the Relativistic Heavy Ion Collider (RHIC) community at Brookhaven National Laboratory, which used Globus GridFTP to sustain 600 megabytes per second of data transfer (from Long Island, New York, to Japan) over 11 days. For the British Broadcasting Corporation (BBC), GridFTP meets frequent large-file demands: the typical broadcast hour today requires 280 GB for all pre-processed media streams. The BBC has been doing compelling work on this effort with the Belfast e-Science Centre.
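To put the RHIC and BBC figures in perspective, here is a small back-of-the-envelope calculation. It is only a sketch under stated assumptions: decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes, 1 TB = 10^12 bytes), a perfectly constant transfer rate, and, for the BBC case, the hypothetical requirement that one broadcast hour's 280 GB be moved within one wall-clock hour. The helper names `total_bytes` and `required_rate` are illustrative and not part of any Globus API.

```python
"""Back-of-the-envelope arithmetic for the GridFTP transfer figures.

Assumptions (not from the article): decimal units and a perfectly
constant rate for the whole duration; real transfers will vary.
"""

SECONDS_PER_DAY = 24 * 60 * 60
MB = 10**6   # bytes, decimal
GB = 10**9   # bytes, decimal
TB = 10**12  # bytes, decimal


def total_bytes(rate_bytes_per_s: float, seconds: float) -> float:
    """Volume moved at a constant rate over a given duration."""
    return rate_bytes_per_s * seconds


def required_rate(volume_bytes: float, seconds: float) -> float:
    """Constant rate needed to move a given volume within a given duration."""
    return volume_bytes / seconds


# RHIC: 600 MB/s sustained for 11 days.
rhic_volume = total_bytes(600 * MB, 11 * SECONDS_PER_DAY)

# BBC: 280 GB per broadcast hour; hypothetically moved within one hour.
bbc_rate = required_rate(280 * GB, 60 * 60)

if __name__ == "__main__":
    print(f"RHIC volume: {rhic_volume / TB:.1f} TB")
    # Convert bytes/s to bits/s (x8), then to Gbit/s (decimal).
    print(f"BBC rate: {bbc_rate / MB:.1f} MB/s ({bbc_rate * 8 / 10**9:.2f} Gbit/s)")
```

Run as a script, it reports roughly 570 TB moved in the RHIC transfer and a sustained rate of about 78 MB/s (0.62 Gbit/s) for the BBC case, which illustrates why high-throughput features such as parallel streams matter at this scale.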