Paul Krill
Editor at Large

Apache Hadoop to get more user-friendly

Jul 20, 2011

The forthcoming Hadoop release, likely to be dubbed version 0.23, will also get improvements in availability, performance, and scalability

Relief is on the way for users of the open source Apache Hadoop distributed computing platform who have wrestled with the complexity of the technology.

A planned upgrade to Hadoop, which has become popular for analyzing large volumes of data, is intended to make the platform more user-friendly, said Eric Baldeschwieler, CEO of HortonWorks, which was unveiled as a Yahoo spinoff last month with the intent of building a support and training business around Hadoop. The upgrade also will feature improvements for high availability, installation, and data management. Beta releases are due later this year, with general availability eyed for the second quarter of 2012; the release will probably be called Hadoop 0.23.

“A big focus for us is going to be adding tools for monitoring and distributing and management, [with the goal of making it] much easier for organizations to use Hadoop. The problem now is it takes a pretty sophisticated operations staff to install and use it,” Baldeschwieler said during an interview at HortonWorks’s Silicon Valley offices this week. He formerly was vice president of Hadoop engineering at Yahoo, which has been instrumental in Hadoop development.

Version 0.23 also is set for improvements in availability, performance, and scalability. “That’s a big one for very large customers,” such as Yahoo and Facebook, Baldeschwieler said. Addressing single points of failure in Hadoop’s master nodes will be a goal.

Also, the new HCatalog data management software layer planned for Hadoop 0.23 will let users store data in a more traditional table style, enabling users to transparently move data between tools. It also yields benefits for the MapReduce programming model used with Hadoop. Currently, users can work with two higher-level languages on top of Hadoop — Pig and Hive — said Baldeschwieler. Pig and Hive have their own specialty data stores. “What HCatalog’s going to allow is for Pig and Hive and MapReduce itself to operate on one set of tables,” he said.

An Apache representative concurred that goals for Hadoop include improvements for high availability, data management, and user friendliness, but Apache would not confirm what will be in the next release or what the version number will be. Because of Hadoop’s culture of continuous beta releases, there has yet to be a formal 1.0 release, Baldeschwieler said. “There will come a point where we will want to call it 1.0 or 2.0.”

This article, “Apache Hadoop to get more user friendly,” was originally published at InfoWorld.com.

Paul Krill

Paul Krill is editor at large at InfoWorld. Paul has been covering computer technology as a news and feature reporter for more than 35 years, including 30 years at InfoWorld. He has specialized in coverage of software development tools and technologies since the 1990s, and he continues to lead InfoWorld’s news coverage of software development platforms including Java and .NET and programming languages including JavaScript, TypeScript, PHP, Python, Ruby, Rust, and Go. Long trusted as a reporter who prioritizes accuracy, integrity, and the best interests of readers, Paul is sought out by technology companies and industry organizations who want to reach InfoWorld’s audience of software developers and other information technology professionals. Paul has won a “Best Technology News Coverage” award from IDG.
