Bossie Awards 2013: The best open source big data tools

Sep 17, 2013 · 7 mins

InfoWorld's top picks in the expanding Hadoop ecosystem, the NoSQL universe, and beyond

The best open source big data tools

MapReduce was a response to the limitations of traditional databases. Tools like Giraph, Hama, and Impala are responses to the limitations of MapReduce. These all run on Hadoop, but graph, document, column, and other NoSQL databases might also be part of the mix. Which big data tools will meet your needs? The number of options seems to be expanding faster than ever. 

Apache Sqoop

When you think of big data processing, you think of Hadoop, but that doesn’t mean traditional databases don’t play a role. In fact, in most cases you’ll still be drawing from data locked in legacy databases. That’s where Apache Sqoop comes in.

Sqoop facilitates fast data transfers from relational database systems to Hadoop by leveraging concurrent connections, customizable mapping of data types, and metadata propagation. You can tailor imports (such as new data only) to HDFS, Hive, and HBase; you can export results back to relational databases as well. Sqoop manages all of the complexities inherent in the use of data connectors and mismatched data formats.
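As a rough illustration of how those features surface in practice, the sketch below assembles the arguments for a `sqoop import` run in Python. The helper `build_sqoop_import`, the JDBC URL, and the table and column names are hypothetical and chosen only for this example. The flags are standard Sqoop 1 import options: `--num-mappers` sets the number of parallel connections, `--incremental append` with `--check-column` and `--last-value` restricts the import to new rows, and `--map-column-java` overrides the default type mapping. Treat it as a minimal sketch rather than a recommended configuration.

```python
from typing import Dict, List, Optional


def build_sqoop_import(
    jdbc_url: str,
    table: str,
    target_dir: str,
    num_mappers: int = 4,
    check_column: Optional[str] = None,
    last_value: Optional[str] = None,
    java_types: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Assemble the argument list for a `sqoop import` run (hypothetical helper)."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        # Number of parallel map tasks, each opening its own database connection.
        "--num-mappers", str(num_mappers),
    ]
    if check_column is not None and last_value is not None:
        # Incremental append: import only rows whose check column exceeds last_value.
        args += [
            "--incremental", "append",
            "--check-column", check_column,
            "--last-value", last_value,
        ]
    if java_types:
        # Override the default SQL-to-Java type mapping for selected columns.
        mapping = ",".join(f"{col}={typ}" for col, typ in java_types.items())
        args += ["--map-column-java", mapping]
    return args


if __name__ == "__main__":
    cmd = build_sqoop_import(
        jdbc_url="jdbc:mysql://db.example.com/sales",  # hypothetical database
        table="orders",                                # hypothetical table
        target_dir="/data/orders",                     # hypothetical HDFS path
        num_mappers=8,
        check_column="order_id",
        last_value="100000",
        java_types={"order_id": "Long"},
    )
    # Print the command for inspection; pass `cmd` to subprocess.run to execute
    # it on a machine where Sqoop is installed.
    print(" ".join(cmd))
```

Building the command as a list rather than a single string keeps each value a separate argument, so it can be handed to `subprocess.run` without shell quoting concerns.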

— James R. Borck

Paradigm4 SciDB

SciDB is a distributed database system that leverages parallel processing to perform real-time analytics on streaming data. Built from the ground up to support massive scientific data sets, it eschews the rows and columns of relational databases for native array constructs that are better suited to ordered data sets such as time series and location data. Neither relational nor MapReduce, SciDB offers a unified solution that scales across large clusters without requiring Hadoop’s multilayered infrastructure and data massaging obligations.

— James R. Borck
