Paul Krill
Editor at Large

Apache Spark jumps on the R bandwagon

Feb 23, 2015

The big data processing technology looks to draw in data scientists

Apache Spark, the increasingly popular big data processing technology for iterative workloads, is about to add DataFrames and support for the R language in two upcoming upgrades.

Spark in 2015 is focusing on data science and platform interfaces, said Matei Zaharia, who started the Spark project and is currently CTO at big data service provider Databricks, which is involved in Spark development. Increasingly, the people who want to use Spark “are not just software developers but they’re data scientists, maybe experts in other fields who need to run computations on large data,” Zaharia said at the Strata+Hadoop World conference in San Jose, Calif., late last week.

“The most exciting thing that we’re doing [in data science] is adding DataFrames to Spark,” said Zaharia. Due in Spark 1.3 in a couple of weeks, DataFrames brings to Spark the kind of API commonly used for working with data on a single machine, providing a concise way to write expressions that operate on data. Meanwhile, Spark 1.4, expected in June, will feature an R interface, so Spark will back Scala, Python, Java, and R, the “four most-popular languages for big data today,” he said.

Spark already features libraries for SQL, streaming, and advanced analytics, but the goal for the future is to create platform interfaces that extend Spark to a wide range of environments, such as NoSQL and traditional data warehouse environments, according to Zaharia.

Also in the Spark space, Databricks and Intel are collaborating to optimize Spark’s real-time analytics capabilities for Intel’s architecture. “We believe Spark’s efficient in-memory computation within Hadoop enterprise data hub, combined with the performance of Intel Architecture, enables advanced analytics with faster real-time decisions,” Michael Greene, vice president of the Intel Software and Services Group, said in a blog post on Friday.

Paul Krill is editor at large at InfoWorld. Paul has been covering computer technology as a news and feature reporter for more than 35 years, including 30 years at InfoWorld. He has specialized in coverage of software development tools and technologies since the 1990s, and he continues to lead InfoWorld’s news coverage of software development platforms including Java and .NET and programming languages including JavaScript, TypeScript, PHP, Python, Ruby, Rust, and Go. Long trusted as a reporter who prioritizes accuracy, integrity, and the best interests of readers, Paul is sought out by technology companies and industry organizations who want to reach InfoWorld’s audience of software developers and other information technology professionals. Paul has won a “Best Technology News Coverage” award from IDG.