Serdar Yegulalp
Senior Writer

IBM fires up Spark with Bluemix, machine learning contributions

news analysis
Jun 15, 2015 | 3 mins

IBM doubles up on Spark, adding it to Bluemix and contributing its SystemML machine-learning code to the Apache project

How much does IBM love Apache Spark? Enough to make it part of Bluemix — and enrich it with its own contributions.

These aren’t any old contributions: Among them is an IBM invention designed to make it easy to deploy machine-learning algorithms across Spark clusters.

Work smarter, not harder

This contribution to Spark, known as IBM SystemML, was originally outlined in a 2011 paper. In it, IBM researchers described the creation of a high-level language, akin to the R language for statistics and data analysis, for authoring machine learning algorithms that could run easily at scale. Those algorithms are then “compiled and optimized into a set of MapReduce jobs that can run on a cluster of machines,” according to the IBM paper.
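To make the idea of compiling a high-level algorithm into MapReduce jobs concrete, here is a minimal Python sketch. It is not SystemML's language or its actual compiler output; the toy data and the helper names (map_row, reduce_partials, solve_2x2) are hypothetical. It shows how one linear-algebra step of a least-squares fit, building X^T X and X^T y, can be split into a per-row map and a summing reduce, which is the kind of decomposition a cluster can run in parallel.

```python
from functools import reduce

# Hypothetical toy data: each row is (features, label), generated from y = 1 + 2*x.
# The first feature is a constant 1.0 so the model learns an intercept.
rows = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]

def map_row(row):
    """Map step: compute one row's contribution to X^T X and X^T y."""
    x, y = row
    xtx = [[xi * xj for xj in x] for xi in x]
    xty = [xi * y for xi in x]
    return xtx, xty

def reduce_partials(a, b):
    """Reduce step: element-wise sum of two partial results."""
    xtx = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a[0], b[0])]
    xty = [p + q for p, q in zip(a[1], b[1])]
    return xtx, xty

def solve_2x2(m, v):
    """Solve the 2x2 linear system m * w = v with Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    w0 = (v[0] * m[1][1] - m[0][1] * v[1]) / det
    w1 = (m[0][0] * v[1] - v[0] * m[1][0]) / det
    return [w0, w1]

# On a cluster, the map calls would run on many machines at once;
# here they run in a single process purely for illustration.
xtx, xty = reduce(reduce_partials, map(map_row, rows))
weights = solve_2x2(xtx, xty)
print(weights)
```

The author of the algorithm only needs to think in terms of matrices; in a system of the kind the paper describes, working out the map and reduce steps is the compiler's job.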

But since 2011, MapReduce has gradually taken a backseat as YARN, Hadoop’s resource manager, opened clusters to more efficient processing engines. Plus, Spark has grown in importance and utility, due in part to YARN. In light of that, it’s wise for SystemML to look beyond MapReduce and become Spark-centric, which is precisely what IBM has in mind here.

IBM is still vague on how it plans to accomplish this. Its press release notes that it will be working jointly with Databricks, one of the major commercial contributors to the Spark project, “to advance Spark’s machine learning capabilities.” Most likely this will involve IBM submitting SystemML-related patches for Spark to Databricks, and collaborating with them on the implementation.

A big blue Spark

IBM’s other major Spark project is more predictable, but might be more immediately useful: Adding Spark processing as an IBM Bluemix service.

The details are still sketchy, though. In its press release, IBM describes Spark as a service on Bluemix as “[making] it possible for any app developer to quickly load data, model it, and derive the predictive artifact to use in their app.”

If Spark on Bluemix follows the same model as IBM’s previous as-a-service offerings, it will likely involve connecting a Spark-as-a-service instance to data stored in one of the existing Bluemix data management or big data offerings. Among the latter is BigInsights for Hadoop, IBM’s version of Hadoop in Bluemix.

Most intriguing will be IBM’s decision to keep Spark as its own independent service that can be freely coupled to items in the Bluemix catalog — or to components in a private cloud by way of IBM’s hybrid cloud solutions. The easy answer would be to rev BigInsights and add Spark, but IBM would be better served by expanding its ambitions and making Spark an ingredient that can be reused in multiple contexts.

