IBM fires up Spark with Bluemix, machine learning contributions

news analysis

Jun 15, 2015

3 mins

IBM doubles up on Spark, adding it to Bluemix and contributing its SystemML machine-learning code to the Apache project

How much does IBM love Apache Spark? Enough to make it part of Bluemix — and enrich it with its own contributions.

These aren’t any old contributions: Among them is an IBM invention designed to make it easy to deploy machine-learning algorithms across Spark clusters.

Work smarter, not harder

This contribution to Spark, known as IBM SystemML, was originally outlined in a 2011 paper. In it, IBM researchers described the creation of a high-level language, akin to the R language for statistics and data analysis, for authoring machine learning algorithms that could run easily at scale. Those algorithms are then “compiled and optimized into a set of MapReduce jobs that can run on a cluster of machines,” according to the IBM paper.

But since 2011, MapReduce has gradually taken a backseat to more efficient processing engines that run on YARN, Hadoop’s resource manager. Plus, Spark has grown in importance and utility, due in part to YARN. In light of that, it’s wise for SystemML to look beyond MapReduce and become Spark-centric, which is precisely what IBM has in mind here.

IBM is still vague on how it plans to accomplish this. Its press release notes that it will be working jointly with Databricks, one of the major commercial contributors to the Spark project, “to advance Spark’s machine learning capabilities.” Most likely this will involve IBM submitting SystemML-related patches for Spark to Databricks, and collaborating with them on the implementation.

A big blue Spark

IBM’s other major Spark project is more predictable, but might be more immediately useful: Adding Spark processing as an IBM Bluemix service.

The details are still sketchy, though. In its press release, IBM describes Spark as a service on Bluemix as “[making] it possible for any app developer to quickly load data, model it, and derive the predictive artifact to use in their app.”

If Spark on Bluemix follows the same model as IBM’s previous as-a-service offerings, it will likely involve connecting a Spark-as-a-service instance to data stored in one of the existing Bluemix data management or big data offerings. Among the latter is BigInsights for Hadoop, IBM’s version of Hadoop in Bluemix.

Most intriguing will be IBM’s decision to keep Spark as its own independent service that can be freely coupled to items in the Bluemix catalog — or to components in a private cloud by way of IBM’s hybrid cloud solutions. The easy answer would be to rev BigInsights and add Spark, but IBM would be better served by expanding its ambitions and making Spark an ingredient that can be reused in multiple contexts.

by Serdar Yegulalp

Senior Writer

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
