Serdar Yegulalp
Senior Writer

Spark picks up machine learning, GPU acceleration

news analysis
Oct 28, 2016 | 2 mins

Databricks perks up its cloud Spark service with easy access to GPU-accelerated machine learning libraries


Databricks, corporate provider of support and development for the Apache Spark in-memory big data project, has spiced up its cloud-based implementation of Apache Spark with two additions that top IT’s current hot list.

The new features — GPU acceleration and integration with numerous deep learning libraries — can in theory be implemented in any local Apache Spark installation. But Databricks says its versions are tuned to avoid the resource contentions that complicate the use of such features.

Apache Spark isn’t configured out of the box for GPU acceleration; to set up a system that supports it, users must cobble together several pieces themselves. To that end, Databricks offers to handle the heavy lifting.

Databricks also claims Spark’s behaviors are tuned to get the most out of a GPU cluster by reducing contention across nodes. This resembles the strategy used by MIT’s Milk library to accelerate parallel processing applications, in which memory operations are batched to take maximum advantage of a system’s cache lines. Likewise, Databricks’ setup tries to keep GPU operations from interrupting one another.
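The batching idea behind Milk can be sketched in toy form. This is a hypothetical illustration, not code from Milk or Databricks: instead of applying scattered updates to a large array in arrival order, the updates are grouped by the region of memory they touch, so each region is visited once while it is still hot in cache. All names (`REGION`, `apply_batched`) are invented for the example.

```python
from collections import defaultdict

# Updates within the same contiguous chunk of this size are batched together.
REGION = 1024

def apply_batched(data, updates):
    """Apply (index, delta) updates grouped by memory region for locality."""
    batches = defaultdict(list)
    for idx, delta in updates:
        batches[idx // REGION].append((idx, delta))
    # Visit regions in order; each batch touches one contiguous chunk,
    # rather than bouncing between distant parts of the array.
    for region in sorted(batches):
        for idx, delta in batches[region]:
            data[idx] += delta
    return data

def apply_naive(data, updates):
    """Apply updates in arrival order, with no regard for locality."""
    for idx, delta in updates:
        data[idx] += delta
    return data
```

Both functions produce identical results; the batched version simply reorders the work so that memory accesses cluster, which is the same high-level trade Milk makes for parallel workloads.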

Another time-saving measure is direct access to popular machine learning libraries that can use Spark as a data source. Among them is Databricks’ own TensorFrames, which allows the TensorFlow library to work with Spark and is itself GPU-enabled.

Databricks has tweaked its infrastructure to get the most out of Spark. It created a free tier of service to attract customers still wary of deep commitment, providing them with a subset of the conveniences available in the full-blown product. InfoWorld’s Martin Heller checked out the service earlier this year and liked what he saw, precisely because it was free to jump into and easy to get started with.

But competition will be fierce, especially since Databricks faces brand-name juggernauts like Microsoft (via Azure Machine Learning), IBM, and Amazon. Thus, it has to find ways to both keep and expand an audience for a service as specific and focused as its own. The plan appears to involve not only adding features like machine learning and GPU acceleration to the mix, but ensuring they bring convenience, not complexity.
