Serdar Yegulalp
Senior Writer

Apache Spark 3.0 adds Nvidia GPU support for machine learning

May 14, 2020

The next major release of the in-memory data processing framework will support GPU-accelerated functions courtesy of Nvidia RAPIDS

Apache Spark, the in-memory big data processing framework, will become fully GPU accelerated in its soon-to-be-released 3.0 incarnation. Best of all, today’s Spark applications can take advantage of the GPU acceleration without modification; existing Spark APIs all work as-is.
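To make the "without modification" point concrete, here is a minimal sketch of how GPU acceleration might be switched on purely through launch-time configuration, leaving the application script untouched. It assumes the settings used by Nvidia's RAPIDS Accelerator plugin as later published (`spark.plugins`, `com.nvidia.spark.SQLPlugin`, `spark.rapids.sql.enabled`) together with Spark 3.0's GPU resource-scheduling keys. The helper `gpu_submit_command`, the jar name `rapids-4-spark.jar`, and the script name `etl_job.py` are hypothetical placeholders, not anything named in the announcement.

```python
import shlex


def gpu_submit_command(app_path, gpus_per_executor=1, tasks_per_gpu=4,
                       plugin_jar="rapids-4-spark.jar"):
    """Build a spark-submit argument list that requests GPUs for an unchanged app.

    plugin_jar is a placeholder file name; the real jar name depends on the
    plugin release you download.
    """
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("gpus_per_executor and tasks_per_gpu must be >= 1")

    conf = {
        # Assumed plugin settings from the RAPIDS Accelerator for Apache Spark.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Spark 3.0 resource scheduling: GPUs per executor, and the fraction
        # of a GPU each task claims (0.25 lets four tasks share one GPU).
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": f"{1 / tasks_per_gpu:g}",
    }

    argv = ["spark-submit", "--jars", plugin_jar]
    for key, value in conf.items():
        argv += ["--conf", f"{key}={value}"]
    argv.append(app_path)  # the application itself is passed through as-is
    return argv


if __name__ == "__main__":
    cmd = gpu_submit_command("etl_job.py")
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The design choice worth noting is that everything GPU-specific lives in the `--conf` flags. Depending on the cluster manager, a real deployment would likely also need a GPU discovery script (`spark.executor.resource.gpu.discoveryScript`), which this sketch leaves out.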

The GPU acceleration components, provided by Nvidia, are designed to complement all phases of Spark applications including ETL operations, machine learning training, and inference serving.

Nvidia’s Spark contributions draw on the RAPIDS suite of GPU-accelerated data science libraries. Many of RAPIDS’ internal data structures, like dataframes, complement Spark’s own, but getting Spark to use RAPIDS natively has taken nearly four years of work.

Spark 3.0's speedups don't come solely from GPU acceleration. The release also reaps performance gains by minimizing data movement to and from GPUs. When data does need to be moved across a cluster, the Unified Communication X (UCX) framework shuttles it directly from one block of GPU memory to another with minimal overhead.

According to Nvidia, a preview release of Spark 3.0 running on the Databricks platform yielded a sevenfold performance improvement when using GPU acceleration, though details about the workload and its dataset were not available.

No firm date has been given for general availability of Spark 3.0. You can download preview releases from the Apache Spark project website.

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.