Serdar Yegulalp
Senior Writer

Spark speeds up SQL on Hadoop in Splice Machine 2.0

news analysis
Nov 17, 2015 · 3 mins

Apache Spark is being used to enhance yet another third-party data processing product

Apache Spark rose to fame as an in-memory data processing framework frequently used with Hadoop, but it’s fast transforming into a nucleus for building other data-processing products.

Newly released, version 2.0 of the SQL-RDBMS-on-Hadoop solution Splice Machine uses Spark as one of two processing engines. Incoming work is divided between them depending on whether it’s an OLTP or OLAP workload.

Splice Machine originally made a name for itself as a replacement for multiterabyte workloads on conventional ACID RDBMS solutions like Oracle. The company claimed it enabled workloads for one former Oracle customer to run an order of magnitude faster, and Hadoop’s native scale-out architecture meant the solution could grow with the size of workloads at a lower cost than with a conventional RDBMS.

Monte Zweben, co-founder and CEO of Splice Machine, stated in an interview that Splice Machine’s big innovation is that it allows OLTP and OLAP workloads to run side by side using the same data and same architecture, but with different processing engines, making it easier to make business decisions with the data.

“We have an architecture that can identify whatever query comes into the system, determine whether it’s OLTP or OLAP, and send the query to the right computational engine,” Zweben said. Transactional queries are run under HBase; OLAP queries are processed via Spark. This also allows memory and CPU usage for each kind of query to be kept segregated.
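The routing idea Zweben describes can be sketched in a few lines. This is purely illustrative — Splice Machine's actual planner inspects query plans, not keywords, and all names below are hypothetical:

```python
# Illustrative sketch only: a toy classifier that routes incoming SQL to a
# transactional or analytical engine, in the spirit of the architecture
# described above. Not Splice Machine's actual implementation.

def classify_query(sql: str) -> str:
    """Crudely classify a SQL statement as OLTP or OLAP."""
    s = sql.strip().lower()
    # Writes and single-row lookups are transactional work (OLTP).
    if s.startswith(("insert", "update", "delete")):
        return "OLTP"
    # Aggregations, joins, and large scans suggest analytics (OLAP).
    if any(kw in s for kw in ("group by", "join", "sum(", "avg(", "count(")):
        return "OLAP"
    return "OLTP"

def route(sql: str) -> str:
    """Send OLTP work to the HBase engine, OLAP work to Spark."""
    return {"OLTP": "hbase", "OLAP": "spark"}[classify_query(sql)]

print(route("UPDATE accounts SET balance = balance - 100 WHERE id = 7"))  # hbase
print(route("SELECT region, SUM(sales) FROM orders GROUP BY region"))     # spark
```

In practice, the decision would hinge on the optimizer's cost estimates and plan shape rather than raw SQL text, but the effect is the same: each workload class gets its own engine, memory pool, and CPU budget.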

Adding Spark to Splice Machine may have been inevitable, as Mike Franklin, one of the company’s advisory board members and a chair in the computer science department at UC Berkeley, is director of AMPLab, where Spark originated.

Spark’s original aim was to provide data scientists with an easy way to perform the kinds of data processing that used to require a lot of code. Spark’s already been used to rewrite IBM’s DataWorks data-transformation product. In Splice Machine’s case, though, it adds entirely new functionality rather than simply enhancing the product.

Spark notwithstanding, Splice Machine faces stiff competition in a field that is growing more crowded by the minute. The database field offers a wealth of possibilities — NoSQL, NewSQL, and in-memory processing — many of which are designed to satisfy extremely specific use cases at high speed. Existing database vendors like Microsoft, Oracle, and Postgres are all upping their games to compete with NoSQL and in-memory DB offerings, and Hadoop vendors are spicing up their distributions to satisfy the need for fast analytics results.

While one of Splice Machine’s selling points is that it lets customers reuse existing ANSI SQL, the compatibility and speed issues that hobble rival SQL-on-NoSQL solutions will only become easier to surmount with time.


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
