Serdar Yegulalp
Senior Writer

Spark speeds up SQL on Hadoop in Splice Machine 2.0

news analysis
Nov 17, 2015 · 3 mins

Apache Spark is being used to enhance yet another third-party data processing product

Apache Spark rose to fame as an in-memory data processing framework frequently used with Hadoop, but it’s fast transforming into a nucleus for building other data-processing products.

Newly released, version 2.0 of the SQL-RDBMS-on-Hadoop solution Splice Machine uses Spark as one of two processing engines. Incoming work is divided between them depending on whether it’s an OLTP or OLAP workload.

Splice Machine originally made a name for itself as a replacement for multiterabyte workloads on conventional ACID RDBMS solutions like Oracle. The company claimed it enabled workloads for one former Oracle customer to run an order of magnitude faster, and Hadoop’s native scale-out architecture meant the solution could grow with the size of workloads at a lower cost than with a conventional RDBMS.

Monte Zweben, co-founder and CEO of Splice Machine, stated in an interview that Splice Machine’s big innovation is that it allows OLTP and OLAP workloads to run side by side using the same data and same architecture, but with different processing engines, making it easier to make business decisions with the data.

“We have an architecture that can identify whatever query comes into the system, determine whether it’s OLTP or OLAP, and send the query to the right computational engine,” Zweben said. Transactional queries are run under HBase; OLAP queries are processed via Spark. This also allows memory and CPU usage for each kind of query to be kept segregated.
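The routing idea Zweben describes can be sketched in a few lines. This is purely illustrative — Splice Machine's actual planner inspects query plans, not keywords, and all names below are hypothetical:

```python
# Illustrative sketch only: a toy classifier that routes incoming SQL to a
# transactional or analytical engine, in the spirit of the architecture
# described above. Not Splice Machine's actual implementation.

def classify_query(sql: str) -> str:
    """Crudely classify a SQL statement as OLTP or OLAP."""
    s = sql.strip().lower()
    # Writes and single-row lookups are transactional work (OLTP).
    if s.startswith(("insert", "update", "delete")):
        return "OLTP"
    # Aggregations, joins, and large scans suggest analytics (OLAP).
    if any(kw in s for kw in ("group by", "join", "sum(", "avg(", "count(")):
        return "OLAP"
    return "OLTP"

def route(sql: str) -> str:
    """Send OLTP work to the HBase engine, OLAP work to Spark."""
    return {"OLTP": "hbase", "OLAP": "spark"}[classify_query(sql)]

print(route("UPDATE accounts SET balance = balance - 100 WHERE id = 7"))  # hbase
print(route("SELECT region, SUM(sales) FROM orders GROUP BY region"))     # spark
```

In practice, the decision would hinge on the optimizer's cost estimates and plan shape rather than raw SQL text, but the effect is the same: each workload class gets its own engine, memory pool, and CPU budget.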

Adding Spark to Splice Machine may have been inevitable, as Mike Franklin, one of the company’s advisory board members and a chair in the computer science department at UC Berkeley, is director of AMPLab, where Spark originated.

Spark’s original aim was to provide data scientists with an easy way to perform the kinds of data processing that used to require a lot of code. Spark’s already been used to rewrite IBM’s DataWorks data-transformation product. In Splice Machine’s case, though, it adds entirely new functionality rather than simply enhancing the product.

Spark notwithstanding, Splice Machine faces stiff competition in a field that is growing more crowded by the minute. The database field offers a wealth of possibilities — NoSQL, NewSQL, and in-memory processing — many of which are designed to satisfy extremely specific use cases at high speed. Existing database vendors like Microsoft, Oracle, and Postgres are all upping their games to compete with NoSQL and in-memory DB offerings, and Hadoop vendors are spicing up their distributions to satisfy the need for fast analytics results.

While one of Splice Machine’s selling points is that it lets customers reuse existing ANSI SQL, the compatibility and speed issues that hobble rival SQL-on-NoSQL solutions will only become easier to surmount with time.


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
