by Serdar Yegulalp

Senior Writer

Big data brawlers: 4 challengers to Spark

news analysis

Aug 1, 2016 · 3 mins

Spark isn't the only option for handling big data at scale and in memory. These four contenders take on both stream processing and batch jobs.

Big (and even not so big) data hasn’t been the same since Apache Spark made inroads with developers and became a staple ingredient in big data clouds.

But Spark is far from perfect. It’s certainly improving, as version 2.0 shows, but if a competitor offers a better handle on what Spark does and more, developers will pay attention.

Here are four projects emerging as possible competition for Spark, with new approaches to handling the conventional in-memory batch processing Spark is famous for and the streaming Spark continues to work on.

Apache Apex

What it is: Originally created by DataTorrent, Apex has since been donated to the Apache Software Foundation. It performs both stream and batch processing on Hadoop under YARN.

How it competes: Apex’s streaming is the real deal, while Spark’s “streaming” is actually a microbatch system. Apex gets native fault tolerance by way of Hadoop, though that means Apex and Hadoop are tightly coupled, whereas Spark can work with or without Hadoop. Apex also doesn’t yet have Spark’s machine learning features.
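To make the streaming-versus-microbatch distinction concrete, here is a minimal sketch in plain Python. It uses neither the Apex nor the Spark API; the function names `per_record` and `micro_batch` are hypothetical, and batching by count is an assumption made for brevity (microbatch systems typically cut batches on a time interval instead).

```python
from typing import Iterable, Iterator, List


def per_record(events: Iterable[int]) -> Iterator[int]:
    """Record-at-a-time model: each event is handled as soon as it arrives."""
    for event in events:
        yield event * 2


def micro_batch(events: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Microbatch model: events are buffered, then each buffer runs as a small batch."""
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield [e * 2 for e in batch]
            batch = []
    if batch:  # flush the final, partially filled batch
        yield [e * 2 for e in batch]


if __name__ == "__main__":
    print(list(per_record(range(5))))
    print(list(micro_batch(range(5), batch_size=2)))
```

In the microbatch version, no result for an event appears until its batch closes, which is where the extra latency comes from.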

Heron

What it is: Twitter’s replacement for the Apache Storm stream-processing framework, Heron is now available as an open source project. Consider this a contender for Spark Streaming.

How it competes: Heron runs streaming jobs in containers managed by a scheduler. As a result, it not only scales more readily than other solutions, but is also easier to debug, deploy, and keep running well on clusters. It’s also designed to appeal to existing Storm users, since it’s compatible with the Storm API and shares many of Storm’s concepts (such as “spouts” and “bolts”).
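For readers new to Storm's vocabulary, the spout-and-bolt idea can be sketched in a few lines of plain Python. This is an illustration of the concept only, not the Storm or Heron API: a spout is a source that emits tuples, and bolts consume and transform them. All function names here are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def sentence_spout() -> Iterator[str]:
    """Spout: a source that emits tuples into the topology (a fixed list here)."""
    yield from ["spark and storm", "heron and storm"]


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Bolt: consumes sentences and emits one word per tuple."""
    for sentence in sentences:
        yield from sentence.split()


def count_bolt(words: Iterable[str]) -> Dict[str, int]:
    """Bolt: tallies each word it receives and returns the totals."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
    return dict(counts)


def run_topology() -> Dict[str, int]:
    """Wire spout -> split bolt -> count bolt, the way a topology chains stages."""
    return count_bolt(split_bolt(sentence_spout()))


if __name__ == "__main__":
    print(run_topology())
```

A real topology runs many parallel instances of each stage across a cluster; the chaining of stages is the part this sketch preserves.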

Apache Flink

What it is: Apache Flink is a stream-processing framework that competes with Apache Storm as much as with Spark.

How it competes: Like Apex, Flink puts streaming first, and it uses a true streaming model rather than Spark’s streaming via microbatch. Flink also has provisions for performing iterative or repeated processing on streams, and it includes Spark-like features, such as machine learning and graph processing libraries. But Flink is still a relatively new project, having hit 1.0 earlier this year.

Onyx

What it is: Onyx is a “masterless, cloud scale, fault tolerant, high performance distributed computation system,” according to its documentation, with both batch and stream processing models.

How it competes: Written in the functional language Clojure rather than Scala, Onyx puts streaming first — batch operations are basically implemented as ministreams. Onyx also allows the developer to use language constructs in either Clojure or Java, such as Clojure’s vectors and maps, to define how data is processed. (Many of Onyx’s goals were laid down before the code was even created.) If Onyx catches on, it’ll most likely be due to Java’s existing popularity rather than Clojure’s expressiveness.
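The idea of describing a job as plain data, and of treating a batch as a small bounded stream, can be sketched as follows. This is plain Python rather than Clojure, and it is not Onyx's API: the names `CATALOG`, `WORKFLOW`, and `run` are hypothetical, and the sketch assumes a strictly linear workflow from "in" to "out".

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical task catalog: a list of maps, each naming a task and its function.
CATALOG: List[Dict[str, Any]] = [
    {"name": "inc", "fn": lambda x: x + 1},
    {"name": "double", "fn": lambda x: x * 2},
]

# Hypothetical workflow: a list of (from, to) edges, also plain data.
WORKFLOW: List[Tuple[str, str]] = [("in", "inc"), ("inc", "double"), ("double", "out")]


def run(
    workflow: List[Tuple[str, str]],
    catalog: List[Dict[str, Any]],
    segments: Iterable[int],
) -> Iterator[int]:
    """Push each input through the tasks named by the workflow, one at a time."""
    fns: Dict[str, Callable[[int], int]] = {t["name"]: t["fn"] for t in catalog}
    next_task = dict(workflow)  # linear workflow assumed: one successor per task
    for segment in segments:
        task = next_task["in"]
        while task != "out":
            segment = fns[task](segment)
            task = next_task[task]
        yield segment


if __name__ == "__main__":
    # A "batch" is just a finite input fed through the same streaming code path.
    print(list(run(WORKFLOW, CATALOG, [1, 2, 3])))
```

Because the job description is ordinary data, it can be built, inspected, or modified by other code before it runs.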

