Apache Storm 1.0 packs a punch

news analysis

Apr 13, 2016

Apache's streaming data processing system takes on Spark with better performance and more convenient debugging features

When big data mavens debate the merits of using Apache Spark versus Apache Storm for streaming data processing, the argument usually sounds like this: Sure, Storm has great scale and speed, but it’s hard to use. Plus, it’s slowly being overtaken by Spark, so why go with old and busted when there’s new and hot?

Apache Storm 1.0 hopes to turn the ship around by making Storm not only faster but also easier and more convenient to work with.

Apache announced this week that Apache Storm 1.0 can crank out results “up to 16 times faster” than before, with a 60 percent reduction in latency. “For most use cases users can expect a 3× performance boost over earlier versions.”

A collection of strategic fixes provides the performance boosts. Among them is a new distributed cache API that enables data associated with a given Storm setup, or “topology,” which can run to many gigabytes, to be shared between nodes and updated from the command line; it doesn’t have to be redeployed by hand to each node. The data can be drawn from the local filesystem, but if it is stashed in a Hadoop HDFS store, a good place to put it, it can be drawn from there as well.
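
The idea of updating shared data once and letting every node pick up the change can be sketched in a few lines. This is a conceptual illustration only, not Storm's distributed cache API: the `BlobStore` and `Node` classes and the `lookup-table` key are hypothetical names invented for the example, and the whole thing runs in a single process.

```python
class BlobStore:
    """Hypothetical central store holding the latest version of each named blob."""

    def __init__(self):
        self._blobs = {}  # key -> (version, data)

    def put(self, key, data):
        # Each update bumps the version so nodes can detect stale copies.
        version = self.version(key) + 1
        self._blobs[key] = (version, data)
        return version

    def version(self, key):
        return self._blobs.get(key, (0, None))[0]

    def get(self, key):
        return self._blobs[key]


class Node:
    """Hypothetical worker node that caches blobs and refreshes stale copies."""

    def __init__(self, store):
        self._store = store
        self._cache = {}  # key -> (version, data)

    def read(self, key):
        latest = self._store.version(key)
        cached = self._cache.get(key)
        if cached is None or cached[0] < latest:
            # Local copy is missing or out of date: fetch the new version.
            cached = self._store.get(key)
            self._cache[key] = cached
        return cached[1]


store = BlobStore()
store.put("lookup-table", b"v1 contents")
nodes = [Node(store) for _ in range(3)]
first = [node.read("lookup-table") for node in nodes]

# One update to the store; no per-node redeployment.
store.put("lookup-table", b"v2 contents")
second = [node.read("lookup-table") for node in nodes]
```

After the single `put`, every node returns the new contents on its next read, which is the convenience the article describes: the data changes in one place rather than being pushed by hand to each node.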

A new batching methodology also provides a major speed boost — one micro-benchmark increased fivefold — with only a very slight increase in latency.
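
Why batching trades a little latency for a lot of throughput can be shown with a toy model. The sketch below is not Storm's batching code; the function names, the `CountingSink` class, and the batch size of 100 are assumptions chosen for illustration. It simply counts how many handoffs are needed to move the same items with and without batching.

```python
def process_one_at_a_time(items, handoff):
    """Hand each item to the next stage individually."""
    for item in items:
        handoff([item])


def process_in_batches(items, handoff, batch_size):
    """Accumulate items and hand them off batch_size at a time."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) >= batch_size:
            handoff(batch)
            batch = []
    if batch:
        handoff(batch)  # flush the partial final batch


class CountingSink:
    """Hypothetical downstream stage that records how often it is called."""

    def __init__(self):
        self.handoffs = 0
        self.items = []

    def __call__(self, batch):
        self.handoffs += 1
        self.items.extend(batch)


unbatched = CountingSink()
process_one_at_a_time(range(1000), unbatched)

batched = CountingSink()
process_in_batches(range(1000), batched, batch_size=100)

print(unbatched.handoffs, batched.handoffs)  # prints: 1000 10
```

Both sinks receive the same 1,000 items, but the batched path pays the per-handoff cost only 10 times. The price is that an item may wait for its batch to fill before moving on, which is where the slight latency increase comes from.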

Many of the other changes in version 1.0 make Storm easier to work with. Debugging earlier releases of Storm typically involved writing custom “bolts” (processing functions) to extract live data. With version 1.0, users can sample a percentage of data moving through Storm, which can be viewed in the UI or saved to disk for later inspection. Likewise, a new log-search function lets the user search logs across the entire topology of Storm supervisor nodes.
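
Sampling a percentage of a stream is a simple idea, and a rough sketch makes it concrete. The code below is not how Storm implements its sampling; `sample_stream`, the event dictionaries, and the 10 percent rate are hypothetical choices for the example. Each tuple is kept independently with the given probability, so the sample stays small while the main flow is untouched.

```python
import random


def sample_stream(tuples, percent, rng=None):
    """Yield roughly `percent` percent of the incoming tuples."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if rng is None:
        rng = random.Random()
    threshold = percent / 100
    for item in tuples:
        # Keep each tuple independently with probability `threshold`.
        if rng.random() < threshold:
            yield item


# Hypothetical stream of events flowing through a topology.
events = [{"id": i, "word": f"w{i}"} for i in range(10_000)]

# Inspect about 10 percent of the traffic; the seed makes the run repeatable.
sampled = list(sample_stream(events, percent=10, rng=random.Random(42)))
print(len(sampled), "of", len(events), "events sampled")
```

Because the function is a generator, the sampled tuples could just as easily be written to a file for later inspection as collected in a list.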

Storm faces competition from more than Spark alone, both in terms of performance and ease of use. The Project Apex streaming framework, also known as DataTorrent RTS, is meant to be “10 to 100 times faster” than Spark Streaming, and is easier to develop with and deploy than either Spark or Storm.


by Serdar Yegulalp

Senior Writer


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
