Serdar Yegulalp
Senior Writer

Apache Storm 1.0 packs a punch

news analysis
Apr 13, 2016

Apache's streaming data processing system takes on Spark with better performance and more convenient debugging features

When big data mavens debate the merits of using Apache Spark versus Apache Storm for streaming data processing, the argument usually sounds like this: Sure, Storm has great scale and speed, but it’s hard to use. Plus, it’s slowly being overtaken by Spark, so why go with old and busted when there’s new and hot?

That’s why Apache Storm 1.0 hopes to turn the ship around, not only by making Storm faster but also by making it easier and more convenient to work with.

Apache announced this week that Apache Storm 1.0 can crank out results “up to 16 times faster” than before, with a 60 percent reduction in latency. “For most use cases users can expect a 3× performance boost over earlier versions.”

A collection of strategic fixes provides the performance boosts. Among them is a new distributed cache API that lets data associated with a given Storm setup, or “topology,” be shared between nodes and updated from the command line, so it doesn’t have to be redeployed by hand to each node. That data can run to many gigabytes. It can be drawn from the local filesystem or, if it is stashed in a Hadoop HDFS store (a good place to put it), from there as well.

A new batching methodology also provides a major speed boost (throughput in one micro-benchmark increased fivefold) with only a very slight increase in latency.
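The trade-off behind batching can be shown with a small sketch. This is not Storm's implementation; `BatchingSender` and `count_sends` are hypothetical names invented for illustration. The idea is only that grouping messages cuts the number of downstream hand-offs, while each message may wait in a buffer until its batch fills.

```python
from typing import Callable, List


class BatchingSender:
    """Buffers messages and hands them downstream in groups (hypothetical helper)."""

    def __init__(self, send: Callable[[List[str]], None], batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._send = send
        self._batch_size = batch_size
        self._buffer: List[str] = []

    def emit(self, message: str) -> None:
        # Hold the message until the buffer is full; this wait is the added latency.
        self._buffer.append(message)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered as a single downstream call.
        if self._buffer:
            self._send(self._buffer)
            self._buffer = []


def count_sends(total_messages: int, batch_size: int) -> int:
    """Return how many downstream calls are needed to deliver total_messages."""
    calls: List[List[str]] = []
    sender = BatchingSender(calls.append, batch_size)
    for i in range(total_messages):
        sender.emit(f"tuple-{i}")
    sender.flush()  # drain the partial final batch, if any
    return len(calls)
```

With a batch size of 1, every message costs its own downstream call. With a batch size of 100, a thousand messages need only ten calls, at the price of each message waiting for up to 99 others before it moves.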

Many of the other changes in version 1.0 make Storm easier to work with. Debugging earlier releases of Storm typically involved writing custom “bolts” (processing functions) to extract live data. With version 1.0, users can sample a percentage of the data moving through Storm and view it in the UI or save it to disk for later inspection. Likewise, a new log-search function lets the user search logs across the entire topology of Storm supervisor nodes.
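To make the sampling idea concrete, here is a minimal sketch of percentage-based sampling over a stream. It does not use Storm's API and is not how Storm implements the feature; `sample_stream` and its `percent` and `seed` parameters are assumptions made for illustration.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def sample_stream(
    events: Iterable[T], percent: float, seed: Optional[int] = None
) -> Iterator[T]:
    """Yield roughly `percent` percent of events, preserving their order."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    for event in events:
        # random() is in [0, 1), so 100 keeps everything and 0 keeps nothing.
        if rng.random() * 100 < percent:
            yield event
```

Each event is kept or dropped independently, so the sample size only approximates the requested percentage. The benefit is that the sampler never has to buffer the stream or know its length in advance.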

Storm faces competition from more than Spark alone, both in terms of performance and ease of use. The Project Apex streaming framework, also known as DataTorrent RTS, is meant to be “10 to 100 times faster” than Spark Streaming, and is easier to develop with and deploy than either Spark or Storm.

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
