Serdar Yegulalp
Senior Writer

Mesosphere’s new big data solution: Add Spark, hold the Hadoop

news analysis
Aug 20, 2015
3 mins

A data-processing solution from Mesosphere leverages Spark, Kafka, and Cassandra, but eschews Hadoop, for enterprise-level real-time big-data needs


Mention big-data tools like Spark and Kafka to most enterprise users, and the other big-data tool that comes to mind along with them is Hadoop. But does it need to be part of the picture?

Mesosphere, the corporate backer of the Apache Mesos cluster-management project, is ginning up a big-data stack that eschews Hadoop but embraces Spark (along with Kafka, Cassandra, and the Akka event framework) for real-time processing.

Mesosphere Infinity is “a turnkey, full-stack offering optimized for big data and IoT,” and its main aim is to give businesses an easily deployed stack for real-time data work. It also stands as a recent example of how many of the technologies reflexively associated with the Hadoop stack don’t require Hadoop to be useful.

Look, ma, no Hadoop

Matt Trifiro, chief marketing officer for Mesosphere, explained in a phone conversation how Infinity is managed by another Mesosphere creation: Mesosphere DCOS, which allows entire data centers full of applications to be stood up easily. Infinity, in turn, is for managing a relatively narrow range of applications: Spark for data processing; Kafka for real-time data ingestion; and another Apache Foundation project, Cassandra, for data storage.

While Infinity “doesn’t exclude Hadoop,” said Trifiro, “it doesn’t require it, either. You can use [Hadoop’s] HDFS as a persistent data store, and you may have Hadoop processing over data pushed into Cassandra, but in terms of real-time acquisition, you need a specialized stack.”

Sparks of inspiration

Spark has drawn attention as of late from a roster of A-list technology firms interested in both investing in the project and leveraging it for heavy-duty business analytics work. Still, like many other open source data tools, Spark is by itself far more “project” than “product” — it isn’t a trivial effort to use in an enterprise environment.

Trifiro claims the Infinity stack, Spark included, “was built from observation of what people were putting into production.” Businesses were attempting to assemble Spark and Kafka stacks for real-time analysis, said Trifiro, because “the demand for processing real-time data by non-Web companies is relatively new, and there’s immense pressure on IT teams to do this.” Standing up such a stack in full has “historically required a lot of expertise,” and Infinity is meant to require minimal work to get up and running.

Mesosphere plans to make Infinity’s stack even easier to consume by offering it via existing cloud services. Right now, though, the only named partner for cloud-based enterprise distribution is Cisco, the same company that worked hand-in-hand with Mesosphere to build Infinity.

One possible analogy is with running applications in containers versus using virtualization and OpenStack. Containers offer a potentially more precise solution to the problems of running applications at scale than VMs do. Likewise, Spark alone, as opposed to Spark plus Hadoop, might be a better fit for the data-processing problems enterprises face, as long as deploying and managing a Spark stack doesn’t put them back at square one.

Serdar Yegulalp

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
