Serdar Yegulalp
Senior Writer

Why Spark is spiking in the cloud

news analysis
Jul 22, 2015 · 3 mins

Interest and investment in Apache Spark have increased dramatically in recent months, to the benefit of cloud customers

In the last month, several A-list names in cloud and business computing have declared interest (and made investments) in the Apache Spark data analysis project. What got them fired up?

Some of this is legitimate excitement over a promising technology with broad applications. But it’s also about yet another project that can be monetized in the cloud, by wrapping it in convenience and offering it at scale.

The allure of Spark

Among the companies that have expressed their devotion to Spark in recent months:

  • IBM. Aside from adding Spark support to its Bluemix PaaS, IBM is also preparing to contribute its SystemML machine learning algorithm construction technology to Spark.
  • Microsoft. The company is adding Spark support to Azure HDInsight (its cloud-hosted version of Hadoop).
  • Amazon. Its Elastic MapReduce service will be able to run Spark apps developed not only in Scala, but also Python and Java.
  • Huawei. The Chinese networking giant recently unveiled a project called Astro that combines Spark, Spark SQL, and HBase into a single product. Spark is already used in Huawei’s Hadoop-based FusionInsight product, offered as a service by way of Huawei’s burgeoning cloud platform.

Spark is attractive mainly because it provides a powerful in-memory data-processing component within Hadoop that deals with both real-time and batch events. At Yahoo, where Hadoop originally sprang up, Spark has become a cornerstone of analytics operations.

For the above companies, Spark offers a grade-A ingredient for their cloud business, both with and without Hadoop (although typically with). With prices in a constant race to the bottom, competition between cloud vendors revolves around offering features formerly confined to the data center, but at a scale and with a degree of convenience unavailable there. (It also helps that we’re now at a phase where more enterprise data is generated in the cloud than moved there.)

Lighting the next fire

Where Spark goes from here is also crucial, since many of the future directions discussed for the project have potential effects on how Spark can be deployed as a cloud resource.

IBM’s contributions to Spark are in that vein. Databricks, corporate developer of Spark, has plans of its own that could have even more radical effects. Its Project Tungsten constitutes a major revamp of the way Spark manages memory in order to boost performance. This would benefit not only Spark developers, but also all those providing Spark as a service.

Ironically, the more popular Spark is in the cloud, the more directly it might threaten the business model of Databricks itself. InfoWorld’s Andy Oliver profiled Databricks’ Spark offering — a sort of interactive data notebook for Spark — and found it to be “far from the Tableau of data science” it seems intended to be. The other big contenders listed above may not have the same degree of interactivity for their Spark offerings, but they’re arguably put together in ways that more directly complement actual Spark workloads.

Spark also needs to mature in other ways: documentation, commercial support, and middleware integration, as well as having more pre-Spark apps written for use. Barring the last item, most of those are jobs well suited to Spark’s corporate contributors and sponsors, unless their contributions end up being little more than ensuring Spark runs well in their cloud and for their customers.


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
