Serdar Yegulalp
Senior Writer

Hortonworks’ Hadoop now launches from Google Cloud Platform

news analysis
Jan 23, 2015 · 3 mins

Support for Hadoop includes the HDP distribution and connections to Google's other data-crunching services


Fans of Hortonworks have a new cloud venue in which to deploy and run the company’s edition of Hadoop: the Google Cloud Platform.

In joint announcements, Hortonworks and Google revealed that the Hortonworks Data Platform (HDP) distribution of Hadoop is now available and fully certified on Google’s cloud.

This isn’t the first time Hadoop has been available on the Google Cloud Platform. Previously, Google provided setup scripts and software libraries for Apache Hadoop, but only for the core version of Hadoop, without add-ons or third-party refinements. Now HDP can be deployed on Google Cloud Platform with the same command-line tools already used to set up Hadoop there, tools the two companies developed together.

Version 2.2 of HDP, the version certified for Google Cloud Platform, was released last October and boasts a bevy of features designed to take advantage of the latest developments in Hadoop. Among them are closer integration with the Spark in-memory processing framework, new components such as the Apache Kafka messaging system for handling real-time data streams, and many other changes.

The collaboration between Google and Hortonworks also allows other Google Cloud Platform features to be used through HDP. For instance, Google Cloud Storage, Google’s object storage system, has an HDP connector that lets analyses run directly on data in Cloud Storage without first copying it to an HDFS volume. Google’s BigQuery data analytics platform also sports an HDP connector, one apparently designed to entice Hadoop users, since BigQuery can perform certain kinds of processing with less work.

In theory, linking BigQuery and HDP should give Google Cloud Platform customers a migration path between the two, but it’s more likely that BigQuery will be used for selected jobs rather than replacing Hadoop entirely, if only because Hadoop has a far larger and more established audience. Dataflow, another Google Cloud offering in the same vein, superficially resembles a Hadoop replacement but is designed more as a competitor to one specific part of the Hadoop ecosystem: Spark.

BigQuery might also have an edge over Hadoop in cost, depending on the workload and provided the BigQuery feature set is all people need. With Hadoop, costs are incurred for both storage and Google Compute Engine instances. Connections to BigQuery are billed at that service’s rates: processing costs $5 per terabyte and storage costs 2 cents per gigabyte per month, but loading data into the system, exporting it, and simple table reads are free.
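To make those BigQuery rates concrete, here is a minimal Python sketch that estimates one month’s bill from the two figures quoted above. It assumes the January 2015 rates still apply and treats terabytes and gigabytes as plain unit counts; the function name `bigquery_monthly_cost` and the sample workload are hypothetical, and current BigQuery pricing differs.

```python
from decimal import Decimal

# Rates as quoted in this article (January 2015). Treat them as assumptions:
# BigQuery pricing has changed since then.
QUERY_RATE_PER_TB = Decimal("5.00")          # dollars per terabyte processed
STORAGE_RATE_PER_GB_MONTH = Decimal("0.02")  # dollars per gigabyte per month


def bigquery_monthly_cost(tb_processed, gb_stored):
    """Estimate one month of BigQuery charges (hypothetical helper).

    Loading, exporting, and simple table reads are free per the article,
    so only query processing and storage are counted here.
    """
    processing = Decimal(str(tb_processed)) * QUERY_RATE_PER_TB
    storage = Decimal(str(gb_stored)) * STORAGE_RATE_PER_GB_MONTH
    return processing + storage


if __name__ == "__main__":
    # Hypothetical workload: 10 TB scanned by queries, 500 GB kept in storage.
    print(bigquery_monthly_cost(10, 500))
```

For that hypothetical workload the sketch prints 60.00: $50 for processing plus $10 for storage. Weighing a figure like that against the Compute Engine and storage charges of an equivalent Hadoop cluster is the workload-dependent comparison the paragraph describes.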

It wouldn’t be wise to rule out the possibility that Hadoop users on Google Cloud Platform may gravitate more toward Google’s tools in the long run. Having more Hadoop jobs hosted in the cloud, data and all, may spur interest in Google’s data analytics — provided Hortonworks’ offerings don’t overshadow them or prove to be the better value.


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
