Serdar Yegulalp
Senior Writer

Cloudera’s Kudu: Like HDFS and HBase in one

news analysis
Sep 28, 2015

With Kudu, Cloudera promises to solve Hadoop's infamous storage problem


Storing data in Hadoop generally means choosing between HDFS and Apache HBase. The former is great for high-speed writes and scans; the latter is ideal for random-access queries. But you can't get both behaviors at once.

Hadoop vendor Cloudera is preparing its own Apache-licensed Hadoop storage engine: Kudu is said to combine the best of both HDFS and HBase in a single package and could make Hadoop into a general-purpose data store with uses far beyond analytics.

Fast writes, fast updates, fast reads, fast everything

Kudu was created as a direct reflection of the applications customers are trying to build in Hadoop, according to Cloudera’s director of product marketing, Matt Brandwein.

These applications are typically constructed by organizations that want to “integrate data quickly, data that changes, and access that data very quickly for analytics … The problem is, today, there isn’t a good storage back end for them to do that.”

HDFS allows for fast writes and scans, but updates are slow and cumbersome; HBase is fast for updates and inserts, but “bad for analytics,” said Brandwein.

Kudu is meant to do both well. Written in C++ rather than Java, it uses its own file format and was "built from the ground up to leverage modern hardware." Rather than bounce back and forth between HDFS and HBase, applications can use Kudu as a single unified data store. (Integrations with Spark and Cloudera's Impala are also planned.)

Though Cloudera is behind the project, Brandwein made it clear there is “nothing Cloudera-specific about [Kudu].” The project is intended to be released as open source and eventually put under the governance of the Apache Software Foundation, in the same manner as Hadoop’s other major components.

Replacement or enhancement?

If all this sounds like a straight-up replacement for HDFS or HBase, Brandwein noted that wasn't the immediate intention. Instead, Kudu is meant to complement and run side by side with those storage engines, because some applications may get more immediate benefit out of HDFS or HBase.

Last week, before the official release of the news, VentureBeat speculated about Kudu’s possible implications for the rest of the big data industry. It “could present a new threat to data warehouses from Teradata and IBM’s PureData … It may also be used as a highly scalable in-memory database that can handle massively parallel processing (MPP) workloads, not unlike HP’s Vertica and VoltDB.”

This isn't likely to happen overnight, in the same way Kudu isn't likely to become a rip-and-replace substitute for HDFS or HBase. Teradata, in particular, decided it was better to have Hadoop as an ally, entering into a partnership with Hortonworks and adding Hadoop support to many of its appliances.

Data warehouses still have markedly different needs and applications than Hadoop, so the two benefit when they work together rather than when one tries to subsume the other. Kudu will need time to come out of beta and provide a compelling use case for switching production systems, but it’ll take more time for the existing data warehouse market to feel a genuine existential crisis.


Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
