Serdar Yegulalp
Senior Writer

Hortonworks buys better Hadoop data flow management

news analysis
Aug 25, 2015

Hortonworks' newest acquisition is a prelude to creating an open-source-based data flow management product

Hadoop vendor Hortonworks, fresh off releasing a new version of its distribution, has acquired a company whose framework it wants for handling how data moves into, out of, and alongside Hadoop.

The company is Onyara, and the framework (of which Onyara is a commercial supporter) is the Apache NiFi project, a tool for graphically diagramming how data moves through a system.

Hortonworks sees NiFi as a way to create a new data platform for Hadoop, one that deals with data gathered and acted on in real time from a panoply of devices, “smart” and otherwise. Originally a product of the NSA, NiFi was open-sourced under the agency’s Technology Transfer Program, the same declassification effort that provided the SIMP cyber security tool.

Figure: An example data flow created in Apache NiFi. Credit: Apache Foundation

Rather than trying to build the functionality into Hadoop directly, Hortonworks is creating a parallel product offering, Hortonworks DataFlow (not to be confused with Google’s product of the same name). DataFlow will be sold to enterprises looking for a solution to handle data in motion as well as data at rest.

NiFi is also meant to play well with all the other stars of the Hadoop cast, like Spark (for real-time data processing) and Kafka (messaging). Plans are also on the table for integrating NiFi-controlled flows into Hortonworks’ existing Data Governance Initiative, so DataFlow-controlled data can be labeled and tagged even apart from Hadoop itself.

Adding NiFi to the Hortonworks mix complements Hortonworks’ central mission, which is to provide Hadoop and related products without proprietary encumbrances. But all signs suggest that pure open source plays of any kind are increasingly tough sledding.

Hortonworks’ recent financial news has been mixed, with net losses up despite a growing customer base and increasing quarterly revenue. DataFlow comes off as an attempt to give the company a new revenue stream by leveraging existing customers instead of adding new ones. With the size of the market for commercial Hadoop offerings in question, the former approach seems smarter.

Serdar Yegulalp

Serdar Yegulalp is a senior writer at InfoWorld. A veteran technology journalist, Serdar has been writing about computers, operating systems, databases, programming, and other information technology topics for 30 years. Before joining InfoWorld in 2013, Serdar wrote for Windows Magazine, InformationWeek, Byte, and a slew of other publications. At InfoWorld, Serdar has covered software development, devops, containerization, machine learning, and artificial intelligence, winning several B2B journalism awards including a 2024 Neal Award and a 2025 Azbee Award for best instructional content and best how-to article, respectively. He currently focuses on software development tools and technologies and major programming languages including Python, Rust, Go, Zig, and Wasm. Tune into his weekly Dev with Serdar videos for programming tips and techniques and close looks at programming libraries and tools.
