Serdar Yegulalp
Senior Writer

Hadoop’s growth opens up demand for data migration tools

news analysis
Oct 27, 2014

As more companies adopt Hadoop, they need help getting their data onto the platform -- and a new field is born


Hadoop’s explosion over the last few years has been phenomenal. One estimate puts its growth at nearly 60 percent year-over-year, with a market of $50 billion by 2020. As the furious uptake has created demand for Hadoop vendors, an accompanying need for vendors selling Hadoop data migration tools and services is also shaping up.

In theory, getting data into and out of Hadoop is well within the capacity of both the software and its users. Apache’s Sqoop project was created to deal with Hadoop import and export, with native support for the usual suspects: MySQL, Oracle, PostgreSQL, and HSQLDB. But not everyone is comfortable doing the work themselves, so vendors are offering polished import/export solutions that require less manual labor.
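For those doing the work themselves, a typical Sqoop round trip looks something like the following. This is a minimal sketch: the host, database, credentials, table names, and HDFS paths are hypothetical, and a real job would need the matching JDBC driver on the Sqoop classpath.

```shell
# Import an "orders" table from MySQL into HDFS as delimited text.
# Host, database, credentials, and paths are placeholders.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl \
  --password-file /user/etl/.db_password \
  --table orders \
  --target-dir /data/warehouse/orders \
  --num-mappers 4

# Later, export processed results from HDFS back to a relational table.
sqoop export \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl \
  --password-file /user/etl/.db_password \
  --table orders_summary \
  --export-dir /data/warehouse/orders_summary
```

Each import is parallelized across the mappers specified by --num-mappers, which is part of why hand-rolled Sqoop jobs can take tuning effort that packaged migration products hide from the user.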

Companies with data migration solutions for other, pre-existing platforms are a natural fit for this space. For example, Attunity, maker of a variety of data-movement products, has Attunity Replicate, which handles many data sources and targets besides Hadoop, such as Oracle, SQL Server, DB2, and Teradata. Attunity offers optimizations specifically for transfers over wide-area networks, clearly intended to appeal to those attempting to migrate multi-terabyte jobs off-premises.

In the same vein, Diyotta DataMover also supports Hadoop as either a source or a target, with an equally large roster of data formats and repositories.

Syncsort specifically targets mainframes, working in conjunction with Cloudera to create a system that harvests data directly from existing mainframes and loads it into Hadoop. Syncsort CEO Lonne Jaffe describes it as “a button you can push to suck in the expensive workloads.”

With these offerings, the main attraction isn’t the number of supported data sources, but rather the convenience and the expertise-in-a-box approach. Hadoop vendors like Hortonworks compete by offering their own support and migration services, so there may be less incentive on their part to make Sqoop into a full-blown replacement for third-party products.

One more detail vital for any Hadoop data migration product is future-proofing -- specifically, being able to work well with the changes coming down the pike for Hadoop. That goes beyond ditching MapReduce for YARN; it needs to include support for the likes of Apache Argus, Hadoop's forthcoming data security framework.

The best long-term investment for dealing with Hadoop data migrations may be in understanding the existing toolsets and making the most of them. You might not want to roll your own Sqoop import connector for a mission-critical job, but the work could pay off in the long run for a future inward migration — or if an option even bigger and more ambitious than Hadoop comes along.
