by Mario Apicella

Data deduplication: Too much of a good thing

analysis
Dec 28, 2006 | 3 mins

Quantum joins the crowd of deduplication solutions

Deduplication was without a doubt one of the hot topics in storage last year, and it should stay so in 2007. The rationale behind deduplication is simple to the point of being obvious: Reducing the amount of data moved from Point A to Point B will improve performance and reduce the capacity needed during backups or other copy activities.

There are several different data deduplication approaches from various vendors, but the common ground is identifying parcels of data with the same content and replacing all the duplicates with references to a single instance.
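That single-instancing idea can be sketched in a few lines of Python. This is only an illustrative model, not any vendor's actual engine: it assumes fixed-size chunking and SHA-256 fingerprints, whereas shipping products differ in how they carve up and fingerprint the data.

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split a stream into fixed-size chunks; store each unique chunk once.

    Returns a chunk store (digest -> bytes) plus an ordered list of
    digests that stands in for the original stream.
    """
    store = {}       # single instance of each unique chunk
    references = []  # the original stream, expressed as references
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk
        references.append(digest)
    return store, references

def reassemble(store, references):
    """Rebuild the original stream from the references."""
    return b"".join(store[d] for d in references)

# A highly repetitive stream deduplicates well: ten chunks of
# data collapse to just two stored instances.
data = b"A" * 4096 * 9 + b"B" * 4096
store, refs = deduplicate(data)
assert reassemble(store, refs) == data
print(len(refs), "chunks referenced,", len(store), "stored")
```

The ratio between chunks referenced and chunks stored is, in essence, the deduplication ratio the vendors advertise; real-world numbers depend entirely on how repetitive your data is.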

How much can deduplication shrink your data? It depends on the method used and the data content, but vendors tell me you could see a reduction of anywhere from 10-to-1 to as much as 50-to-1.

No wonder so many vendors are dancing to the data-deduplication tune: No other technology comes even close to cutting down the amount of required storage. A backup solution that includes some form of deduplication will outperform a traditional approach many times over.

Shrinking your data by 10 times means that the connection to your remote office has suddenly become 10 times faster, or that you can replace it with a much less expensive link, depending on which way your strategy leans. Either way, you may be able to resuscitate some projects, such as consolidating backups at the head office, that bandwidth constraints previously bumped to the waste bin.

Until now, customers' only option for storing more data was to increase capacity by buying devices with more and larger drives. Deduplication offers the opportunity to reverse, or at least slow down, that trend and squeeze more data into the same capacity. It's the storage equivalent of having your cake and eating it, too.

With that in mind, it’s easy to understand the mad rush of vendors such as EMC and Quantum buying rival companies (Avamar and ADIC, respectively) that hold deduplication technologies.

We don’t know yet how EMC will serve up the Avamar dish to their customers, but in December Quantum revealed a couple of new products that take advantage of ADIC’s Rocksoft deduplication and other technologies. (As you may remember, ADIC had already initiated the acquisition of Oz-based Rocksoft and its dedup technology when Quantum began its courtship.)

The two units from Quantum, the DXi3500 and DXi5500, should start shipping in January. Both combine what's essentially a VTL (virtual tape library) appliance and a deduplication engine.

The DXi3500 is a compact 2U unit suitable for small-capacity installations of up to 1.5 TB. The DXi5500 is a larger unit that can store up to 11 TB of data, but it is also faster, with a transfer rate of up to 800 GB/hour versus the DXi3500's 290 GB/hour, Quantum says.

How much will Quantum's data dedup duo cost? I don't have figures for all eight DXi models, but Quantum mentions a retail price of $24,000 for a 3500 with 1.5 TB. The benefits of deduplication, however compelling, apparently don't come cheap.

Perhaps a more worrisome concern is that there is no easy migration from one deduplication system to another. Quantum's dedup systems join numerous other shipping solutions, including those from Data Domain, Diligent, Exagrid, and Sepaton, that deliver similar data-shrinking ratios but do not interoperate. How do you spell "vendor lock-in"?

Join me on The Storage Network with questions or comments.