By Matt Prigge, Contributing Editor

A new role for the amazing solid-state drive

analysis
Jun 20, 2011 | 5 mins

EMC provides a glimpse of how SSDs will improve enterprise storage performance -- by finding a home on the server itself

Whether you’re checking market share numbers or SPC-1 performance results, it’s clear that solid-state drives (SSDs) are making huge inroads in high-end enterprise storage. As SSDs drop in price and rise in capacity, their superior performance will continue to gradually push spinning disk out of the data center.

But SSDs will ultimately do more than simply replace disk. Recent announcements from EMC and rumors out of Dell’s new EqualLogic/Compellent brain trust suggest a transformation of the enterprise storage landscape. To grasp what’s afoot, it helps to understand the idea of tiered storage that underlies most storage architectures.

Storage tiering has been around for a while. The idea is to deploy multiple classes of storage hardware that correspond to various points on the performance-versus-capacity spectrum and intelligently allocate data to the most appropriate storage hardware class. For example, you might locate bulk data that will probably see sequential I/O on cheap SATA disk, while transactional database data might live on 15,000-rpm SAS disk.
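To put that in concrete terms, here’s a minimal sketch of the idea (the tier and workload names are hypothetical, not any vendor’s terminology): a tiering policy is essentially a mapping from a class of data to the class of hardware that best fits its I/O profile.

# Minimal sketch of policy-based placement; tier and workload names are invented.
TIER_POLICY = {
    "bulk_sequential": "sata_7200",    # cheap, high-capacity spinning disk
    "transactional_db": "sas_15000",   # faster, pricier spindles
}

def place_volume(workload_class: str) -> str:
    """Return the storage tier a new volume should land on."""
    return TIER_POLICY.get(workload_class, "sata_7200")

print(place_volume("transactional_db"))  # -> sas_15000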

At first, tiering was almost entirely manual. You’d field two or three different types of disk and, largely through trial and error, shuffle your data around among tiers until the right mix of capacity, cost, and performance was achieved. This was not only time-consuming — requiring very close monitoring of transactional storage demands — but also not particularly effective, since it was generally done on a volume-by-volume basis rather than block-by-block.

Several storage vendors — 3Par and EMC are great examples — saw an opportunity to make everyone’s lives easier and introduced automated array-based tiering. This took the onus off the storage administrator and allowed the array to make intelligent decisions about where to locate data based on its usage profile. Thus, individual blocks of a database (perhaps comprising a database index) could be automatically shoveled into fast storage, while less-accessed data stayed on much less expensive, slower storage.
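In rough terms, automated tiering boils down to tracking how hot each block is and periodically promoting the hottest blocks to the fast tier. Here’s a simplified sketch of that loop; it is not any vendor’s actual algorithm, and the capacity figure is made up.

from collections import Counter

# Simplified sketch of automated block-level tiering; real arrays use far
# more sophisticated heuristics than a simple access counter.
access_counts = Counter()      # block_id -> I/O count in the current interval
FAST_TIER_BLOCKS = 1_000       # how many blocks fit in the fast tier (invented)

def record_io(block_id: int) -> None:
    access_counts[block_id] += 1

def rebalance() -> tuple[set, set]:
    """Split blocks into (promote to fast tier, keep on slow tier)."""
    hot = {b for b, _ in access_counts.most_common(FAST_TIER_BLOCKS)}
    cold = set(access_counts) - hot
    access_counts.clear()      # start a fresh measurement interval
    return hot, cold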

As great as automated tiering is, many enterprises couldn’t justify the expense: the licensing costs often attached to it were steep, and the gulf between “cheap” and “expensive” spinning disk isn’t particularly wide (SATA disk performs a little less than half as well as 15,000-rpm SAS).

That all changed when SSDs hit the market. Yes, SSDs come at a price premium, but their massive transactional performance easily justifies the added cost when high transactional demands must be satisfied. Suddenly, enormous performance and cost benefits could be derived by adding SSDs as a top tier above SAS and SATA disk. That’s the main reason automated tiering has become the new must-have feature.

But automated array-based tiering is by no means where the story ends: EMC’s Project Lightning, recently decloaked from stealth mode, locates array-managed flash within servers that are attached to the array.

But wait, isn’t that just caching? Nope! Caching implies that you’re storing a copy of the data that you’ve accessed most recently in locally accessible DRAM (or, increasingly, flash). Writes can be absorbed into that cache, but still eventually need to be flushed to array storage where that data actually lives. Thus, consistently high write workloads or read workloads that exceed the size of the cache are almost inevitably dependent on the latency of the underlying storage.

Moreover, most server-side caching mechanisms aren’t intelligent enough to have a good long-term grasp of what data to keep in cache — instead making short-sighted, second-by-second decisions that you can’t influence externally. Caching is an improvement, but it’s no silver bullet. Likewise, caching has no impact on the amount of array-based storage capacity you actually need to deploy.
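To see the distinction, consider a bare-bones write-back cache (a rough sketch, not any particular product): the cache only ever holds a copy of the data, so dirty blocks eventually have to be flushed to the array, and any read that misses pays the array’s full latency.

# Rough sketch of a write-back cache; eviction here is naive (real caches
# use LRU or similar), and "backing" stands in for the array itself.
class WriteBackCache:
    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing        # where the data actually lives
        self.lines = {}               # block_id -> (data, dirty flag)

    def write(self, block_id, data):
        if block_id not in self.lines and len(self.lines) >= self.capacity:
            self._evict()             # sustained writes end up waiting on the array
        self.lines[block_id] = (data, True)

    def read(self, block_id):
        if block_id in self.lines:
            return self.lines[block_id][0]
        return self.backing.read(block_id)    # cache miss: full array latency

    def _evict(self):
        victim, (data, dirty) = self.lines.popitem()
        if dirty:
            self.backing.write(victim, data)  # the copy must be flushed first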

Project Lightning promises to deliver a new flash-based tier that’s actually located within the server itself — moving the I/O workload off the array and simultaneously delivering reduced latency. Since the data that’s moved to that inboard tier is managed by the same tiering software that manages the array (Fully Automated Storage Tiering, in EMC’s case), that inboard flash suddenly becomes a wholesale extension of the array itself — subject to all of the same SLAs and policies you define.
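Conceptually (and this is my own sketch, not EMC’s code or FAST’s actual policy format), the inboard flash simply becomes the topmost entry in the same tier list, so the hottest blocks migrate all the way into the server rather than merely being copied there:

# Conceptual sketch only: server-side flash treated as the top tier of the
# same policy engine. Tier names and capacities are invented for illustration.
TIERS = [
    ("server_flash", 200),       # PCIe flash inside the server
    ("array_ssd", 2_000),        # SSD tier inside the array
    ("sas_15000", 20_000),       # fast spinning disk
    ("sata_7200", 200_000),      # bulk capacity
]

def assign_tiers(blocks_hottest_first):
    """Walk the tier list top-down, filling each tier with the hottest blocks."""
    placement, start = {}, 0
    for tier, capacity in TIERS:
        for block in blocks_hottest_first[start:start + capacity]:
            placement[block] = tier
        start += capacity
    return placement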

The end result should be decreased storage latency, lower transactional load on the storage array, and lower top-tier array-based capacity requirements. As Chad Sakac — EMC employee and author of the well-known Virtual Geek blog — postulates: “This is likely something everyone is going to need to do (like auto-tiering, like flash-assisted caches, like dedupe, compression, etc…).” I think he’s spot-on.

Whether the product that eventually results from EMC’s Project Lightning ends up being the best (or even the first) to do this remains to be seen. But if flash storage continues to evolve and get cheaper, it should become a commodity, both within the array (as it has) and within individual servers. Today it might be in the form of PCIe-attached flash boards; tomorrow it might be a staple of server mainboard design, just as DIMM memory slots are.

To be sure, there’s a lot of noodling still to be done at EMC and other prospective storage vendors to ensure that data will be sufficiently protected in this decentralized model. But if you’re looking for the next must-have storage feature on the horizon, this is it.

This article, “A new role for the amazing solid-state drive,” originally appeared at InfoWorld.com. Read more of Matt Prigge’s Information Overload blog and follow the latest developments in storage at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.