matt_prigge
Contributing Editor

What to ask when choosing a new storage platform

analysis
Apr 2, 2012 | 8 mins

If you want to know what to buy, talk to people who've adopted the technology you're eyeing. Here are key questions to ask

No one likes to rip and replace. But sooner or later, you need to buy new gear to upgrade or augment your existing storage infrastructure, and each time you have to go to school on your options as if you’d never seen a SAN before. That’s how fast the technology changes.

Those who buy new storage products based solely on feature lists, quick tabletop demos, and cost comparisons take a huge gamble. Though it’s impossible to anticipate every single gotcha, the trick to avoiding a disastrous choice is in knowing the right questions to ask — and where to get the answers.

Ultimately, the most valuable sources of information are customers who’ve already implemented the technology in question. Here are some questions you can ask to quickly get to the heart of what you need to know.

It’s all about the software

The basic building blocks that form the storage hardware you’re likely to consider buying today have largely reached commodity status. There’s very little variation in the actual disk or disk interface hardware used across the market — one vendor’s 6Gbps 15K SAS drive is going to be very similar if not identical to another’s. It’s also increasingly common to see controller hardware that’s based around the same general-purpose Intel and AMD processors you’ll find in servers rather than the highly customized processors common in previous storage generations.

The wide availability of fast disks and high-performance server hardware has made it possible for even the greenest storage startup to slap together a Linux- or BSD-based storage array that will compete well with the big names in terms of raw disk performance. In other words, just about anyone can combine a bunch of off-the-shelf SLC SSDs, 15K SAS disks, and a multicore Intel mainboard, then field a storage array with serious performance potential.

The real differentiation between storage options available in the marketplace today is almost entirely derived from the software running on those storage processors. That’s where you see the real game changers implemented, such as snapshots, thin provisioning, deduplication, and automated tiering. The quality and reliability of those software features are what separate the wheat from the chaff.

Unfortunately, it’s nearly impossible to get an accurate impression of that software without actually using it for a long time. While it’s true you can learn a lot from a tabletop demo, you’ll mostly come away with impressions of the user interface’s polish and ease of use — not of what really happens when you tick the boxes and start making changes. These are usually the details you’ll want to learn the most about as you chat with references.

Snapshots

When storage array-based snapshot tech started to show up on the scene, there was a massive amount of variation between vendors’ implementations. Today, most storage platforms use a virtualized approach to mapping a presented storage LUN to the underlying physical disk where that LUN is actually stored — a methodology that lends itself to easy creation of point-in-time snapshots and cloning/rollback of those snapshots. Despite the fact that snapshot tech is sort of old hat these days, you’ll still find striking variations among vendor implementations.

These variations are largely based on a trade-off between the capacity efficiency of the snapshots and the amount of processing overhead required to track them. For example, a given array may be able to create and track an enormous number of snapshots with almost no perceptible impact on storage performance, but the block-level disk changes that comprise those snapshots may be tracked in very large blocks and, as a result, consume a very large amount of storage space.
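This trade-off is easy to see in a toy copy-on-write model (hypothetical Python, not any vendor’s actual implementation): the coarser the tracking granularity, the more space a single small write consumes once a snapshot exists.

```python
# Toy copy-on-write snapshot model (illustrative only; real arrays differ).
# Tracking granularity is the trade-off: large blocks mean fewer entries to
# track (less overhead), but more space consumed per small random write.

class Volume:
    def __init__(self, block_size):
        self.block_size = block_size   # snapshot tracking granularity
        self.blocks = {}               # live data: block index -> bytes
        self.snapshots = []            # each snapshot: {block index -> old bytes}

    def snapshot(self):
        self.snapshots.append({})      # new, empty change map

    def write(self, offset, data):
        idx = offset // self.block_size
        if self.snapshots and idx not in self.snapshots[-1]:
            # Copy-on-write: preserve the old block for the latest snapshot.
            old = self.blocks.get(idx, b"\0" * self.block_size)
            self.snapshots[-1][idx] = old
        self.blocks[idx] = data

    def snapshot_bytes(self):
        # Space consumed by snapshots = preserved blocks * tracking block size.
        return sum(len(s) * self.block_size for s in self.snapshots)

# The same 4KB random write costs one full tracking block on each array:
coarse = Volume(block_size=1024 * 1024)   # 1MB tracking blocks
fine = Volume(block_size=4096)            # 4KB tracking blocks
for vol in (coarse, fine):
    vol.snapshot()
    vol.write(0, b"x" * 4096)             # one small write after the snapshot

print(coarse.snapshot_bytes())  # 1048576 -- a full 1MB preserved for a 4KB write
print(fine.snapshot_bytes())    # 4096 -- efficient, but far more entries at scale
```

A database or mail server generating millions of small random writes sits at the worst end of the coarse-grained case, which is why the workload-specific questions below matter.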

Despite their near ubiquity in enterprise storage platforms, it’s important to dig into the nitty-gritty of how snapshots work on the platform you’re considering. By all means, use them — they can be an enormous benefit when leveraged properly. Does utilizing snapshots have an impact on storage performance? How efficient are the snapshots when applied to slowly changing data? How about for workloads with large amounts of very small, random writes such as databases and mail servers?

Thin provisioning

Thin provisioning is another feature made possible in large part by storage virtualization. Instead of committing the entire capacity of a volume, the storage hardware can track which blocks of a volume haven’t been used yet and keep them free for other volumes — essentially allowing you to overcommit your storage resources.

The holy grail of thin provisioning is to have the amount of consumed storage on your array exactly match the amount of actual data you are storing on your servers; thus, a 40GB NTFS volume with 12GB of data in it consumes only 12GB on your SAN. Though that seems straightforward, in practice, it’s anything but. The tricky bit comes when you create a file within that volume, then delete it. Is the storage array aware that’s happened? Can it free that space afterward? Do you need to run a manual process to let that happen? If so, how easy is it to stay on top of?
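A toy allocation model (hypothetical Python, not a real array’s logic) makes the overcommit and reclamation questions concrete: space is consumed only on first write, and it comes back only if the array is told the blocks are free.

```python
# Toy thin-provisioned pool (illustrative only). It assumes the host sends
# an unmap/TRIM-style hint when files are deleted -- not every OS/array
# combination does, which is exactly the question to ask a vendor.

class ThinPool:
    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self.allocated = {}                # (volume, block) -> allocated

    def write(self, volume, block):
        key = (volume, block)
        if key not in self.allocated:
            if len(self.allocated) >= self.physical_blocks:
                # Overcommit catching up with you: volumes were presented
                # with more capacity than the pool physically has.
                raise RuntimeError("pool exhausted")
            self.allocated[key] = True     # allocate on first write only

    def unmap(self, volume, block):
        # Space is reclaimed only when the host reports the block as free;
        # without this hint, deleted data still consumes pool capacity.
        self.allocated.pop((volume, block), None)

    def used(self):
        return len(self.allocated)

pool = ThinPool(physical_blocks=100)
for b in range(30):
    pool.write("vol1", b)   # vol1 may be presented as far larger than 30 blocks
print(pool.used())          # 30 -- only written blocks consume pool space
pool.unmap("vol1", 10)      # host signals a deletion
print(pool.used())          # 29 -- space returns to the pool
```

The `unmap` path is the part that varies most in practice: some stacks reclaim space automatically, others need a scheduled zero-fill or reclamation job.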

Automated tiering

As organizations large and small deal with data that’s growing at enormous rates, the ability to effectively leverage the enormous capacities present in low-performance SATA and NL-SAS disks has become very important. The trouble is that while you may be able to buy a shelf of 3TB disks and present an enormous amount of capacity very cheaply, the relatively low performance of those disks makes them a bad choice for many transactional workloads such as databases. While you can mix in higher-performance SAS/FC disks or even SSDs for more demanding workloads, those resources come at a much higher cost. Getting the most out of both high-capacity, low-performance and low-capacity, high-performance resources without requiring a lot of management oversight is increasingly important.

That’s where automated tiering comes in. In a perfect world, the storage platform software senses which data, down to the block level, is being used regularly and keeps it on faster disks. Meanwhile, data that’s used less frequently is migrated to slower disk. Ideally, all of this takes place without requiring the storage admin to do anything other than monitor the capacities and load averages on the various tiers and add resources to each as they’re required.

As you can imagine, the quality of the tiering algorithm determines whether this kind of feature is a godsend or a curse. Getting a good feel for how existing customers are using those features and how well they work is critical if you plan to rely on them heavily. How capacity-efficient is the tiering? Can I treat a couple of 200GB SSDs as an extension of my storage array’s cache, or will I need to deploy a lot of SSDs to have an effective first storage tier? How quickly does the tiering algorithm react to changing workload characteristics? Is data migrated on a second-by-second basis, or are longer load averages used to determine where data lives?
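The reaction-speed question comes down to how heavily the algorithm weights recent activity. A toy tiering pass (hypothetical Python; the `alpha` smoothing knob and block names are invented for illustration) shows the knob involved:

```python
# Toy automated-tiering pass (illustrative only). Real arrays differ mainly
# in the observation window: heavy weighting of recent activity reacts fast
# but can thrash data between tiers; long load averages are stable but slow
# to promote newly hot data.

from collections import defaultdict

class TieringEngine:
    def __init__(self, ssd_capacity, alpha=0.5):
        self.ssd_capacity = ssd_capacity   # blocks that fit on the fast tier
        self.alpha = alpha                 # high = reactive, low = stable
        self.heat = defaultdict(float)     # long-run moving average per block
        self.window = defaultdict(int)     # accesses in the current window

    def record_access(self, block):
        self.window[block] += 1

    def rebalance(self):
        # Blend this window's counts into each block's long-run heat score,
        # then place the hottest blocks on the fast tier.
        for block in set(self.heat) | set(self.window):
            self.heat[block] = (self.alpha * self.window[block]
                                + (1 - self.alpha) * self.heat[block])
        self.window.clear()
        ranked = sorted(self.heat, key=self.heat.get, reverse=True)
        return set(ranked[:self.ssd_capacity])   # blocks promoted to SSD

engine = TieringEngine(ssd_capacity=2)
for block, hits in [("db-index", 50), ("db-log", 30), ("archive", 1)]:
    for _ in range(hits):
        engine.record_access(block)
print(engine.rebalance())   # the two hottest blocks land on the fast tier
```

With `alpha` near 1 the engine behaves like second-by-second migration; near 0 it behaves like the long-load-average designs, which is the distinction the questions above are probing.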

Finding references

After you know what questions you want to ask, the next step is finding the people you want to put them to. Most storage vendors will be happy to arrange a reference call with an existing customer, but you’ll often find that these references are biased to some extent. Conversely, throwing questions out onto an online forum will usually serve to expose the deep-seated, almost religious convictions some people have about the storage vendor you might be considering — and you have no way of knowing if those folks have used the hardware you’re looking at.

Instead, direct word of mouth is usually the best way to get useful, unbiased information. If you know someone personally, that’s great. If you don’t, consider using the opportunities presented by storage conventions to track down existing users and hear what they have to say. I’ve often found that, despite not being storage-specific events, VMworld and the various VMUG meetings can be excellent places to do this, since people who run server virtualization almost always have a SAN sitting right behind it.

No matter what you do or where you get your information, don’t settle for a quick demo and a stack of marketing glossies when you make buying decisions about primary storage. If you find people who use the hardware and learn what they liked and what left them disappointed, you’ll be much more confident in your decision and happier with the results.

This article, “What to ask when choosing a new storage platform,” originally appeared at InfoWorld.com. Read more of Matt Prigge’s Information Overload blog and follow the latest developments in storage at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.