'Thin provisioning' became a big storage buzzword a few years back, but how many of us actively use it now? Not all that long ago, thin provisioning was a feature that storage vendors were tripping over themselves to offer in their products. Since then, it has become a de facto standard that you'll find in just about any virtualized storage array worth its salt. Unfortunately, the huge potential of thin provisioning is often overshadowed by the complications of using it in production. That's a shame.

At its simplest, thin provisioning grants more storage to a given consumer than you actually reserve or allocate. This can be done on a SAN, but you'll also see it in various virtualization hypervisors, and you can potentially do both at the same time. Let's say you have a virtualized file server that you've configured with a thin-provisioned 500GB virtual disk. That 500GB virtual disk in turn sits on a 1TB SAN volume that has also been thin provisioned on the storage array. In theory, you'll consume only as much space on the SAN as has actually been written into that 500GB virtual disk. If there's 250GB of data on the file server, then only that much needs to be consumed on the SAN, which is significantly better than having the entire 1TB tied up.

It's easy to see how thin provisioning can save you a bundle just by increasing utilization of expensive SAN storage. Instead of leaving the remaining 750GB of capacity stranded and unused, other consumers can make use of it, or you could avoid purchasing it in the first place. Of course, the obvious danger here is that your storage needs may increase unexpectedly and you might not have the "real" storage space available on the SAN to satisfy the need. You can almost think of thin provisioning as charging storage to your credit card: If you don't have the cash when the bill comes due, you'll be in a real bind.
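The allocate-on-write behavior described above can be sketched as a toy model (class and names are hypothetical illustrations; real arrays track allocation in fixed-size extents, not gigabyte counters):

```python
# Toy model of a thin-provisioned volume: capacity is promised up front,
# but physical space is consumed only as data is actually written.
class ThinVolume:
    def __init__(self, provisioned_gb):
        self.provisioned_gb = provisioned_gb   # what the consumer sees
        self.allocated_gb = 0                  # what the array really uses

    def write(self, gb):
        # Real space is claimed only at write time ("allocate on write").
        if self.allocated_gb + gb > self.provisioned_gb:
            raise IOError("volume full")
        self.allocated_gb += gb

# The 500GB virtual disk from the example, holding 250GB of data:
disk = ThinVolume(provisioned_gb=500)
disk.write(250)
print(disk.allocated_gb)                         # 250
print(disk.provisioned_gb - disk.allocated_gb)   # 250GB promised but unbacked
```

The gap between `provisioned_gb` and `allocated_gb` is exactly the capacity the array can lend to other consumers, and exactly the "credit card debt" that comes due if every consumer writes at once.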
If you opt to use thin provisioning in this way, you'll need to be vigilant about monitoring available capacity and about making more available ahead of when it's needed. In the real world, it turns out that most people don't take advantage of thin provisioning for just this reason: It requires time to manage, and the consequences of failing to do so properly can be dire. When space for additional writes runs out, most SANs will unceremoniously shift a thin-provisioned volume to read-only mode or, in some cases, take it offline entirely. To say that this scenario would make for a bad Monday morning is a vast understatement. Your only options would be to scramble to add more SAN capacity or to delete data to make space, neither of which is very palatable.

Even if you've set aside enough storage on the SAN to create a healthy buffer for growth, other details may ruin the effectiveness of thin provisioning over the long haul. Because SAN storage is block-level storage, for example, the SAN has no idea what data it is actually holding. All it knows is that 250GB of raw data has been written to a 1TB thin-provisioned volume it happens to be managing. If you copy 5GB of data onto that server's file system, the SAN allocates that space and stores the data as it is written, moving the consumed-storage bar up to 255GB. No surprise there.

But if you then get on the file server and delete that 5GB of data, the SAN has no idea this has happened and keeps the space allocated. That's because the file server hasn't actually zeroed the data out on disk; it has merely marked those blocks as available to be overwritten in its file table. Worse, many file systems won't preferentially overwrite deleted data blocks, instead opting to use "fresh" space on the disk.
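This divergence between what the file system reports and what the SAN has allocated can be put in a few lines (a toy model with the article's numbers, not any particular vendor's behavior):

```python
# Toy model: without any way to signal deletions to the array, the SAN's
# allocation is a high-water mark. The file system frees blocks only in
# its own tables; the block layer never hears about it.
fs_used_gb = 250        # what the file server reports as used
san_allocated_gb = 250  # what the array has physically allocated

# Copy 5GB onto the server: both counters move together.
fs_used_gb += 5
san_allocated_gb += 5

# Delete that 5GB: only the file system's counter comes back down.
fs_used_gb -= 5
# san_allocated_gb stays at 255 -- the delete is invisible at the block layer.

print(fs_used_gb, san_allocated_gb)  # 250 255
```

Every write/delete cycle ratchets `san_allocated_gb` upward while `fs_used_gb` oscillates, which is why churn-heavy volumes "inflate" to their full provisioned size.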
If our file server sees a lot of data turnover with many writes and subsequent deletes (say, an archiving system is in use), its thin-provisioned disk will quickly grow to the full 500GB even though the file system may still appear to have 250GB of free space. Even if you use a tool such as Microsoft's Sysinternals SDelete to overwrite all unallocated space with zeros, many SANs aren't smart enough to recognize that they can free those blocks and will instead allocate the entire disk.

To compound the situation further, let's say that our virtualized file server is being backed up by virtual machine-aware software (such as vRanger, Veeam, or esXpress). All of these tools use virtual machine snapshots to isolate the virtual machine's disk so that a consistent copy of it can be made. When you create a virtual machine snapshot, you're essentially telling the hypervisor to shunt subsequent disk writes into a separate snapshot file on the SAN volume. When you delete the snapshot, the hypervisor copies that snapshot data back into the main disk and deletes the snapshot file. If backup jobs take a significant amount of time to run, there may be enough turnover on the volumes during the backup window that the snapshot file grows substantially before the job completes and the snapshot is deleted. From the SAN's perspective, all the space required to store that snapshot must also be allocated, and it generally won't be freed afterward.

Over time, our thin-provisioned 500GB volume with only 250GB of data in it could actually have caused well more than 500GB to be allocated on the SAN. That isn't much better than skipping thin provisioning completely and fully allocating everything from the get-go. If it's going to require that much more time to manage properly and doesn't really save you much, what's the point? Exactly: Today, there generally isn't much point in using thin provisioning on production servers in most SAN environments.
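The snapshot effect can be sketched the same way (a toy model with hypothetical churn figures; real snapshot growth depends on how much data changes and overlaps during each backup window):

```python
# Toy model: backup snapshots shunt in-flight writes into a delta file on
# the same thin-provisioned SAN volume, ratcheting allocation up each run.
san_allocated_gb = 250   # the base disk's data, already allocated

def backup_with_snapshot(allocated_gb, churn_gb):
    # While the backup runs, churn_gb of writes land in the snapshot file,
    # which the SAN must allocate alongside the base disk.
    allocated_gb += churn_gb
    # Deleting the snapshot merges the writes back and removes the file,
    # but the SAN typically keeps those blocks allocated afterward.
    return allocated_gb

for night in range(5):   # five nightly backups, 60GB of churn apiece
    san_allocated_gb = backup_with_snapshot(san_allocated_gb, churn_gb=60)

print(san_allocated_gb)  # 550 -- past the disk's 500GB "thin" size
```

After a handful of backup cycles, the 500GB "thin" disk has driven allocation past its own nominal size, which is the over-allocation scenario the paragraph above describes.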
Given the huge potential to save big bucks on storage hardware if it all worked properly, this is a terrible waste. Fortunately, there's some light at the end of the tunnel. Some storage vendors, HP 3PAR for example, have gotten wise to thin provisioning's problems and invested in making it work the way it's supposed to. Through a combination of hardware-based zero detection, quick VM-level scripting, and integration with VMware's vSphere vStorage APIs, thin-provisioned volumes can stay that way throughout their lifecycle and actually deliver on the promise of reduced storage requirements. I expect other vendors to add this capability.

No matter what kind of environment you run, if you're in the market for a SAN, don't overlook what effective thin provisioning can do for you and your budget. Ask your storage vendors (and their references) how well their thin provisioning works in real life. If you find a good fit with a product whose thin provisioning is effective, you may be able to slash your storage spend to a fraction of what it would be otherwise.

This article, "The untapped potential of thin provisioning," originally appeared at InfoWorld.com. Read more of Matt Prigge's Information Overload blog at InfoWorld.com.