Matt Prigge
Contributing Editor

The high cost of lazy storage practices

Analysis
Mar 29, 2010 | 5 mins

In the enterprise data explosion, we are our own worst enemy

In my last post, I talked about our increasing dependence on huge, rapidly growing piles of data that must be backed up and duplicated in turn — compounding the data explosion problem. An astute reader posed a question I think all of us have asked in one way or another, but rarely tried to answer: Why is our data growing so rapidly?

It would be easy to respond to that question by trotting out the usual suspects. For example, high-definition technologies such as medical imaging, document imaging, and video certainly eat up lots of space. And who can deny that compliance regulations have forced us to retain more and more documents and messages?

Yet these easy explanations miss an obvious part of the equation: In large part, our data is growing so quickly because it is far, far easier to create data than it is to get rid of it. It takes work to police our data after it is created, and to be blunt, we’ve become too lazy or busy to deal with it.

I will be the first to admit that I am an excellent example of the problem. I am an email pack rat. At the day job, my mailbox is larger than 2GB. My personal account is easily two or three times that size. If everyone were like me — thankfully, most are not — we’d be in serious trouble.

My excuse for this deplorable behavior is that I never know what I’m going to need again. I often find myself in the position of needing to remember what I did three or four years ago — say, digging out a license key that a client has misplaced. If I took a hatchet to my email and ditched everything older than six months, I guarantee that in two weeks or less I’d be without something that I needed. The only realistic solution for me is to go back through my email, reread all of it, and delete the stuff I know won’t have any future significance.

Technology can’t do that for me. All of the archiving, deduplication, and compression in the world might shrink the data and make it easier to search, but it won’t magically get rid of it for me. I have to do that. And you know what? I’m not going to. Because it would take me a massive amount of time — that I have too little of — to do accurately. The few hundred dollars’ total cost of ownership of having that data sitting on a server that’s attached to a SAN somewhere is simply not worth the time it would take me to free it up.

And there’s the rub: The bigger that data gets, the more effort required to put the genie back in the bottle. Worse still, the longer we wait to deal with the problem, the worse it becomes. Eventually, killing the genie and throwing away the bottle will seem like a much more attractive option.

Look across enterprise networks today and the same problem plays itself out over and over. In some cases, the problem resides with people like me who aren’t organized enough to retain only the information they need. But even if all employees were exceptionally fastidious about managing their own data, I can almost guarantee you that the systems they use on a daily basis were built by people who weren’t quite so buttoned down.

Based on my own experience working with hundreds of different networks, I can tell you that the back-end data management problem is in many ways worse than the end-user management problem. This isn’t because IT people are lazier or less capable data managers. It’s because they are seldom in a position to make life-and-death decisions about the data they are charged with managing.

Strictly speaking, the data IT manages isn’t owned by IT. Sure, that copy of the Windows installation CD that’s sitting on the system disk of every server you own — that’s your responsibility and you should take care of that. But copies of production databases that were made by a vendor during data conversions two or three years ago? As a system administrator, can you be absolutely sure you don’t need that anymore? And if you can’t, who can? Is there really anyone who is in a position to pass judgment on that years-old backup, who actually knows what it means to keep it or let it go?

This diffusion of responsibility puts IT in an impossible position. When data belongs to a user, he or she can make snap decisions about what stays or goes and accept the consequences. Shared ownership across a group of people — or, worse, unclear ownership — compounds the issue immensely. A whole bunch of people need to get together and decide which data should end up in the dustbin of history. In most cases, that’s just not going to happen.

So why is our data growing so quickly? Because we’re our own worst enemy. We do not manage our own data well. In general, we don’t have systems in place to define retention policies for data at the time it’s created, nor do we have clear ownership guidelines for that data. Eventually, that will have to change. But as with all things that take time, money, and a will to fix — whether it’s the data explosion or global warming — nothing will happen until our backs are against the wall.

This story, “The high cost of lazy storage practices,” was originally published at InfoWorld.com. Read more of Matt Prigge’s Information Overload blog and follow the latest developments in storage at InfoWorld.com.