Matt Prigge
Contributing Editor

Downtime is … good?

analysis
Dec 12, 2011 | 4 mins

A healthy dose of planned downtime can save your bacon. Don’t buy into the 24/7, always-on culture unless you absolutely must.

Ask yourself: How do your users react when you announce (or plead for) a downtime window to accomplish an upgrade or to perform maintenance? Not well, I’d imagine.

Years ago, scheduled downtime was a common occurrence in all but the very largest IT shops, but today, few businesses let you get away with a solid downtime window without an act of Congress. Even shops with no obvious 24/7 requirement of the kind you’d find at a three-shift manufacturing plant or a hospital with an emergency room have a hard time denying their user base access to data, even in the wee hours of the night.

The reasons for this are many, but they boil down to a voracious dependence on IT systems for day-to-day business — and massively improved disaster avoidance brought about in large part by the advent of server virtualization. Businesses are addicted to data; technology has improved to the point that we in IT can readily feed that addiction.

This closed loop has an unfortunate, twofold effect: It creates an atmosphere where even the smallest request for planned downtime is often denied or delayed — and users become entirely unprepared for what to do when disaster strikes.

The three joys of downtime

First, downtime can do a lot to help keep your environment solid. If you have to wait weeks or months to apply critical infrastructure patches, you’re simply asking for trouble. Most systems in a modern IT infrastructure can be patched with very little downtime, but keeping others up to date means powering them down and inconveniencing at least a few users.

Take your garden-variety switches and routers. They often sit untouched for years and work perfectly without interruption. In fact, one desktop aggregation switch I touched this past week had an uptime of more than 2,000 days. That’s a huge testament to the manufacturer, but I’ll bet you could drive several small vehicles through the easily exploitable security holes in that device’s firmware.
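To put a number on that kind of neglect, here is a minimal Python sketch that flags devices whose uptime suggests they have not been rebooted, and so probably not patched, in a long while. The inventory, the device names, and the 365-day threshold are all made up for illustration, and uptime is only a rough proxy for firmware age; in practice you would pull the figures from whatever monitoring system you already run.

```python
from dataclasses import dataclass


# Hypothetical inventory record; the field names are illustrative only.
@dataclass
class Device:
    name: str
    uptime_days: int


def overdue_for_maintenance(devices, max_uptime_days=365):
    """Return devices whose uptime exceeds the threshold, longest first."""
    stale = [d for d in devices if d.uptime_days > max_uptime_days]
    return sorted(stale, key=lambda d: d.uptime_days, reverse=True)


def uptime_in_years(days):
    """Convert an uptime in days to years, rounded to one decimal place."""
    return round(days / 365.25, 1)


# Made-up sample data; only the 2,000-day figure comes from the article.
inventory = [
    Device("desktop-agg-sw1", 2000),
    Device("core-rtr1", 120),
    Device("edge-sw7", 800),
]

for d in overdue_for_maintenance(inventory):
    print(f"{d.name}: up {d.uptime_days} days (~{uptime_in_years(d.uptime_days)} years)")
```

Seen this way, a 2,000-day uptime works out to roughly five and a half years without a maintenance window.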

Second, by taking advantage of planned downtime windows, you can exercise your high-availability capabilities and disaster recovery plans. If you rarely test your HA or DR capabilities, there’s a much greater chance they won’t work when you actually need them. As an astute reader commented on a blog post I wrote last year: “Nothing that is used less often than once a day works every time you use it. The less often you use it, the more likely it will fail when you do use it.” In my experience, that couldn’t be truer.

You know how your HA systems are supposed to work, but are you sure they’ll work? Do you have a Fibre Channel SAN with redundant switches? How about redundant core network switches or a database cluster? Would you let me shut off one of them in the middle of the day with no warning to the user base?

If you protest, you’re simply not sure enough. By intentionally knocking over redundant portions of your infrastructure during a planned downtime window, you’ll gain the confidence that your HA systems are going to work the way that they’re supposed to. If they don’t, you’ll find out where to focus your efforts on improving them when you get the time or budget dollars to do it.
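As a rough illustration of what such a drill looks like in script form, the Python sketch below takes each member of a redundant pair down in turn, checks whether the service survived, and always brings the member back. Everything here is simulated and hypothetical: the `stop`, `start`, and `service_ok` callables stand in for whatever your own environment uses to disable a switch or node and to run a health check, and the "hidden misconfiguration" is invented to show the kind of surprise a drill can surface.

```python
def run_failover_drill(members, stop, start, service_ok):
    """Take each redundant member down in turn and record whether service survived."""
    results = {}
    for member in members:
        stop(member)
        try:
            results[member] = service_ok()
        finally:
            start(member)  # always restore the member, even if the check raises
    return results


# --- Simulated environment (hypothetical names, no real hardware involved) ---
up = {"san-switch-a": True, "san-switch-b": True}

# Simulated hidden misconfiguration: hosts only have a working path via switch A,
# so the "redundant" switch B cannot actually carry the load on its own.
working_paths = {"san-switch-a"}


def stop(member):
    up[member] = False


def start(member):
    up[member] = True


def service_ok():
    # Service is healthy if at least one switch with a working path is up.
    return any(up[m] for m in working_paths)


results = run_failover_drill(list(up), stop, start, service_ok)
for member, survived in results.items():
    print(f"{member} down: service {'survived' if survived else 'FAILED'}")
```

In this simulated run, losing switch B is harmless but losing switch A takes the service down, which is exactly the sort of thing you would rather learn inside a planned window than during a real outage.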

Lastly and maybe most important, planned downtime gives users a controlled taste of what to expect in the event a real disaster takes place. In the handful of truly nasty infrastructure-down outages I’ve seen firsthand, user confusion and paralysis were the worst outcomes. Yes, you’d fully expect downtime of critical business systems to impact productivity, but you’d be shocked by how many of the worst impacts could’ve been avoided by using fantastically simple measures. You may simply never discover what those measures are without occasionally shutting off everything and finding out.

Putting it all together

While inconveniencing your user base needlessly is a great way to find yourself looking for a new job, real, tangible business benefits result from taking down portions of your infrastructure when there’s good reason. The cold, hard reality is that businesses that staunchly resist requests for planned downtime will eventually find themselves experiencing far costlier unplanned outages with untested recovery mechanisms and a user base that’s completely unprepared for life without data. As unpleasant as it may be, do your best to make that argument the next time you get pushback on your downtime request. It may not be fun, but it’s far better than the alternative.

This article, “Downtime is … good?,” originally appeared at InfoWorld.com.