Combating cloud outages: There’s a simple solution

analysis
Jun 11, 2010 · 3 mins

Cloud services fail when the demand overwhelms them -- but why is that allowed to happen in the first place?

I was amused by Steve Jobs’ Wi-Fi overload issues during his iPhone 4 presentation. While he could ask the audience to “put your laptops on the floor” and turn off their 3G-to-Wi-Fi devices, most cloud providers won’t have the luxury of asking customers not to use their services when their cloud platforms get oversaturated. 

There have been many recent availability issues with cloud providers, such as Twitter’s and Google Calendar’s struggles, as well as Google App Engine’s datastore taking a dirt nap under demand. Or, as Google puts it in a recent post: “There are a lot of different reasons for the problems [with data store] over the last few weeks, but at the root of all of them is ultimately growing pains. Our service has grown 25 percent every two months for the past six months.”
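To put the quoted growth figure in perspective, the compounding can be worked out in a few lines. This is only a back-of-the-envelope sketch: it assumes the 25 percent rate held steady across all three two-month periods, and the variable names are illustrative rather than anything Google published.

```python
# Compound growth implied by the quoted figure: 25 percent every two months,
# sustained for six months (three two-month periods).
# Assumption: the rate was constant in every period.
growth_per_period = 0.25
periods = 6 // 2  # six months divided into two-month periods

# Each period multiplies the load by (1 + growth rate).
multiplier = (1 + growth_per_period) ** periods

print(f"Load after six months: {multiplier:.2f}x the starting load")
```

Under that assumption the service was carrying nearly twice its starting load after six months.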

There are also many cloud outages and availability issues that aren’t reported but have the same negative effects on cloud users. What we hear in the press is the tip of the iceberg.

I think this increase in outages caused by saturation is just the start. I suspect that with the increased use of cloud computing this year and next, clouds falling over under stress will become more commonplace.

The core issue is the saturation of resources by too many users doing too much on the cloud provider’s servers. Putting any architecture and design issues aside for now, it’s as simple as that — it’s also a very old problem.

The solution is just as simple. It’s called “capacity planning”: making sure the capacity of your current system (in this case, virtualized and multitenant server clusters) will meet the demands of the number of users working in the cloud, as well as their patterns of consumption. Back in the day, there were many capacity planners running around, but with the advent of commodity hardware and software, capacity planning (including performance modeling) has become a lost art.

For the most part, there is no excuse for availability issues caused by a lack of capacity. You know where your saturation point is, so you need to make sure you have enough resources on hand never to reach that precipice. That said, I suspect most of the capacity planning that occurs within cloud providers these days amounts to watching the usage graphs move upward and trying to add more equipment before processes run out of room. That is clearly not a successful strategy.
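As a rough illustration of planning ahead of the saturation point instead of reacting to it, the sketch below projects how many growth periods remain before load eats into a reserved safety margin. Everything here is hypothetical: the function name, the `headroom` parameter, and the sample numbers are assumptions for illustration, and the model assumes steady compound growth, which real traffic rarely follows exactly.

```python
def periods_until_saturation(current_load, capacity, growth_rate, headroom=0.2):
    """Count the whole growth periods that can pass before load exceeds
    usable capacity, where usable capacity = capacity * (1 - headroom).

    All parameters are hypothetical planning inputs:
      current_load -- load today, in any unit (requests/sec, users, ...)
      capacity     -- known saturation point, in the same unit
      growth_rate  -- fractional growth per period (0.25 = 25 percent)
      headroom     -- fraction of capacity held back as a safety margin
    """
    if current_load <= 0 or capacity <= 0:
        raise ValueError("current_load and capacity must be positive")
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")

    usable = capacity * (1 - headroom)
    periods = 0
    load = current_load
    # Advance one period at a time while the grown load still fits.
    while load * (1 + growth_rate) <= usable:
        load *= 1 + growth_rate
        periods += 1
    return periods


# Illustrative numbers only: 400 units of load today, saturation at 1,000,
# 25 percent growth per period, 20 percent of capacity kept in reserve.
remaining = periods_until_saturation(400, 1000, 0.25, headroom=0.2)
print(f"Periods before the safety margin is breached: {remaining}")
```

With these sample inputs the result is three periods, so new capacity would need to be ordered, installed, and tested well inside that window. A result of zero means the next period of growth already breaches the margin.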

This article, “Combating cloud outages: There’s a simple solution,” originally appeared at InfoWorld.com. Read more of David Linthicum’s Cloud Computing blog and follow the latest developments in cloud computing at InfoWorld.com.

David Linthicum

David S. Linthicum is an internationally recognized industry expert and thought leader. Dave has authored 13 books on computing, the latest of which is An Insider’s Guide to Cloud Computing. Dave’s industry experience includes tenures as CTO and CEO of several successful software companies, and upper-level management positions in Fortune 100 companies. He keynotes leading technology conferences on cloud computing, SOA, enterprise application integration, and enterprise architecture. Dave writes the Cloud Insider blog for InfoWorld. His views are his own.
