by Brian Fonseca

A matter of minutes

Mar 29, 2002

In rapid-response disaster recovery plans, experts say intangibles and wireless are keys to mitigating damage

EMERGENCIES IN IT often arise outside the realm of technology and are unpredictable, so developing a rapid-response disaster recovery plan for every contingency is impossible.

Nevertheless, rapid-response disaster recovery plans are joining security recovery and business continuity planning as staples in a CTO’s repertoire against potential threats to data and operations. The goal of an IT rapid response plan is to provide a framework in which the CTO can quickly react, respond, and steer a predetermined course of action to minimize losses when an event occurs.

The key to that success, says Alan Lloyd Paris, a partner at Capco, a financial IT consultancy based in New York, is to consider intangible elements rather than a strict set of rules to act upon at a moment’s notice.

“The idea is to plan around a particular set of outcomes as opposed to planning for any particular emergency,” Paris says. “You can’t plan for everything, so you have to develop a plan that’s flexible and that takes a look at a tiered set of problems.”

For instance, Paris says, rather than basing recovery plans on specific external threats, such as a bomb or a chemical or biological attack, companies should use a simple three-tiered approach: plan what to do when building access is denied, what to do when a floor needed to transact business is closed, and how to recover from a particular system outage.
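Paris’s point, that a plan should key on outcomes rather than causes, can be illustrated with a small lookup table. The Python below is a minimal sketch, not Capco’s or any firm’s actual method: the three tiers follow the outcomes Paris lists, but the response steps and the `respond` function are hypothetical placeholders. Whatever caused the event, the plan is driven only by which outcomes occurred.

```python
from enum import Enum


class Outcome(Enum):
    """The three outcome tiers Paris describes, broadest disruption first."""
    BUILDING_ACCESS_DENIED = "building access denied"
    FLOOR_CLOSED = "floor closed"
    SYSTEM_OUTAGE = "system outage"


# Hypothetical response steps; a real plan would fill these in per organization.
PLAYBOOK = {
    Outcome.BUILDING_ACCESS_DENIED: [
        "activate alternate site",
        "notify employees via central site",
    ],
    Outcome.FLOOR_CLOSED: ["relocate affected desks to another floor"],
    Outcome.SYSTEM_OUTAGE: ["fail over affected system to backup datacenter"],
}


def respond(outcomes):
    """Return the combined action list for the outcomes an event produced.

    The cause of the event (fire, bomb, power failure) is deliberately
    not an input: only its consequences select the response.
    """
    actions = []
    # Enum members iterate in definition order, so broader tiers come first.
    for tier in Outcome:
        if tier in outcomes:
            actions.extend(PLAYBOOK[tier])
    return actions


# A fire that closes one floor and knocks out a server room:
print(respond({Outcome.FLOOR_CLOSED, Outcome.SYSTEM_OUTAGE}))
```

The same two entries would be returned if a burst pipe, rather than a fire, closed the floor and took down the servers, which is the flexibility Paris describes.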

The importance of rapid response tactics was pushed to the forefront on Sept. 11. The sheer scale of destruction caused by the terrorist attacks on the World Trade Center and the Pentagon forever altered perceptions of the complex web of variables that a major disruption can affect.

Tom Moogan, director of Global General Services at Citigroup’s corporate and investment bank in New York City, says that before Sept. 11 his company assumed the loss of a single power grid was the only realistic disaster to plan for. Times have changed.

Citigroup lost 472 file servers and 4,300 workstations when 7 WTC was destroyed, and it had to evacuate 16,500 employees in the aftermath of the attacks, Moogan says. The financial firm also lost 1.3 million square feet of property.

During the disaster, Moogan says, 800 of the 2,550 disaster recovery plans he manages were implemented, and Citigroup’s foreign exchange desk was transferred to London.

Although all Citigroup employees were up and running the next day, Moogan says implementing the rapid response measures provided valuable insight for future disaster planning. “The issues we faced were significant problems with counterparties. Some of the disaster recovery plans of the firms that we trade with and close with were not as robust as we had assumed,” Moogan says. “Therefore we had significant issues on our balance sheet we had to resolve.”

Damage to Verizon’s building in lower Manhattan hampered New York City’s voice infrastructure. Citigroup’s decision to invest in BlackBerry wireless devices proved critical: the BlackBerry became the company’s primary mode of communication on the day of the attack.

Wireless, local, and prepared

Analysts say more recovery efforts were managed on BlackBerries in the days following the terrorist attacks than on any other computing device. This shift points to a close link between effective rapid response and the availability of wireless devices and wireless LANs.

“What firms really need to do is have a central site for employees, whether a Web site or a call-in line. … A lot of these firms found that they couldn’t reach people,” Capco’s Paris says.

Establishing personal familiarity and solid relationships with local authorities and rescue organizations could also pay dividends should immediate assistance ever be required.

“I think critical [rapid] response worked where firms had already made contact with the mayor’s office and the police and fire departments. Sept. 12 was not the time to make your first contacts [with them],” Paris adds.

To streamline its response to an unforeseen event, Citigroup also performs crisis management planning for less common disaster recovery areas such as application development, transportation, and security protocol settings, Moogan says.

A CTO also plays a pivotal role in setting the standard for rapid response, modeling how to handle high-pressure situations while finding the safest and quickest way to protect technology assets.

“People should take the track of learning from [those] who responded well during this crisis regardless of the level of position in the organization. [If it happens again,] you’ll need to rely on these people,” Moogan says. “People who stay have certain things they have to get done — see who can respond well in a crisis and be innovative. A lot of people fall back on routine during a crisis when routine is not what is required.”

Rapid product ramp-up

For some businesses, rapid response may mean ramping up products or applications weeks or months ahead of schedule.

Graham Albutt, president of the business technology group at Reuters Globally in New York, says employees in Geneva worked through the night on Sept. 11 to produce a new product, Reuters Market Monitor. The Web-based trading floor tool, which Reuters provided free to financial and trading institutions at the time, gave customers real-time quotes, news, and, more important, a baseline communications application during the Sept. 11 power blackout.

Reuters also moved an unspecified number of products out of the prototype phase and delivered them to customers in the days following the disaster to improve access to data, Albutt adds.

Core recovery

For some IT managers, disaster recovery is part and parcel of the organization’s core mission, and enterprise CTOs can learn from their recovery plans.

Doug Bolton, an information systems analyst at the San Diego Fire Department, says his team must be ready for rapid response execution. All emergency 911 calls are rerouted to his organization in the event of a disaster.

For example, the department’s primary Stratus FT server, which by design uses only 30 percent of its capacity, is connected to two separate power grids, one at its central headquarters and another at its backup datacenter about 20 minutes away.

If the building is still standing after a disaster, it will serve as the central headquarters for disaster recovery and crisis planning by the city’s fire, police, and civil service departments. An Urban Search and Rescue Team (USART) is also equipped with a radio and paging system designed to keep the department’s systems running and to transmit information online.

Bolton says laptops used by the fire department are equipped with applications written to handle typical emergency calls as well as disaster recovery measures if necessary. Also, wireless connectivity and communication devices that generate their own power are crucial at a disaster site, he says.

“We’re responsible to the citizens of San Diego. We cannot predict when they are not going to be feeling well or having a traffic accident,” Bolton says. “It’s not acceptable to give excuses. If it means we have to grab some equipment and we’ll run to get to this call — it is something we need to do.”