by Stephanie Sanborn

Spreading out the safety net

feature
Mar 29, 2002 | 8 mins

The concept of distribution is coming into play when planning for continuity

DURING THE PAST few years, companies have been moving toward the centralization of technical and operational resources to cut costs and improve efficiency. But now, spurred by the events of Sept. 11, 2001, and the ensuing interest in ensuring business continuity by weaving a wider safety net, the pendulum is swinging back toward the distributed model.

“The change in mentality has been a switch from efficiency at all costs to efficiency, that’s great, but resilience, too,” says Simon Mingay, vice president and research director at Stamford, Conn.-based Gartner.

Business continuity can be strengthened through distribution — moving information, business applications, and computing power away from the central office so business can get back up and running relatively quickly, without the need to rebuild critical systems from scratch.

“The distributed architecture can support both recovery and continuity. It can be engineered either way,” says Gary Hilbert, vice president of high availability and security at AT&T. “We’re trying to avoid a situation where all the critical assets are housed in one unique location.”

Despite all the talk, action on business continuity has been relatively slow, due to the down economy and attempts to downplay or deny risk. Many executives are trying to convince themselves they can avoid incurring expenses by relying on existing continuity plans (often devised for Y2K and out of step with current needs).

Analysts stress that events on a much smaller scale than Sept. 11 are more likely to trigger the need for a business continuity plan: hardware failure, a broken sprinkler system flooding a datacenter, network failure, power outages, a road repair crew cutting through phone and data lines, human error … “those little things … not the hellfire and damnation scenario,” Mingay explains.

“[A business continuity plan] isn’t worth it until the disaster occurs, like any other form of insurance,” says Terry Rice, project manager and principal security engineer at CACI International in Arlington, Va. CACI works with companies and federal agencies on information assurance plans. “It really does come down to a risk management issue: Are you willing to put forth the amount of money to protect against this risk?”

Choosing to distribute resources to offsite locations also helps ensure that employees can access the information and applications they need during a disaster or business interruption, even from locations away from the main office.

Having a backup of systems and data does no good if it’s not accessible, Hilbert notes. “When you talk about mitigating risk, you’re talking about dispersing assets,” he explains. “Now, the whole strategy-think of business continuity is network-centric: you need a network to connect these dispersed assets so you have availability to them if one part of your business should suffer an impairment.”

Large business continuity players, including IBM and SunGard, recognize the network’s growing importance and are putting more emphasis on high availability, fault tolerance, and distribution of data, applications, and communication technology. AT&T, for example, offers managed risk services to identify vulnerabilities, recovery services backed up by portable trailers full of equipment, and “ultravailable” services using DWDM (Dense Wavelength Division Multiplexing) to monitor and manage connectivity.

Common forms of distribution for business continuity include offsite storage and backup of data and distributed datacenters with available processing power, or resources situated at other company locations. However, companies must be diligent about testing their backups to be sure data is actually being recorded, in addition to addressing accessibility issues, Mingay says.
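As a hedged sketch of that testing discipline (the directory paths and checksum approach below are illustrative assumptions, not anything Gartner or the vendors mentioned prescribe), a recurring job can compare the primary data against the offsite copy and flag anything that is missing or does not match:

```python
import hashlib
from pathlib import Path

def checksum(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(primary_dir: str, backup_dir: str) -> list[str]:
    """List files whose backup copy is missing or does not match the primary."""
    problems = []
    for src in Path(primary_dir).rglob("*"):
        if not src.is_file():
            continue
        dst = Path(backup_dir) / src.relative_to(primary_dir)
        if not dst.exists() or checksum(src) != checksum(dst):
            problems.append(str(src))
    return problems

# Hypothetical paths; a scheduled job would alert on any mismatches it finds.
if __name__ == "__main__":
    print(verify_backup("/data/production", "/mnt/offsite_backup"))
```

A check like this only proves the offsite copy matches the original; recoverability itself still has to be rehearsed by actually restoring and running against the backup.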

Going for the grid

Hovering outside the mainstream are distributed computing technologies such as grid computing and peer-to-peer networks (see “Next-gen distributed computing”). These technologies are somewhat detached from enterprise systems and can function independently, to a certain extent. As a result, they are inherently more fault-tolerant and able to spread work over other network nodes if one point goes down.

“If there is a disaster, then the grid computing is a dynamic environment. It simply works like each of these centers or connected locations are effectively nodes or points of computational work. By cutting off a particular node, you only divert the work around to other parts of the grid,” says Ian Baird, chief business architect and corporate grid strategist at Platform Computing, a developer of distributed computing software in Markham, Ontario.

If one or more “nodes” shuts down, the only effect would be a reduction in capacity, but functionality would still be available for users, Baird says. “The analogy is like an octopus. If you cut off one tentacle, it can function on the other seven,” he explains.
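A minimal sketch of that failover behavior, using a hypothetical in-process pool of worker names rather than any real grid product such as Platform’s: when a node is marked down, the scheduler simply spreads the same tasks over the nodes that remain, so capacity shrinks but the work still completes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node pool; in a real grid these would be separate machines.
NODES = {"node-a": True, "node-b": True, "node-c": True, "node-d": True}

def run_on_node(node: str, task: int) -> str:
    """Stand-in for dispatching one unit of computational work to a grid node."""
    return f"task {task} completed on {node}"

def schedule(tasks: list[int]) -> list[str]:
    """Spread tasks round-robin over whichever nodes are currently up."""
    live = [name for name, up in NODES.items() if up]
    if not live:
        raise RuntimeError("no nodes available")
    with ThreadPoolExecutor(max_workers=len(live)) as pool:
        futures = [pool.submit(run_on_node, live[i % len(live)], task)
                   for i, task in enumerate(tasks)]
        return [f.result() for f in futures]

# Cutting off one "tentacle" only reduces capacity; the rest absorb the work.
NODES["node-b"] = False
print(schedule(list(range(8))))
```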

There has been much interest in grid technology lately, especially in its resiliency. St. Louis-based biotech company Monsanto, a Platform customer, has a network of small servers running in parallel instead of a single, less fault-tolerant supercomputer. Other companies are setting up grids to link offices, partners, project collaborators, or supply-chain members.

“The first grids that are going to take up [distributed computing ideas] and therefore address things like business continuity are enterprise grids,” Baird says. Because enterprise grids have the advantage of residing behind a firewall and are more easily controlled than those spanning the Internet, they make a good introductory step, he adds.

Many industry players, including IBM, Sun, and organizations such as the Globus Project, are pushing to mature grid technology through research and continued discussion of its potential in the business world. CACI’s Rice views the academic and industry discussions on grid and other distributed computing technologies as “absolutely critical.” But he thinks it will take time for these technologies to get a strong foothold in most companies when it comes to business continuity.

“Until there are more commercial applications that have been developed and tested, there’s going to be some reluctance to immediately jump into [grid and other distributed computing technologies]. There’s not a large test bed of people who have switched to that technology,” Rice says, noting that many organizations either already have three- to five-year continuity plans in place or have them in the works, and making a “midcourse” correction would be budget-busting.

“But I don’t want to downplay the need for this [discussion about newer distributed computing options],” he adds. “It’s absolutely critical, and as time progresses, people will move to that type of architecture. There have been companies pushing for that kind of approach for many years.”

Still seeking a silver bullet

Although distribution has its upside, spreading resources too thin can harm a business continuity plan. Deploying to too many locations can be expensive and make day-to-day management more difficult.

“As you distribute systems and spread them geographically, it becomes more likely you would suffer some kind of loss because you’re increasing the risk exposure,” explains Gartner’s Mingay. “Operationally, you’re more likely to suffer downtime because it’s more difficult to manage that kind of environment and therefore it’s more likely that operator error or application failure would render part of the system unavailable.”

The best plan is to balance distribution (to provide business resiliency) with the need for efficiency — for example, having two offsite datacenter options instead of four, Rice says. This preserves some of the cost benefits of centralization while decreasing the risk of a single incident disrupting all business.
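As a back-of-the-envelope illustration of that tradeoff (the outage and cost figures below are hypothetical, not from Rice or Gartner), adding a second site cuts the chance of losing everything dramatically, while each site beyond that buys much less protection per dollar:

```python
# Hypothetical figures: each site has a 2% chance per year of a serious outage,
# and each offsite datacenter costs roughly $1.5M per year to operate.
P_SITE_OUTAGE = 0.02
COST_PER_SITE = 1_500_000

def tradeoff(sites: int) -> tuple[float, int]:
    """Chance that every site suffers an outage in the same year (assuming
    independent failures), alongside the annual facility cost."""
    return P_SITE_OUTAGE ** sites, sites * COST_PER_SITE

for n in (1, 2, 4):
    p, cost = tradeoff(n)
    print(f"{n} site(s): total-loss probability {p:.6%}, annual cost ${cost:,}")
```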

“Based on the concerns for business continuity, both in the Y2K scenario and recently [with Sept. 11], there has been a renewed interest in ensuring there is a distributed or a redundant capability for that centralized facility to come back up after a disaster,” Rice says. “It’s about making sure things can keep going even if it’s not really ‘business as usual.’ ”

This was a lesson Greta Ostrovitz, director of IT at Wall Street law firm Cadwalader, Wickersham & Taft, took to heart after the events of Sept. 11. Cadwalader’s offices were just three blocks from the World Trade Center, but because the firm owned its building, Ostrovitz’s team was able to enter on Sept. 12 to work on getting the network up and running so the firm could access critical documents and e-mail.

Although the firm already had an agreement with SunGard for business continuity, Ostrovitz says she learned a lot about which systems are absolutely necessary to keep the firm going. Access to applications and the network became the critical component, and Ostrovitz is planning to implement more Web-based versions of everyday applications such as Lotus Notes and InterAction, an application large law firms use to manage client data.

Ostrovitz is also taking a hard look at where the firm’s systems and applications reside.

“While we were down, our Washington, Charlotte [N.C.], and London offices were up and running, and they needed software that was only in New York,” she explains. “So we now have to look at distributing that a little differently, even if it means it’s going to be a little more costly and we’re going to have to replicate data in some places. Each office needs to be a little more self-sufficient.”