If your organization can't risk downtime, you need a warm site, along with advice on avoiding the pitfalls of building one.

Enterprises bend over backward to ensure a reliable, redundant infrastructure. Whether that means deploying hardware with redundant components or building multicarrier WANs with automatic failover, the constant battle against productivity-killing downtime drives many decisions and eats a big chunk of the budget.

The pinnacle of redundancy is the warm site: a completely self-sufficient server and storage architecture at a remote location, constantly kept in sync with the production site. Building a warm site from scratch has always been considered an expensive and difficult prospect. That has changed, thanks to virtualization and inexpensive, replication-ready SAN hardware; businesses that never would have considered implementing a warm site are now in a position to do so. But even with these technological and cost gains, building a warm site has its challenges. Here are three pieces of advice based on situations I often see in the wild.

Manage expectations

Setting business-side expectations is important in any IT project. With backup or disaster recovery initiatives, it's absolutely essential. Defining the RPO (recovery point objective) and RTO (recovery time objective) and communicating them to stakeholders well in advance of the design process will save you time and anguish. If you've just spent a bundle on a system that can't meet stakeholders' expectations, you're putting your career at risk. Trust me.

RPO and RTO can have a huge impact on the overall cost of the solution, but if you have buy-in from management ahead of time, your design will be perceived to fill management's needs, not just those of IT. If the design is too expensive for the organization to accept, you can collaboratively adjust the requirements to decrease the cost. Putting others in control of setting downtime requirements is vastly preferable to silently doing it yourself and being "wrong."

Beware of hardware gotchas

SAN-to-SAN replication is almost always the preferred way of keeping data on the primary site in sync with the warm site, but you can expect a number of challenges, depending on the hardware you have in production.

Most SAN platforms require additional software licensing to perform replication to another platform, and it's usually not cheap. Several platforms don't require this licensing, so if you're in a position to reevaluate your entire storage platform, be sure to take it into account. Either way, make sure that both software and support costs figure into your budget.

SANs that offer only Fibre Channel connectivity, or that will replicate only over Fibre Channel, will generally require fairly expensive Fibre Channel over IP (FCIP) bridges at each site (unless you happen to have dark fiber available). Worse, FCIP isn't particularly efficient with compression or deduplication, and FC platforms typically assume limitless bandwidth, which is rarely the case when replicating to a remote site.

Finally, different SAN platforms may have dramatically different replication bandwidth requirements, even under identical loads. The reasons vary, but they generally come down to the replication block size used by your SAN. If you modify a 1KB file on a volume located on a SAN, you can almost guarantee that a much larger chunk of disk will be replicated as a result. That's because the SAN tracks the changed parts of a volume in a relatively small number of large blocks, which reduces the performance overhead of implementing replication.
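To see why that tracking granularity matters so much, here is a minimal Python sketch of the effect. It is an illustration only, not any vendor's implementation, and the 256KB extent size is an assumption; real change-tracking granularities vary widely by platform.

```python
import random

# Hypothetical illustration of replication write amplification. The
# 256KB change-tracking extent size is assumed; real SANs vary widely.
EXTENT_SIZE = 256 * 1024

def replicated_bytes(writes, extent_size=EXTENT_SIZE):
    """Given (offset, length) writes, return the bytes the SAN ships.

    Every extent touched by a write is marked dirty, and the whole
    extent is replicated no matter how few bytes actually changed.
    """
    dirty = set()
    for offset, length in writes:
        first = offset // extent_size
        last = (offset + length - 1) // extent_size
        dirty.update(range(first, last + 1))
    return len(dirty) * extent_size

# 1,000 random 1KB writes scattered across a 100GB volume: about 1MB
# of logical changes dirties roughly 1,000 extents, or ~250MB shipped.
writes = [(random.randrange(100 * 2**30), 1024) for _ in range(1000)]
print(f"logical writes: {sum(n for _, n in writes) / 2**20:.1f} MB")
print(f"replicated:     {replicated_bytes(writes) / 2**20:.1f} MB")
```

In this made-up run, roughly 1MB of logical writes turns into roughly 250MB of shipped data, because each scattered 1KB write dirties an entire 256KB extent. A sequential workload, by contrast, would pack many writes into each extent and see far less amplification.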
The bottom line: If your workloads modify lots of random bits of data all over the disk (databases and email servers often fit this pattern), you can expect raw replication bandwidth loads several orders of magnitude higher than your write rates.

Watch out for bandwidth costs

Too many times, people fail to figure in the high cost of bandwidth between the production site and the warm site. Unfortunately, the actual cost is hard to predict. If you're lucky enough to have gigabit-class connectivity from your production site to your warm site (if your warm site is on the same campus, for example), this isn't likely to be a big issue. But if one site is out in the sticks or outside the footprint of an affordable fiber-based bandwidth provider, watch out. If only leased lines are available, you could easily spend more on recurring telecommunications costs over the lifetime of the solution than on the storage and server hardware combined.

The RPO you've set will live or die based on how quickly you can ship changes made at the production site to the warm site. If you're chasing RPOs measured in minutes rather than hours, this becomes even more important. Disk write rates are generally anything but constant, especially in virtualized environments where you are replicating not just data, but the entire operating system and system state of the machines running within the environment. As such, a short period of high write activity (such as applying a service pack to a virtual machine) can require a great deal more replication bandwidth than usual to keep up with the changes being made.

If your replication bandwidth and RPO have been based on average change rates, your replication may fall behind during such bursts and then, with luck, catch up when change rates decrease (a back-of-the-envelope sketch of this behavior appears at the end of this article). The most obvious repercussion is that you may temporarily fail to meet the RPO. A less obvious result: You may need more storage capacity on both ends to maintain larger replication change logs. On very slow circuits, where replication passes may take many hours, these change logs can soak up a significant amount of capacity on both the sending and receiving SANs.

With tight bandwidth constraints, you will want to consider deploying WAN accelerators at both sites. WAN accelerators such as Riverbed's Steelhead, Citrix's Branch Repeater (previously WANScaler), and Certeon's VM-based aCelera can all accelerate the flow of data by applying WAN compression and protocol optimization while deduplicating data at the same time.

Better warm than out in the cold

Building a warm site is the best insurance against any kind of site-wide failure or natural disaster. If your organization has grown so dependent on its data that losing access is unthinkable, site redundancy may be your only reasonable option. Just keep in mind that diligence and careful planning are required if you want to avoid unpleasant surprises.
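As promised above, here is the back-of-the-envelope Python sketch of the fall-behind-and-catch-up dynamic. Every figure in it is invented for illustration: the 100Mbps link, the 15-minute RPO, and the hourly change volumes are all assumptions, and the model is not based on any particular SAN's replication engine.

```python
# Back-of-the-envelope sketch only: all figures below are invented for
# illustration and not drawn from any particular SAN or WAN product.
LINK_MBPS = 100          # assumed replication link, megabits per second
RPO_SECONDS = 15 * 60    # assumed 15-minute RPO

# Assumed change volume produced per hour (GB): a quiet day with a burst,
# e.g. a service pack applied across several virtual machines.
hourly_changes_gb = [2, 2, 3, 2, 40, 60, 25, 5, 3, 2]

link_gb_per_hour = LINK_MBPS / 8 / 1024 * 3600  # Mb/s -> GB per hour

backlog_gb = 0.0  # unshipped changes queued in the replication change log
for hour, change_gb in enumerate(hourly_changes_gb):
    backlog_gb = max(0.0, backlog_gb + change_gb - link_gb_per_hour)
    lag_min = backlog_gb / link_gb_per_hour * 60  # minutes behind
    status = "OK" if lag_min * 60 <= RPO_SECONDS else "RPO MISSED"
    print(f"hour {hour:2d}: backlog {backlog_gb:5.1f} GB, "
          f"lag {lag_min:5.1f} min  {status}")
```

With these made-up numbers, the 100Mbps link moves about 44GB per hour, enough to keep pace on average, yet the 60GB burst hour leaves roughly 16GB queued in the change log and pushes replication lag past 20 minutes, missing the 15-minute RPO before the backlog drains the next hour. That is exactly why bandwidth and change-log capacity need to be sized against peak change rates, not averages.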