J. Peter Bruzzese
Columnist

Make Exchange 2010 bulletproof with replication

analysis
Mar 21, 2012 · 4 mins

Exchange 2010 DAGs have incredible resiliency through redundancy

Just before 9/11, I was traveling quite a bit and suffering from a fear of flying that sprang out of nowhere. I hadn’t had a harrowing experience in an aircraft, but to my mind the combination of a lack of control and the greater odds that come with frequent flying put me at a higher risk for catastrophe. To overcome this fear, I spent hours learning more about aircraft and even spoke with pilots in the cockpit before flights to help ease my nerves. They impressed me greatly with the redundant systems that allow a plane to keep flying even when things break down. They explained that the primary cause of failure is, oddly enough, human error. In his book “Outliers,” Malcolm Gladwell substantiated that claim by explaining further how human error plays a major part in modern-day aircraft disasters.

Your Exchange messaging environment, obviously, doesn’t have human error as the primary cause of crashes (there are many reasons for failure, including hardware and software issues). However, human error may play a part. The reason: Exchange can now provide such high levels of availability through redundant, resilient systems that human error or a lack of understanding may be what causes a data loss that doesn’t have to happen.

The core enabler of that high-availability redundancy is Exchange’s database availability group (DAG) capability. The storage architecture in Exchange 2010 has 1MB transaction logs, created as data goes into the production database. With DAGs, you can create a replica of the database, which is kept up to date automatically. DAG replicas are not limited to one location, unlike a normal active/passive-type cluster. Instead, you can have up to 16 replicas, in the same location or distributed throughout your data center and/or in other locations (such as company offices and failover sites) around the globe.

Exchange 2010 SP1 and SP2 have made the DAG capability even better, thanks to new features such as block mode, which was introduced to reduce the latency between the time a change is made on the active copy and the time that change is replicated to passive copies, thus eliminating the current log file as a single point of failure.

Another important feature is Datacenter Activation Coordination (DAC) mode, which helps prevent what’s called split-brain syndrome should a power outage (or some other issue) take out a primary datacenter that holds the majority of the DAG members needed for the DAG to have quorum. It does this by preventing databases from mounting in the primary datacenter when it comes back online. Learn more about DAC mode through TechNet.
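To make the quorum idea concrete, here is a minimal Python sketch of a strict-majority check. It is only an illustration: the function name `has_quorum` and the five-member, two-site layout are hypothetical, and the model deliberately ignores the witness server and the protocol that DAC mode itself uses.

```python
def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """Return True when a strict majority of voters can reach each other.

    Simplified model: every DAG member counts as one voter, and no
    witness server is considered.
    """
    if total_voters <= 0 or not 0 <= reachable_voters <= total_voters:
        raise ValueError("voter counts are out of range")
    return reachable_voters > total_voters // 2


# Hypothetical layout: 3 members in the primary site, 2 in the secondary.
PRIMARY_MEMBERS = 3
SECONDARY_MEMBERS = 2
TOTAL_MEMBERS = PRIMARY_MEMBERS + SECONDARY_MEMBERS

# If the primary site goes dark, the secondary site alone is a minority.
secondary_alone = has_quorum(SECONDARY_MEMBERS, TOTAL_MEMBERS)

# The primary site on its own would still hold a majority.
primary_alone = has_quorum(PRIMARY_MEMBERS, TOTAL_MEMBERS)
```

In this layout the surviving secondary site cannot reach a majority by itself, so an administrator has to intervene to bring it online. When the primary site later returns, both sites could believe they own the databases, which is the situation DAC mode is meant to guard against.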

It’s a great solution, but make no mistake: There’s a cost involved that may prove to be too much, based on your needs. Designing high availability properly, depending on your environment, will require multiple servers all running the Enterprise Edition of Windows Server 2008/R2.

Even if you use virtualization for your Exchange environment, you still need to make sure you don’t place these virtualized servers on the same physical box. Otherwise, you’re completely defeating the purpose of high availability. Some Exchange admins forget that fact and put multiple VMs on the same physical server. If that box goes down, they lose all VMs and all their Exchange availability.
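A placement check like this is easy to automate. The following Python sketch assumes you can export a simple mapping of DAG member VM to physical host from your virtualization platform; the function name `shared_hosts` and the VM and host names are hypothetical.

```python
from collections import defaultdict


def shared_hosts(placement: dict[str, str]) -> dict[str, list[str]]:
    """Return each physical host that carries more than one DAG member VM."""
    vms_by_host: dict[str, list[str]] = defaultdict(list)
    for vm, host in placement.items():
        vms_by_host[host].append(vm)
    # Keep only the hosts whose failure would take out several members at once.
    return {
        host: sorted(vms)
        for host, vms in vms_by_host.items()
        if len(vms) > 1
    }


# Hypothetical inventory: two mailbox VMs ended up on the same box.
placement = {
    "mbx-vm1": "host-a",
    "mbx-vm2": "host-a",
    "mbx-vm3": "host-b",
}
conflicts = shared_hosts(placement)
```

An empty result means every DAG member sits on its own physical server. Anything else lists the hosts whose loss would remove more than one database copy in a single failure.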

When using DAG, you also need to worry about multiple LAN connections, multiple routing options, and even multiple WAN connections if you’re working with cross-site replicas (necessary to provide site resiliency).

On the positive side, DAG works. And compared to many third-party options, the extra cost and effort of using Exchange’s built-in DAG high availability on Windows Server Enterprise is often still the cheaper option.

But unless you set up your Exchange configuration properly to avoid issues such as split-brain syndrome, placement of all critical VMs on one server, and network routing dead ends, Exchange 2010’s high-availability capabilities won’t be able to do their job. It’s thus essential to understand DAG design and make sure you have enough servers to take advantage of other built-in Exchange features like lagged mailbox database replica copies, retention policies, legal hold, and archiving.

As with aircraft, the systems are in place to prevent disaster and help recover from major shocks to the system. But as is the case with pilot error, admin error or lack of understanding can nullify all high-availability technology. Don’t let it happen to you.

This article, “Make Exchange 2010 bulletproof with replication,” was originally published at InfoWorld.com. Read more of J. Peter Bruzzese’s Enterprise Windows blog and follow the latest developments in Windows at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.


J. Peter Bruzzese is a six-time-awarded Microsoft MVP (currently for Office Servers and Services, previously for Exchange/Office 365). He is a technical speaker and author with more than a dozen books sold internationally. He's the co-founder of ClipTraining, the creator of ConversationalGeek.com, instructor on Exchange/Office 365 video content for Pluralsight, and a consultant for Mimecast and others.
