Part 1 of this blog series described the things that change and don’t change when recovering hybrid environments (part physical, part virtual). Part 2 of this series talked about application tiering and data movement, two things that don’t change even when the environment is hybrid. In today’s Part 3, I’m going to get on my soapbox about the three main challenges of recovering hybrid environments.
The three challenges are:
- The need to recreate a multi-layer, multi-platform hybrid stack for each and every mission-critical application.
- The need to recreate that stack within a defined recovery time objective (RTO).
- The need to spend the capex for a second site (both hardware and software), and the opex to maintain it.
Let’s take a typical example of a 3-tier web application, say, an e-commerce application. The database layer may sit on two different systems: a Linux system running Oracle and a Windows server running SQL Server. Next comes the middleware, or business logic, layer: a Win2K server running WebLogic, whose job is to aggregate data from the Oracle and SQL Server databases. Finally, the web layer is on an ESX server running Apache. To make this scenario more complicated (and therefore more realistic), the web and middleware tiers are stored on an EMC SAN device, the Oracle database is on a NetApp SAN device, and the SQL Server database is on yet another storage vendor’s device (say, a Dell device).
In this scenario, you have multiple storage platforms, multiple compute platforms, multiple operating systems, and a mix of physical and virtual environments. Sound familiar? Now, let’s say something goes wrong, and you need to recover this application at your recovery site. News flash: your recovery is going to fail if you haven’t created the identical physical and virtual stacks in your recovery environment to accommodate all three layers. If you have the wrong version of VMware’s hypervisor running in the recovery environment, you’re dead in the water. If you have the wrong hypervisor running in the recovery environment (say, Xen), you’re dead in the water. If you have only the ability to recover the database layer by itself, or both the database and middleware layers without the web layer, you’re dead in the water. Or vice versa – getting the web layer back without the other two layers also leaves you up a creek.
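To make the "dead in the water" cases concrete, here is a minimal sketch of a parity check between the production stack and the recovery site for the example application. The `Platform` fields, the tier names, and the version strings are all hypothetical, as is the split of which tiers are physical; the only details taken from the example are the operating systems, ESX for the web tier, and Xen as the wrong hypervisor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical description of where one tier runs."""
    kind: str                      # "physical" or "virtual"
    os: str
    hypervisor: str = ""           # empty for physical servers
    hypervisor_version: str = ""   # empty for physical servers


def recovery_gaps(production, recovery):
    """Return the reasons the recovery site cannot host the application."""
    gaps = []
    for tier, prod in production.items():
        rec = recovery.get(tier)
        if rec is None:
            gaps.append(f"{tier}: no recovery platform at all")
        elif rec != prod:
            gaps.append(f"{tier}: recovery has {rec}, production has {prod}")
    return gaps


# Production stack from the e-commerce example (versions are made up).
production = {
    "oracle-db": Platform("physical", "Linux"),
    "sql-db": Platform("physical", "Windows"),
    "middleware": Platform("physical", "Windows 2000"),
    "web": Platform("virtual", "Linux", "VMware ESX", "4.1"),
}

# A recovery site with two problems: no middleware tier, wrong hypervisor.
recovery = {
    "oracle-db": Platform("physical", "Linux"),
    "sql-db": Platform("physical", "Windows"),
    "web": Platform("virtual", "Linux", "Xen", "4.0"),
}

for gap in recovery_gaps(production, recovery):
    print(gap)
```

An empty list means every tier has a matching home. Anything else means the application as a whole is not recoverable, no matter how many of the individual tiers come back.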
And that’s just ONE app. What if you have 50, 80, or 100+ apps to recover? Now, compound this problem with the problem of having to recover all of these apps within a certain RTO, and you’re starting to get the picture of the magnitude of the challenges presented by hybrid environments. In a word: elephantine.
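A rough sketch of the RTO arithmetic for a single app follows. It assumes the tiers come back in dependency order (databases, then middleware, then web) and that independent tiers can be restored in parallel, which in turn assumes you have enough staff to do so. The durations and the RTO values are invented for illustration.

```python
def recovery_time(durations, depends_on):
    """Minutes until every tier is back, restoring independent tiers in parallel."""
    finish = {}

    def finish_time(tier):
        # A tier can start only after everything it depends on has finished.
        if tier not in finish:
            start = max(
                (finish_time(dep) for dep in depends_on.get(tier, [])),
                default=0,
            )
            finish[tier] = start + durations[tier]
        return finish[tier]

    return max(finish_time(tier) for tier in durations)


# Hypothetical restore times, in minutes, for each tier of the example app.
durations = {"oracle-db": 120, "sql-db": 90, "middleware": 45, "web": 30}

# Assumed dependency order: web needs middleware, middleware needs both databases.
depends_on = {
    "middleware": ["oracle-db", "sql-db"],
    "web": ["middleware"],
}

total = recovery_time(durations, depends_on)
print(total)          # 195: Oracle (120) + middleware (45) + web (30)
print(total <= 240)   # True: fits a hypothetical 4-hour RTO
print(total <= 180)   # False: misses a hypothetical 3-hour RTO
```

The slowest chain of dependent tiers sets the floor for the whole app. Multiply that by 50 or 100 apps competing for the same staff and hardware, and the parallelism assumption is the first thing to break.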
To support the recovery of a hybrid environment, you need the correct infrastructure in place: the right recovery technologies for each platform and OS in your secondary site, the right expertise on staff (an Oracle person, a Windows person, a storage person, a VMware person), and a well-documented disaster recovery blueprint (or runbook) that contains all of your recovery processes.
Moreover, to keep that runbook current, you need to make sure that any changes to production configurations make their way into the runbook and the recovery environment (change management). And putting all of this in place could cost a big bundle.
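As one way to picture the change management problem, the sketch below compares a snapshot of production settings against what the runbook records and reports anything that has drifted. The setting names and values are hypothetical, and a real environment would pull them from whatever configuration inventory you already maintain.

```python
def config_drift(production, runbook):
    """Map each setting that differs to (production value, runbook value)."""
    drift = {}
    for key in sorted(production.keys() | runbook.keys()):
        prod_value = production.get(key)   # None if missing in production
        book_value = runbook.get(key)      # None if missing in the runbook
        if prod_value != book_value:
            drift[key] = (prod_value, book_value)
    return drift


# Hypothetical settings; production was upgraded and a storage entry was added.
production_settings = {
    "web.hypervisor_version": "4.1",
    "middleware.app_server": "WebLogic",
    "oracle-db.storage": "NetApp",
}
runbook_settings = {
    "web.hypervisor_version": "4.0",
    "middleware.app_server": "WebLogic",
}

for setting, (prod, book) in config_drift(production_settings, runbook_settings).items():
    print(f"{setting}: production={prod}, runbook={book}")
```

Every line of output is a change that reached production but never reached the recovery plan.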
Do these challenges sound familiar to you? If so, how are you addressing them today? I’d love to hear your feedback, as well as any insights into what you’re doing to keep your hybrid environments available.
We’ve just published a white paper containing a more fleshed-out version of SunGard’s suggested approach to recovering complex hybrid IT environments, if you’d like additional information.