Posts Tagged ‘disaster recovery’

Global Warming No Excuse for Lack of Winter Storm #DR Planning

By Bob DiLossi

The 2012 hurricane season has thankfully come to an end, and now is the time for businesses to prepare for winter storms. Although some parts of North America have been experiencing a milder winter, winter storms can and will still occur – take winter storm Nemo, which plagued the Northeast in early February, for example.

On average, the United States has roughly four catastrophic winter storms annually, with storms occurring most commonly in the northeastern United States. Being prepared is key. In some ways, winter storms can be the most challenging weather systems because they spawn so many types of emergencies.

Blizzards, electrical storms, hail, high winds, ice, sleet, and snow can contribute to communications failures, power outages, and risks to your buildings. Storms also lead to many driving accidents and you can lose critical personnel to injuries from slips and falls.

You need to prepare for all events that may occur, from damage to buildings to your business to your people. All three need to be part of the business continuity plan and part of the testing of your plan. As companies strive to meet the demand for continuous service, they expect 24/7/365 availability. However, the average organization’s requirement for recovery time objective (RTO) from an outage now ranges between two and 24 hours.

To help better protect your organization from the impact of winter storms, below you will find a checklist to gauge where you stand on preparing for winter storms. As you read the list, consider the impact each of the items would have, if they occurred, on your operations.

Building:

  • Building managers unable to get to the building to assess and mitigate damage
  • Communications infrastructure failures
  • Explosions
  • Freezing and flooding of interior building areas that may result in ceiling collapse
  • Gutter clogging with ice dams, leading to leaks
  • Hazardous material accidents
  • Power outages, causing building environmental controls to shut down
  • Roof damage or collapse due to ice, snow, or fallen trees
  • Structural damage or collapse
  • Transportation accidents or closed roads that trap people in or out of your building

People:

  • Communications issues
  • Employee safety
  • Lack of corporate presence during recovery
  • Lack of lodging/logistics
  • Not focused on recoveries
  • Team players not available to travel

When it comes to the business itself, you need to consider a winter storm’s influence on several areas of operation. Run through this checklist and determine how you would satisfy these conditions if problems arose:

Business: 

  • Customers expect supplies and services to continue—or resume rapidly
  • Employees expect both their lives and livelihoods to be protected
  • Insurance companies expect due care to be exercised
  • Regulatory agencies expect their requirements to be met, regardless of circumstances
  • Shareholders expect management control to remain operational
  • Suppliers expect their revenue streams to continue

After going through the checklists and developing ways to address all of these items, you then need a plan of action to use once a disaster strikes. To that end, there are three major steps to begin the process of managing the incident:

  1. Mobilize a central command center, activate a business recovery plan and identify exactly how long the organization will operate in a recovery state, and plan accordingly.
  2. Following closely behind is the need for your organization to carefully document your processes, both in terms of how to recover and how to operate.
  3. You also need to practice and refine processes using a variety of scenarios.

To help with these preparations, a free business continuity toolkit is available from SunGard.

Given the potentially disruptive dangers of winter storms, our next blog post will discuss the importance of developing and practicing a suitable DR plan.

Extend Your Avamar Backup Investment into a Disaster Recovery Solution

In today’s 24×7 business world, application availability is critical. And catastrophic events (including natural disasters in major urban areas) in recent years around the globe have simply reconfirmed how essential an offsite option for production application availability is in any disaster recovery (DR) plan.

Fortunately, organizations using the popular EMC Avamar disk-based backup solution might be able to leverage their investment in that technology to create a true offsite application availability solution.

In fact, relatively new capabilities such as the inclusion of Avamar software in VMware vSphere 5.1, as well as enhancements to the core Avamar product suite offering improved integration with Data Domain and a media access node to support long-term tape archiving, offer synergistic benefits if applied to an offsite application availability DR effort.

With growing market adoption, now is an ideal time for Avamar customers to ask how best to build a DR strategy on top of that primary production backup investment, one that integrates seamlessly and creates a true offsite application availability solution.

Here are some of the criteria that may help drive this decision:

Application tiering: Avamar is a disk-based backup solution that utilizes efficient, source-side deduplication capabilities to significantly decrease the size of the backup and the backup window. This enables bandwidth optimization in replicating a copy of that Avamar backup offsite. But the savings in data footprint come with a trade-off in data restoration: how do you recover the environment from an Avamar backup format to a live system state? Application tiering takes into account the business criticality of the protected applications. The majority of IT workloads can be accommodated with an RTO (recovery time objective) of 24 hours or less, but some applications may require synchronous or asynchronous replication to a replica instance of the production environment running in a public or private cloud configuration.

2nd copy utility storage consumption: Does your DR solution for your Avamar environment need dedicated infrastructure, or can your organization realize the economic benefits of utilizing storage as a utility? The type of data, length of retention, size of data footprint, and organizational budget approach towards operating or capital expenditures typically drive this decision. Many customers can benefit from outsourced service providers running cloud storage environments capable of ingesting an Avamar backup.

Restoring your Avamar environment: Does your second IT facility or colocation provider provide computing resources—for both physical and virtual environments—that can be provisioned as a utility as well? This can prevent the unnecessary and redundant expenditure for dedicated compute resources to restore your environment in a DR event.

Managed services, when and where to outsource: Finally, what type of additional services do you need to best support your DR plan for your Avamar environment? Can you manage the replication of your network to your DR site internally? Is that best outsourced? What types of SLAs (service level agreements) do you need around monitoring and management of your backup and recovery time in the event of a disaster? And, most importantly, how assured are you that you can restore your Avamar environment at your test/dev or second IT facility? Will you have the staffing and expertise necessary to restore and bring up your protected IT applications?

Asking these critical questions can help you determine when and where to consider an outsourced service provider to offer you not just offsite disk-based backup, but true recovery for your Avamar environment.

INSIDER focuses on Sandy, Security

Check out the latest edition of our monthly newsletter, the INSIDER, a place for IT professionals to explore the latest news, trends and tips. This month’s issue features articles and videos focused on security. We’ve addressed topics ranging from recovering from disasters like Superstorm Sandy, to safe identity management for your workforce, to secure change management and cloud application testing.

This month’s Recovery Services video focuses on Workforce Continuity/Workplace recovery in the face of disaster – a timely topic in the aftermath of Superstorm Sandy. Another well-timed article details exactly what steps SunGard takes in times of natural disasters like Hurricane Sandy. What exactly does SunGard do for its customers when faced with impending crises?

For the CIOs in our audience, we spoke with SunGard’s Atif Malik about the importance of executive dashboards for CIOs as a way to achieve greater efficiencies in their data centers.  We’ve also got a few new services to tell you about — the Customer Configuration Repository – designed to revolutionize change management, as well as a new security solution called Single Sign On, available for all SunGard customers.

Our Cloud trend series concludes with a brief look at what happens once you’ve developed and tested an application in the cloud.

Finally, we recap SunGard’s involvement in the latest events and conferences, including the Gartner Data Center conference and our Business Continuity Software International User Group meeting.

Stay on top of the latest SunGard news with the INSIDER, found here on the SunGard website each month. You can subscribe to our monthly newsletter here.

IT Disaster Recovery Lessons from…Gangnam Style?

By , Director of Product Marketing, Recovery Services

Three weeks ago, I flew to China to attend my brother’s wedding. My new sister-in-law is a local Shanghainese girl of great beauty and brains (I get to call her a “girl” because I am almost old enough to be her mother), and her wedding to my brother was a fusion of traditional Chinese customs and modern day YouTube phenomena…with hilariously enjoyable results.

Apparently, it is a Chinese tradition for the groom and his groomsmen to go and pick up the bride on their wedding day. The bride’s family, however, deliberately makes this process difficult, obstructing their entry and setting up several “tests” for them to overcome. In retrospect, I now understand why my brother called it, “busting my bride out of her bunker.”

When he and his entourage arrived at the bride’s home, they had to each down an 8-ounce glass full of a stomach-wrenching mixture of soy sauce, vinegar, and Coca-Cola. Then, they had to pass slices of cantaloupe to each other using only their mouths, which culminated in a few unwilling smooches between groomsmen. Finally, to gain entry to the bride’s home, they had to successfully perform the “invisible horse” dance from Gangnam Style.

Let me digress for a second. If you have not heard of the song, “Gangnam Style,” then you are seriously, violently, and probably irretrievably behind the times. You are pretty much something right out of the Cretaceous period. Propelled by YouTube to a worldwide phenomenon, this K-Pop (translation: Korean Pop) music video with a catchy beat and over-the-top, random vignettes has garnered over 722 million views (to date), and is the most “Liked” YouTube video of all time. (So that you don’t feel too bad, I was a fellow Triceratops myself until I went to this wedding).

Anyway, my brother and his friends became so engrossed in dancing to Gangnam Style that they did not even notice that the door to his bride’s home had been opened. In fact, as they were joyfully galloping in front of the house, one of the bridesmaids remarked to the bride, “It looks like your future husband is more interested in performing than in picking you up.” So when my bro finally finished passing the test and strode up to the front door, it is small wonder that the bride slammed it back shut in his face (!). That, of course, meant that they had to perform a whole new set of onerous tasks.

All of this hilarity was captured on film, which is why “Gangnam Style” became THE default theme song of their wedding…and also why we all found ourselves horse-trotting to it on the dance floor that evening.

I got to thinking about all this, and it occurred to me that there are several important IT disaster recovery lessons – heck, maybe even life lessons – out of this:

First of all, it’s important to keep your eye on the main goal, and not to get distracted. In my brother’s case, his main goal should have been to get into the bride’s front door, not to master the Gangnam Style dance per se. Similarly for IT professionals, it could be argued that their main goal is to support business value creation, not to be distracted by trying to master supporting functions like DR (disaster recovery) per se. DR is important, but only as a means to an end, not as an end in itself.

Secondly, although the song’s writer and performer, PSY, has been a Korean pop star for over a decade, his cherubic face and portly body would have made him the unlikeliest of candidates to become K-Pop’s first crossover star. In my opinion, what caused this global blow-up of Gangnam Style and fueled him to mega-stardom is an application, and one that was not even around 10 years ago. What am I talking about? YouTube, of course (what is YouTube if it isn’t a “killer app”?)! Ten years ago, without the massively viral properties of YouTube, the song probably would have topped out on the Korean charts and gone nowhere else. YouTube now blurs the boundaries between nations, races, and languages, such that we in America don’t even care that we can’t understand the song; we still LOVE it.

Similarly for us IT professionals, it can be difficult to predict which applications will have the most impact, especially in the context of downtime. We often think it’s the ones supporting revenue generation that are most mission-critical, but as this case study about a nationwide retailer shows, it is sometimes those “less critical” applications that have the greatest impact. For this particular customer, certain human resources and finance applications actually carried severe financial penalties that could exceed any revenue losses, should they become unavailable for significant periods of time.  It took going through a business impact analysis for them to identify these impacts, and only then could they begin to shift their availability strategies to account for the proper priorities.

If you can find any other IT or life lessons from Gangnam Style, please feel free to share them in a comment below!

125 Enterprise Leaders: Survey Results on Disaster Recovery Planning [infographic]

In SunGard’s annual survey, 125 Enterprise leaders weighed in on disaster recovery planning.  What are their highest priorities? How do you compare? The results may surprise you.

With a myriad of events just waiting to take down an IT infrastructure, recovering from downtime is not a matter of if, but when. That’s why Disaster Recovery Planning is a top issue for many Fortune 1000 executives.

Understanding their priorities can help everyone prepare for the unavoidable – before it’s too late. The survey shows where your peers sit on Disaster Recovery, including:

  • Highest priorities
  • Biggest challenges
  • Outsourcing plans

If disaster recovery planning is keeping you up at night, gain insight into what peers are doing about disaster recovery by viewing SunGard’s survey Infographic. If you would like more assistance, use SunGard’s Disaster Recovery Total Cost of Ownership Assessment to help determine whether in-house or outsourced disaster recovery is right for your business.

DR Planning Infographic

SunGard Carlstadt Business Continuity Center serves as Command Post, Shelter During Hurricane Sandy

By George Gobla, Technical Service Delivery Manager

Police, firefighters and EMTs from Moonachie, N.J. used the SunGard Availability Services business continuity center in Carlstadt as an emergency command post during Hurricane Sandy.

Like most other residents of the East Coast, I had been following the news about the approach of Hurricane Sandy vigilantly. As a New Jersey resident, my interest was even greater, as the storm the media dubbed “Frankenstorm” was tracking to make landfall on the evening of October 29 over the New Jersey shoreline and proceed inland.

When it became more likely that Hurricane Sandy would be as destructive as many experts were predicting, the storm was also becoming a concern from a professional standpoint.  As the technical service delivery manager for the Northeast region for SunGard Availability Services, it’s my job to make sure that our facilities in Carlstadt, N.J. – a town about 15 miles west of Manhattan—are operational for our customers during any crisis.

A week prior to the hurricane’s arrival, SunGard activated its three-stage hurricane preparedness process. As part of the process, we carefully followed tested procedures to help keep our employees safe and our customer data secure, our facilities secure and our communications consistent. Along with personnel at other data centers that could be affected by the natural disaster, our on-site facilities team verified that all environmental and electrical gear was in full working order before the storm.

We felt well prepared in Carlstadt despite the fact that New Jersey would face the full power of Hurricane Sandy.

On Monday, October 29, the weather worsened throughout the day. At about 9:30 p.m., I started the one-hour drive from my home to Carlstadt. As I found out later, I was the last person to drive through the local area just before the hurricane hit.

After navigating several detours, I arrived at SunGard’s mega center in Carlstadt—home to two data centers and a business continuity site, which provides customers with a fully functional alternate work space for employees to use while in disaster recovery.

At our Carlstadt data centers, we provide advanced recovery, testing, advanced replication and hosting for customers. That night, my colleagues and I were working furiously to assist customers. Some customers initiated an orderly process of shutting down their equipment, and we were able to control the situation so there was no customer impact due to data center issues.

We also had a number of customers at our facilities and we communicated with them personally and kept them updated throughout the evening. Additionally, there were multiple notifications from our Service Desk and direct phone calls to customers.

As this was happening, at around 11:45 p.m., we had some unexpected visitors. The fire chief of a small nearby town, Moonachie, arrived in his SUV with three ladder trucks, two ambulance squad trucks, and a police cruiser in tow. Moonachie was being overrun with floodwaters from a storm surge caused by Hurricane Sandy, and the officials said they needed refuge and shelter for their own operations, and also for citizens that would be rescued throughout the night.

They asked if SunGard would open its business continuity site for this purpose, and I immediately said yes.

Within minutes, the fire chief had pulled his SUV to the front of the building, opened the back hatch and began using the area to respond to 911 calls and direct emergency operations in the field. Soon after, more emergency responders and the mayor of Moonachie, Dennis Vaccaro, arrived at the business continuity site, and the area became the command post for the duration of the night.

Those that were rescued from their flooded homes, and in some cases from the roofs of their cars, were taken to the SunGard business continuity center. Sheltered and comforted with sheets and blankets, they remained in safety while the hurricane and flooding lashed Moonachie.

In total, our facility provided shelter for approximately 60 residents rescued from danger, and 40 fire, rescue and police personnel.

The Carlstadt facilities remained dry and operational throughout the storm, and I was extremely proud that we were able to assist the community in a small but useful way during Hurricane Sandy.

Fireproof Your #DisasterRecovery Plans, Because Life is Like a Box of Chocolates

By: Nora Hahn, Sr. Marketing Communications Manager, SunGard Availability Services

Last year, Texas was undergoing its worst drought on record. Scorching temperatures and seven months without rain were wreaking havoc on the state. But Labor Day weekend was in sight, and my family couldn’t wait to take a little holiday in the Texas hill country just outside of Houston in the small artsy town known as Round Top.

We’d rented a cottage big enough for the grandparents, kids and grandkids, complete with a pool, a couple of horses and one giant Longhorn steer.  Along the way my sister stopped off in Bastrop, Texas – a nearby German community – at an authentic European chocolate shop.  She purchased a box of hand-crafted German chocolates that danced on your tongue and reminded your taste buds what heaven must be like.  We savored these special treats every night after dinner and coffee amidst the cool breezes and cicada symphonies.

This little chocolate shop was known throughout the state as the real thing – real chocolate made by real Germans, based on old country recipes.  Anyone traveling between Houston and Austin knew this was the place to go for a sweet treat that couldn’t be found anywhere else.

A couple of days into the trip, we received a jarring phone call at ten o’clock one night: Wildfires were spreading throughout the hill country, and we were to stay alert for possible evacuation notices.  Thankfully, we never got a second call.  But the next day we learned that the little chocolate factory had burned to the ground.  The place was annihilated; everything was lost – every spoon, every ounce of chocolate, every piece of special candy-making equipment from Europe.  The only thing saved was the owner’s special recipe book and around $200 from the cash register.

To this day, the chocolate shop is still closed.  The owner posts regular updates on his website, but the chocolates are a distant sweet memory.

What’s a small business to do in a situation like this?  Is any business too small to have a back-up plan?  How do you prepare for a disaster that comes out of nowhere?

In today’s technology-dependent world, companies of all sizes have to have a business continuity plan. Not having a plan for retrieving your business files or connecting with employees, suppliers and customers is deadly. I was reminded of this in reading SunGard’s white paper “Five Reasons Why Disaster Recovery Plans Fail.” The little German chocolate shop had no way of contacting its customers or even its business partners. The owner was left to use a PC and internet connection provided by his hotel.

First things first – personal safety and rebuilding physical structures matter most. But staying connected to customers, business partners and colleagues is the next step. The wildfires in Colorado this summer are a stark reminder of the dangers inherent in our unpredictable weather patterns.

In short, your business is never too small to have a disaster recovery plan.  Because as Forrest Gump once said, life is like a box of chocolates: you never know what you’re gonna get.

Learn more about Disaster Testing in this month’s edition of the INSIDER.

Thinking About #DR to the #Cloud? Vendor Selection is Critical.

By Michael de la Torre, Vice President of Product Management, Recovery Services

In a recent session at VMworld, the question was asked: “Have you looked at DR to the cloud?” Of the 50 people in the room, more than half (55%) said they had not looked at it but thought it sounded intriguing. Another 10% responded, “Yes, it’s fantastic!” This seems to jibe with a Forrester report I recently read, which said that more than two-thirds of all IT professionals were actively implementing or interested in implementing a cloud-based solution for disaster recovery.

People who tout the cloud usually point to the benefits of reducing the capex and opex costs associated with buying and maintaining servers, networking, and storage elements. But with so much hype around DR to the cloud, I wonder if people are able to cut through the buzz and understand the critical truth: that when used for recovery efforts, disaster recovery in the cloud can definitely help reduce recovery times and lower the cost of managing recovery operations, but only if the right service is chosen.

Here are some of the things you’ve got to consider when deciding on a Recovery-as-a-service cloud vendor:

Service Level Agreements (SLAs): Beware the vendor who only offers you a single Recovery Time Objective (RTO)/Recovery Point Objective (RPO). With only one RTO for your entire environment, you might be spending too much or too little on DR. This is because you should almost always tier your applications into three buckets of importance: mission-critical, business-critical, and best-efforts. Each of those buckets should have its own unique RTO – usually with the mission-critical applications needing recovery times of under 4 hours, business-critical applications requiring recovery times of under 12 hours, and best-efforts applications taking 24 hours or more.

Some examples:

  • If your vendor offers you only a 12-hour RTO, then your mission-critical applications will not have the appropriate level of availability, and you would be under-spending.
  • Conversely, if your vendor offers you a 4-hour RTO for all your applications, you would be spending too much on keeping those non-mission-critical applications available.

Ideally, your cloud vendor offers you a range of RTO/RPO options.
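
To make the tiering argument concrete, here is a minimal Python sketch of the comparison described above. It assumes the three tier thresholds from this post (4, 12, and 24 hours) can be treated as maximum tolerable recovery times; the application names and the `assess_single_rto` helper are hypothetical and exist only for illustration.

```python
# Tier targets in hours, taken from the thresholds in the post. Treating them
# as maximum tolerable recovery times is a simplifying assumption.
TIER_RTO_HOURS = {
    "mission-critical": 4,
    "business-critical": 12,
    "best-efforts": 24,
}


def assess_single_rto(vendor_rto_hours, app_tiers):
    """Compare one vendor-wide RTO against each application's tier target."""
    results = {}
    for app, tier in app_tiers.items():
        target = TIER_RTO_HOURS[tier]
        if vendor_rto_hours > target:
            # Recovery would take longer than this tier can tolerate.
            results[app] = "under-protected"
        elif vendor_rto_hours < target:
            # Paying for faster recovery than this tier needs.
            results[app] = "over-provisioned"
        else:
            results[app] = "matched"
    return results


# Hypothetical application inventory, for illustration only.
apps = {
    "order-entry": "mission-critical",
    "payroll": "business-critical",
    "intranet-wiki": "best-efforts",
}

if __name__ == "__main__":
    for rto in (12, 4):
        print(rto, assess_single_rto(rto, apps))
```

With a single 12-hour RTO the mission-critical application comes back as under-protected, and with a single 4-hour RTO the two lower tiers come back as over-provisioned, which is the pattern the two bullets above describe.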

A Compatible Recovery Environment: Chances are, your applications are running on a mix of physical and virtual server platforms, hypervisors, operating systems, and storage. To further complicate things, your applications are likely interdependent – meaning that they rely on other applications in order to deliver a complete business process. John Donne said that “no man is an island,” and the same can be said of applications: no application is an island. Your recovery environment should therefore reflect your production environment and provide the same mix of platforms, hypervisors, operating systems, etc., so that you can bring up your applications in a consistent way, according to the interdependencies you’ve identified. If your recovery cloud vendor operates an environment that is too homogeneous, then you’re likely going to fail when you go to restore applications that do run on hybrid physical-virtual stacks.

Managed Services:  Data protection is not disaster recovery!  If you think, “I’m backing up my data to the cloud, so I’m covered for DR,” then you’re in for a nasty surprise if you should ever experience the need to recover.  All you’ve done is protect your data, which is necessary, but not sufficient. During an actual disaster, you’ll also need someone to spin up the servers and networking and storage equipment to perform the actual recovery too, so it’s preferable to have those co-located with your data.

On top of that, you’ll also need recovery runbooks with the right processes, and the right people with the right expertise to recover your applications. So, the ideal cloud-based recovery service provider should also give you the option of leveraging their expertise to help plan and execute your recovery operations. And since DR test planning and execution take up much IT staff time and budget, it would also be smart to look for a vendor whose staff can take over that function as well. I’m not saying you HAVE to take advantage of this added service…just that it would be nice for you to have the added flexibility and the option to do so.

What are YOUR thoughts on using the cloud for disaster recovery? Are there factors to consider that I haven’t mentioned? I look forward to your comments below.

Download a copy of the Forrester Research Report, titled “An Infrastructure and Operations Pro’s Guide to Cloud-based Disaster Recovery Services.” 

#VMworld Recap: Lessons Learned and Insights Gleaned by a VMworld Rookie

My first VMworld ever is over and I am zombie-tired. The four days of driving to San Francisco’s Moscone Center and back (40 miles each way), plus the constant scramble between the exhibit hall and the education hall, plus the effort to cram all kinds of VMware knowledge into my wee little brain, have me plumb tuckered out. But, as depleted as I am, I’m also a lot smarter from the experience.

I attended sessions on how CIOs can run IT like a business, on cloud-aware security, and on running mission-critical applications in the cloud. But I think the most valuable session for me was one I hadn’t planned on attending. Yup, on my first day there, I was just wandering around, trying to get my bearings, when I happened upon a session titled, quite simply, “Disaster Recovery.” So I trusted to serendipity, stumbled in, plunked down in an open seat, and started listening.

It was a discussion group wherein the leader, VMware’s Ken Werneburg, asked a series of questions to which the participants could respond by hitting a letter on a hand-held remote device. As we responded, our answers were aggregated in real-time and flashed up on the screen in front. Here are some of the nuggets I got out of that session:

  • 81% of the people in the room (about 50 people) operate a production environment that is 51-100% virtualized.
  • Only 6% had ever gone through a full failover, while 18% have had to fail over a portion of their environment, 30% have tested DR but never gone through a true DR scenario, another 30% had never had to do any DR whatsoever (including testing), while the remaining 16% had an outage and could not recover at all.
  • 55% of the people in the room have not looked at DR in the cloud, but think it sounds intriguing. 15% had looked at it but written it off, 10% think it’s fantastic, and the remaining 20% are not interested at all.
  • 50% of the people in the room want to test DR twice a year, while 30% wanted to test quarterly. The remaining 20% felt that once a year was more than enough.
  • 40% of the people in the room had performed a Business Impact Analysis and arrived at a cost per hour of downtime; 43% did not have the time, money, or people to do this, and the remaining 17% had never even heard of it.

As for SunGard’s participation in VMworld sessions, our VP of Recovery Services Product Management, Michael de la Torre, co-presented a session with VMware called, “DR to the Cloud: A Service Provider’s Perspective.” The room was packed as Michael highlighted the six factors that affect the design of your company’s DR plan (RTO/RPO, size of environment, complexity, performance, regulations, and program management). In particular, Michael also talked about the difficulty of recovering a hybrid physical and virtual environment on a do-it-yourself basis, an issue that no one is really thinking about and that we at SunGard have brought to the forefront of late. (Check out our videos below that explain the primary challenges of recovering complex hybrid environments).

Because no one has really talked about these challenges before – and THAT’s because the industry has not really thought about what happens when you go to recover production environments that are increasingly heterogeneous mixes of applications, platforms, databases, hypervisors, and storage – it has felt a little like we’ve been pointing out that the Emperor isn’t wearing any clothes. Especially since the prevailing wisdom is, “Virtualization makes disaster recovery ridiculously easy.” So I felt particularly validated when someone asked the question of VMware, “Will you be releasing any products that also address the recovery of physical servers?” To which the response was, effectively, “No…just get as virtualized as possible so you don’t have to deal with that.” Hmm. I’m not sure we’re ever going to get to a 100% virtualized world, quite honestly, and until then, virtualization will continue to bring another layer of complexity to the recovery problem.

In the meantime, I’ll look forward to hearing your comments on what you learned at VMworld, and to attending again next year.

Part 3: #Virtualization Makes #DR Easier, Except When it Makes it Harder

By Ram Shanmugam, Sr. Director of Product Management

Part 1 of this blog series described the things that change and don’t change when recovering hybrid environments (part physical, part virtual). Part 2 of this series talked about application tiering and data movement, two things that don’t change even when the environment is hybrid. In today’s Part 3, I’m going to get on my soapbox about the three main challenges of recovering hybrid environments.

The three challenges are:

  1. The need to recreate a multi-layer, multi-platform hybrid stack for each and every mission-critical application.
  2. The need to do point #1 above within a certain recovery time objective (RTO).
  3. The need to spend the capex for a second site (both hardware and software), and the opex to maintain it.

Let’s take a typical example of a 3-tier web application, say, an e-commerce application. The application may have a database layer that sits on two different systems: a Linux system running Oracle and a Windows server running SQL Server. Next comes the middleware, or business logic, layer of that application, which sits on a Win2K server running WebLogic; its job is to aggregate data from the Oracle and SQL servers. And finally, the web layer is on an ESX server running Apache. To make things more complicated in this scenario (and therefore more realistic), the web and middleware tiers are stored on an EMC SAN device, the Oracle database is on a NetApp SAN device, and the SQL server is on yet another storage vendor’s device (say, a Dell device).

In this scenario, you have multiple storage platforms, multiple compute platforms, multiple operating systems, and a mix of physical and virtual environments. Sound familiar? Now, let’s say something goes wrong, and you need to recover this application at your recovery site. News flash: your recovery is going to fail if you haven’t created the identical physical and virtual stacks in your recovery environment to accommodate all three layers. If you have the wrong version of VMware’s hypervisor running in the recovery environment, you’re dead in the water. If you have the wrong hypervisor running in the recovery environment (say, Xen), you’re dead in the water. If you have only the ability to recover the database layer by itself, or both the database and middleware layers without the web layer, you’re dead in the water. Or vice versa – getting the web layer back without the other two layers also leaves you up a creek.
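To make the "identical stacks" point concrete, here is a minimal Python sketch of the e-commerce example. It is only an illustration: the `Component` type, the field names, and the `prod-version` label are hypothetical, and a real inventory would track far more detail than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One piece of an application stack (all fields are illustrative)."""
    tier: str              # "web", "middleware", or "database"
    platform: str          # hypervisor or operating system it runs on
    platform_version: str  # hypothetical label, not a real release number
    software: str          # what runs on that platform
    storage: str           # storage vendor behind it


# The production stack from the example (version labels are made up).
PRODUCTION = [
    Component("web", "ESX", "prod-version", "Apache", "EMC"),
    Component("middleware", "Win2K", "prod-version", "WebLogic", "EMC"),
    Component("database", "Linux", "prod-version", "Oracle", "NetApp"),
    Component("database", "Windows", "prod-version", "SQL Server", "Dell"),
]


def recovery_gaps(production, recovery):
    """Return production components with no identical match at the recovery site."""
    available = set(recovery)
    return [c for c in production if c not in available]


def can_recover(production, recovery):
    """The application is recoverable only if every component has a match."""
    return not recovery_gaps(production, recovery)


# A recovery site running the wrong hypervisor for the web tier.
xen_site = [
    Component("web", "Xen", "prod-version", "Apache", "EMC"),
    *PRODUCTION[1:],
]
print(can_recover(PRODUCTION, xen_site))                          # False
print([c.tier for c in recovery_gaps(PRODUCTION, xen_site)])      # ['web']
```

The check is deliberately strict: any difference in platform, version, or storage counts as a gap, which mirrors the "dead in the water" cases above. Whether a near-match would actually work in practice is a separate question that this sketch does not try to answer.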

And that’s just ONE app. What if you have 50, 80, or 100+ apps to recover? Now, compound this problem with the problem of having to recover all of these apps within a certain RTO, and you’re starting to get the picture of the magnitude of the challenges presented by hybrid environments. In a word: elephantine.
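A rough way to see how app count and RTO interact is to estimate how long it takes a fixed number of recovery teams to work through the whole list. The sketch below is a back-of-the-envelope heuristic, not a planning tool: the per-app hours, the team count, and the 24-hour RTO are all assumed numbers, and it ignores dependencies between apps.

```python
import heapq


def estimated_recovery_hours(app_hours, teams):
    """Estimate total elapsed hours when apps are spread across parallel teams.

    Greedy heuristic: take the longest recoveries first and always hand the
    next app to the team with the least work so far. Not guaranteed optimal.
    """
    if teams < 1:
        raise ValueError("at least one recovery team is required")
    loads = [0.0] * teams  # hours of work assigned to each team
    heapq.heapify(loads)
    for hours in sorted(app_hours, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + hours)
    return max(loads)


def meets_rto(app_hours, teams, rto_hours):
    """True if the estimated elapsed time fits inside the RTO."""
    return estimated_recovery_hours(app_hours, teams) <= rto_hours


# Assumed figures: 50 apps at 2 hours each, against a 24-hour RTO.
fifty_apps = [2.0] * 50
print(meets_rto(fifty_apps, teams=1, rto_hours=24))  # False
print(meets_rto(fifty_apps, teams=5, rto_hours=24))  # True
```

With one team, the 50 apps take an estimated 100 hours; with five teams working in parallel, 20 hours. The expertise those extra teams require is exactly the staffing cost discussed next.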

In order to support the recovery of a hybrid environment, you need to have the correct infrastructure in place: the right recovery technologies for each platform and O/S in your secondary site, the right expertise or staff (an Oracle person, a Windows person, a storage person, a VMware person), and a well-documented disaster recovery blueprint (or runbook) that contains all of your recovery processes.

Moreover, in order to have that runbook be current, you need to make sure that any changes in production configurations make their way into the recovery environment (change management). And putting all of this in place could cost a big bundle.
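The change-management step boils down to spotting drift between what is running in production and what the recovery site expects. A minimal sketch of that comparison, assuming each environment's configuration has already been collected into a simple dictionary (the keys and values below are hypothetical):

```python
def config_drift(production, recovery):
    """Return {setting: (production_value, recovery_value)} for every mismatch.

    A setting missing from one side is reported with None on that side.
    """
    drift = {}
    for key in sorted(production.keys() | recovery.keys()):
        prod_value = production.get(key)
        rec_value = recovery.get(key)
        if prod_value != rec_value:
            drift[key] = (prod_value, rec_value)
    return drift


# Hypothetical snapshots of one application's configuration.
production = {"hypervisor": "ESX", "db_hosts": 2, "web_servers": 4}
recovery = {"hypervisor": "ESX", "db_hosts": 2, "web_servers": 3}

print(config_drift(production, recovery))  # {'web_servers': (4, 3)}
```

Any non-empty result is a sign that the runbook and the recovery environment need updating before the next test.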

Do these challenges sound familiar to you? If so, how are you addressing them today? I’d love to hear your feedback, as well as any insights into what you’re doing to keep your hybrid environments available.

We’ve just published a white paper containing a more fleshed-out version of SunGard’s suggested approach to recovering complex hybrid IT environments, if you’d like additional information.