Posts Tagged ‘hybrid cloud’

Part 3: #Virtualization Makes #DR Easier, Except When It Makes It Harder

By Ram Shanmugam, Sr. Director of Product Management

Part 1 of this blog series described the things that change and don’t change when recovering hybrid environments (part physical, part virtual). Part 2 of this series talked about application tiering and data movement, two things that don’t change even when the environment is hybrid. In today’s Part 3, I’m going to get on my soapbox about the three main challenges of recovering hybrid environments.

The 3 Challenges are:

  1. The need to recreate a multi-layer, multi-platform hybrid stack for each and every mission-critical application.
  2. The need to do point #1 above within a certain recovery time objective (RTO).
  3. The need to spend the capex for a second site (both hardware and software), and the opex to maintain it.

Let’s take a typical example of a 3-tier web application, say, an e-commerce application. The application may have a database layer that is on two different systems: a Linux system running Oracle and a Windows server running SQL Server. Next comes the middleware, or business logic, layer: it sits on a Win2K server running WebLogic, and its job is to aggregate data from the Oracle and SQL servers. Finally, the web layer is on an ESX server running Apache. To make things more complicated in this scenario (and therefore more realistic), the web and middleware tiers are stored on an EMC SAN device, the Oracle database is on a NetApp SAN device, and the SQL server is on yet another storage vendor’s device (say, a Dell device).

In this scenario, you have multiple storage platforms, multiple compute platforms, multiple operating systems, and a mix of physical and virtual environments. Sound familiar? Now, let’s say something goes wrong, and you need to recover this application at your recovery site. News flash: your recovery is going to fail if you haven’t created the identical physical and virtual stacks in your recovery environment to accommodate all three layers. If you have the wrong version of VMware’s hypervisor running in the recovery environment, you’re dead in the water. If you have the wrong hypervisor running in the recovery environment (say, Xen), you’re dead in the water. If you have only the ability to recover the database layer by itself, or both the database and middleware layers without the web layer, you’re dead in the water. Or vice versa – getting the web layer back without the other two layers also leaves you up a creek.
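As a rough illustration of the failure modes above, here is a minimal Python sketch that compares a production stack with a recovery stack and lists what would block a recovery. The Layer structure, the field names, the operating systems assigned to each layer, and the version strings are all hypothetical, invented for this example; a real check would read the inventory from your own configuration records.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Layer:
    """One tier of an application stack (hypothetical structure)."""
    name: str                                 # e.g. "web", "middleware", "database"
    os: str                                   # operating system on that tier
    hypervisor: Optional[str] = None          # None means a physical server
    hypervisor_version: Optional[str] = None  # None when there is no hypervisor


def recovery_blockers(production: List[Layer], recovery: List[Layer]) -> List[str]:
    """Return the reasons the recovery site cannot bring the application back."""
    recovery_by_name: Dict[str, Layer] = {layer.name: layer for layer in recovery}
    problems: List[str] = []
    for prod in production:
        rec = recovery_by_name.get(prod.name)
        if rec is None:
            # A layer that cannot be recovered at all sinks the whole application.
            problems.append(f"{prod.name}: no matching layer at the recovery site")
            continue
        if rec.os != prod.os:
            problems.append(f"{prod.name}: OS is {rec.os}, production runs {prod.os}")
        if rec.hypervisor != prod.hypervisor:
            problems.append(
                f"{prod.name}: hypervisor is {rec.hypervisor}, "
                f"production runs {prod.hypervisor}"
            )
        elif rec.hypervisor_version != prod.hypervisor_version:
            problems.append(
                f"{prod.name}: hypervisor version is {rec.hypervisor_version}, "
                f"production runs {prod.hypervisor_version}"
            )
    return problems


# Sample inventory; every value below is an assumption made for illustration.
production = [
    Layer("database", "Linux"),
    Layer("middleware", "Windows"),
    Layer("web", "Linux", "ESX", "4.1"),
]
recovery = [
    Layer("database", "Linux"),
    Layer("web", "Linux", "Xen", "4.1"),
]

blockers = recovery_blockers(production, recovery)
for line in blockers:
    print(line)
```

An empty list means the two stacks line up on the attributes this sketch checks. It says nothing about storage platforms or staff expertise, which matter just as much.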

And that’s just ONE app. What if you have 50, 80, or 100+ apps to recover? Now, compound this problem with the problem of having to recover all of these apps within a certain RTO, and you’re starting to get the picture of the magnitude of the challenges presented by hybrid environments. In a word: elephantine.

In order to support the recovery of a hybrid environment, you need to have the correct infrastructure in place: the right recovery technologies for each platform and O/S in your secondary site, the right expertise or staff (an Oracle person, a Windows person, a storage person, a VMware person), and a well-documented disaster recovery blueprint (or runbook) that contains all of your recovery processes.

Moreover, in order to have that runbook be current, you need to make sure that any changes in production configurations make their way into the recovery environment (change management). And putting all of this in place could cost a big bundle.

Do these challenges sound familiar to you? If so, how are you addressing them today? I’d love to hear your feedback, as well as any insights into what you’re doing to keep your hybrid environments available.

We’ve just published a white paper containing a more fleshed-out version of SunGard’s suggested approach to recovering complex hybrid IT environments, if you’d like additional information.

Part 2: #Virtualization Makes #DR Easier, Except When It Makes It Harder

By Ram Shanmugam, Sr. Director of Product Management

So let’s talk about application tiering first. Virtualization does not change the need to perform a business impact analysis that helps you understand the cost of downtime application by application. At the end of this process, you should have a list of applications prioritized by the size of their impact on revenue or on costs (some applications, if down for too long, can actually start incurring penalties for your company). Following best practices, you would then assign a recovery time objective (RTO) and recovery point objective (RPO) to each of these applications. So far so good, right?
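To make the prioritization step concrete, here is a small Python sketch that ranks applications by their hourly cost of downtime. The AppImpact structure, the application names, and the dollar figures are hypothetical; a real business impact analysis would weigh many more factors than these two.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AppImpact:
    """Downtime impact of one application (hypothetical structure)."""
    name: str
    revenue_loss_per_hour: float   # revenue lost while the app is down
    penalty_per_hour: float = 0.0  # contractual penalties, if any

    @property
    def downtime_cost_per_hour(self) -> float:
        return self.revenue_loss_per_hour + self.penalty_per_hour


def prioritize(apps: List[AppImpact]) -> List[AppImpact]:
    """Order applications from highest to lowest hourly cost of downtime."""
    return sorted(apps, key=lambda app: app.downtime_cost_per_hour, reverse=True)


# Made-up figures, for illustration only.
apps = [
    AppImpact("intranet wiki", 200.0),
    AppImpact("e-commerce storefront", 50_000.0),
    AppImpact("payments gateway", 30_000.0, penalty_per_hour=25_000.0),
]

ranked_names = [app.name for app in prioritize(apps)]
print(ranked_names)
```

The ranked list is what you would then walk through, top to bottom, when assigning an RTO and RPO to each application.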

Next, you need to move your data over to your secondary site via a “data mover.” Data movers, as we like to call them here at SunGard, are pretty much exactly what they sound like: the technology for moving data from one site to another. The slowest form of data movement, of course, is to put all your data on tapes and send them on trucks over to your secondary site for vaulting. However, for applications that require faster recovery, a number of faster technologies are available.

At SunGard, we recommend selecting the data mover based upon the RPO of the data you are moving. Our reasoning behind this is that data movers vary in cost, so you would want to match technology to the data being moved, based on the value of the information.

If you’ve done the Business Impact Analysis that I mentioned above, you’ll have assigned the data supporting your applications to one of 4 tiers of RTO:

  • Tier 1: < 4 hours RTO
  • Tier 2: 4 – 12 hours RTO
  • Tier 3: 12 – 24 hours RTO
  • Tier 4: 24+ hours RTO
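The tier boundaries above can be sketched as a simple lookup in Python. The function name is hypothetical, and because the ranges as listed share their endpoints (12 and 24 hours each appear in two tiers), this sketch assumes the lower bound of each tier is inclusive, which is consistent with Tier 1 being "less than 4" and Tier 4 being "24+".

```python
def rto_tier(rto_hours: float) -> int:
    """Map a recovery time objective, in hours, to one of the four tiers.

    Assumption: a value that falls exactly on a boundary (4, 12, or 24 hours)
    belongs to the slower tier.
    """
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    if rto_hours < 4:
        return 1
    if rto_hours < 12:
        return 2
    if rto_hours < 24:
        return 3
    return 4


for hours in (2, 4, 12, 24, 72):
    print(f"RTO of {hours} hours -> Tier {rto_tier(hours)}")
```

If your own policy treats the boundaries differently, only the comparison operators need to change.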

Now, you have 4 broad categories of data mover to select from:

  • Server- or host-based replication: This uses asynchronous server replication technology to deliver recovery at sub-4 hour RTOs.
  • SAN-based replication: This is where you use the storage replication technology of your choice to replicate data from production to the recovery environment, with the aim of recovering large-scale virtual application environments at sub-12 hour recovery points.
  • SAN-based vaulting or snapshot: Your primary site data goes into an online vault. Typical recovery point objectives are within 24 hours.
  • Online or disk-based backup: The application data is backed up using backup software onto a disk (or even a tape). The RTO is 12-48 hours, with the RPO depending upon backup frequency and windows.

As you can see, the choices above are listed in order of increasing RTO and RPO. In other words, for Tier 1 data, it’s best to use server- or host-based replication as your data mover. For Tier 2 data, it’s best to use SAN-based replication. And so on and so forth. This way, you are aligning your data movement technology with the business value of your data.
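Expressed as code, the tier-to-data-mover pairing is just a lookup table. In this Python sketch, the Tier 1 and Tier 2 entries come straight from the text above; the Tier 3 and Tier 4 entries are my reading of "and so on," pairing the remaining data movers with the remaining tiers in the order listed. The names are hypothetical.

```python
# Tier -> recommended data mover. Tiers 3 and 4 are inferred from list order.
DATA_MOVER_BY_TIER = {
    1: "server- or host-based replication",
    2: "SAN-based replication",
    3: "SAN-based vaulting or snapshot",
    4: "online or disk-based backup",
}


def data_mover_for(tier: int) -> str:
    """Return the data mover category suggested for a given tier."""
    try:
        return DATA_MOVER_BY_TIER[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None


for tier in sorted(DATA_MOVER_BY_TIER):
    print(f"Tier {tier}: {data_mover_for(tier)}")
```

Keeping the pairing in one table makes it easy to review with the business owners who set the tiers in the first place.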

But as I’ll discuss in my next post, simply replicating your data at a second site does not buy you a disaster recovery plan! Tomorrow, I’ll talk about the key challenges in recovering applications running in a hybrid environment.

Read on, Part 3: Virtualization Makes DR Easier, Except When It Makes It Harder

 

#Virtualization Makes #DR Easier, Except When It Makes It Harder

By Ram Shanmugam, Sr. Director of Product Management

Unless you’ve been living under a rock for the last half-decade, you know that virtualization is changing the landscape of IT and data centers. In terms of financial impact, virtualization untethers applications from physical servers, creating valuable savings. In terms of disaster recovery impact, virtualization makes recovering applications easier – MUCH easier. It’s as easy as copying a file to a computer and running it. Here’s the kicker: the world is not 100% virtualized yet. Data centers are becoming increasingly virtualized, but most data centers today are still part physical and part virtual. That is to say, they are “hybrid” environments. To support my point, Gartner told us in a recent inquiry that they estimate 50% of all workloads today to be running on virtual machines.[1] That means 50% are not.

While newer applications are being run on exclusively virtual workloads, there are still plenty of mission-critical apps running on a combination of mainframes, Windows servers, Linux/Unix systems, and virtual machines. Given this scenario of a hybrid production environment, the challenge for CIOs becomes: “How do you best protect and recover applications within a hybrid infrastructure within certain recovery time objectives (RTOs) and recovery point objectives (RPOs)?” Or, in other words, “How do you think about Disaster Recovery in this new semi-virtualized world?”

Well, here’s my short answer: as long as we are living in this hybrid world, virtualization is an added layer of complexity that requires some adjustments to your recovery strategy and infrastructure. Most fundamental DR principles don’t change, but a few tweaks are required. I will elaborate upon these in this blog and in two more blog posts to come.

What Doesn’t Change

  1. Application tiering. Applications still need to be tiered according to their respective cost of downtime. You should still assign an RTO and RPO to each application based on its overall impact to your business.
  2. Data movement. You still need to move your data from your production environment into a recovery environment (some might call this a “DR site” or “secondary site”). How you choose to move the data depends on the RTO and RPO that you assigned above.
  3. You still need to ensure compatibility between production and recovery environments. After all, if you let the infrastructures and technologies between the two sites diverge too much, how can you use one to recover the other?

What Needs Tweaking

Since your primary site is now a hodgepodge of physical and virtual (meaning multiple applications running on multiple platforms, multiple hypervisors, and multiple storage), you should expect that your recovery site will be the same as well. If you’re doing DR yourself (we call this the “self-insured” model), then you’ll need to ensure the total compatibility of your physical and virtual compute layers between your primary and secondary sites. The “tweak” I am referring to is the addition of the “virtual” layer, with all of its attendant hardware, software, and people/expertise.

I’ll be back later this week to spell out more about each point above. Stay tuned!

Read on, Part 2: Virtualization Makes DR Easier, Except When It Makes It Harder



[1] Gartner, Inc., Top Five Trends for x86 Server Virtualization, Thomas J. Bittman, March 22, 2012.

Cloud Connect 2011

Satish Hemachandran just returned from Cloud Connect 2011

This week’s Cloud Connect 2011 was the place to be to discuss all things Cloud. I spent two days at a packed convention center where the session topics conveyed the attendees’ interest in deciphering the challenges faced by enterprises in Cloud adoption. The consistent theme for this year’s event was that Cloud for the enterprise needs to be built with availability, manageability, and security in mind, an area that we within SunGard are most passionate about.

I had the opportunity to present SunGard’s vision of the Enterprise Cloud on Tuesday. The session focused on the risks that IT departments face as they embark on the Cloud path and how these perceived and actual risks can be addressed through systematic mitigation. This risk mitigation takes the form of both products and processes that need to follow industry best practices but be fine-tuned for the Cloud based on your specific enterprise requirements.

The majority of enterprise customers, though, are unable to solve this problem on their own since they are faced with diminished IT budgets, personnel resource constraints, or a lack of suitable Cloud technology vendors who offer these capabilities out of the box. For instance, one of the people I spoke to at Cloud Connect was looking to introduce Cloud to his enterprise but needed a partner who could not only understand his business and technical challenges but was also ready to address them. Specifically, his company, a large consumer company, had data security and governance requirements that none of the commodity Clouds offered or had even thought about.

Another attendee was looking to build a hybrid Cloud that would allow his company to connect an IaaS with a tiered storage service with the kind of bandwidth and SLAs he needed while maintaining security. We also had a number of businesses ask about how change control took place in an enterprise Cloud and if/how Enterprise Cloud could help with meeting compliance requirements.  These questions are what you would expect any enterprise to have before committing to adopt a major technology shift.  

At SunGard, we believe that a Cloud done right can indeed offer the benefits of cost optimization and flexibility along with all the characteristics around security, monitoring, management, and integration/connectivity that make it enterprise ready. It was good to hear these same sentiments expressed over and over again at Cloud Connect.

What did you learn at Cloud Connect?