Archive for September, 2010

SMB and Enterprise Differences: Security, Risk and Continuity

Earlier this month, Forrester Research published the results of a survey that highlighted top priorities for IT decision makers. Improving BC/DR was the top priority for SMB organizations and the #2 priority for large enterprises. This made me consider: when preparing for security, risk and continuity, what difference does organizational size make? I invited three SunGard consultants to share their thoughts on each area. Their responses are summarized below:

Security Viewpoint: (Chris Burgher, CISSP, PMP, CISA – Associate Principal, Security)

“With information security, the main risk factors are the same regardless of organization size, and include compliance, brand and reputation damage if breached, and the costs of losing data. Large enterprises generally have multiple compliance requirements such as GLBA/HIPAA/SOX/PCI/FFIEC, while the SMB may face only one or two regulatory areas. An interesting dimension is that the large enterprise may also have an increased risk of insider attacks, simply because it is dealing with a larger employee population.”

Risk Viewpoint: (Mike Shandrowski, BC/DR Architect)

“The differences seem to lie more in how they address risk mitigation than in the need for risk mitigation. From what I see, SMB clients tend to ‘self-analyze’ their risks more often, resulting in more of a Risk Analysis than a full Risk Assessment. For the enterprise, the Assessment will include an evaluation and a statement of judgment on what the risks mean to the organization, the interrelationships between risks, and what to do next with that risk information.”

Continuity Viewpoint: (Bill Hughes, CBCP – Director, BC/DR Center of Excellence)

“Organization size, whether small or large, brings both advantages and disadvantages. SMB clients tend to say they ‘know’ how things should work and how they should respond; in many cases that is true, because the close interactions across the organization mean that knowledge hasn’t been segmented or isolated. However, because they may lack a certain critical mass, I often find more single points of failure in organizations of this size. With that in mind, you need to look at how intellectual capital is managed and maintained in the organization. Those who know are often the busiest people, which creates a challenge during a crisis: they cannot effectively work on parallel activities to restore normal operations while also directing a recovery effort off-site.”

Security, risk management and effective business continuity are closely linked, and each reflects differences due to organization size. Please share in the comments how scale has affected your own organization.

What Happened to Inventory Safety Stock? Some Thoughts on Supply Chain Resilience

Recalling a brief discussion with Dr. Yossi Sheffi a few years ago, following the publication of his book The Resilient Enterprise, I am reminded of his observation that every industry has its own supply chain dynamic. For years, a standard measure of supply chain management was the safety stock level. Of late, efficient supply chain management seeks to reduce that level to near zero for “just-in-time” management. And it is working. A review of average sales figures compared to inventory levels between 1999 and 2009 (the last full year of data, U.S. Census Bureau) shows that sales increased by 28% while inventories increased by only 24%. Looking at 2007 (before the economic turmoil) is even more dramatic: sales grew by 43%, while inventory levels grew by only 31%. Had inventory-to-sales ratios held steady, nearly $135 billion in additional capital investment would have been required.
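The growth figures above imply a falling inventory-to-sales ratio. A minimal Python sketch of that arithmetic follows, using only the 28%/24% and 43%/31% growth rates quoted in the paragraph. It assumes "kept pace" means inventory growing at the same rate as sales. The `base_inventory` input to `capital_avoided` is a hypothetical parameter: the starting inventory level behind the $135 billion estimate is not given here, so the sketch does not try to reproduce that figure.

```python
def ratio_change(sales_growth_pct: float, inventory_growth_pct: float) -> float:
    """Relative change in the inventory-to-sales ratio over a period,
    given the percentage growth of sales and of inventory."""
    return (1 + inventory_growth_pct / 100) / (1 + sales_growth_pct / 100) - 1


def capital_avoided(base_inventory: float, sales_growth_pct: float,
                    inventory_growth_pct: float) -> float:
    """Extra inventory that would have been carried had inventory grown at
    the same rate as sales. base_inventory (the starting inventory level)
    is a hypothetical input, not a figure from the source."""
    return base_inventory * (sales_growth_pct - inventory_growth_pct) / 100


# Growth percentages quoted in the text (U.S. Census Bureau data as cited).
for period, sales, inventory in [("1999-2009", 28, 24), ("to 2007", 43, 31)]:
    print(f"{period}: inventory-to-sales ratio changed by "
          f"{ratio_change(sales, inventory):+.1%}")
```

For the 1999-2009 figures the ratio falls by roughly 3%, and for the 2007 comparison by roughly 8%; multiplying the gap in growth rates by a starting inventory level gives the capital that did not have to be tied up in stock.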

Reducing the capital tied up in inventory frees up investment for business growth, which in turn fuels sales growth.

Just-in-time Inventory Risk

On the risk side, zero safety stock is, in a word, “unsafe.” Just-in-time inventory management depends heavily on automated systems, and managing the increased process risk calls for architecting resilient systems. With a one-week safety stock, an outage could tolerate an RTO of a few days. With a just-in-time model, an outage demands recovery in minutes or hours.
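One way to make the safety-stock argument concrete is the familiar days-of-cover calculation: safety stock divided by daily usage gives roughly how long an outage can run before supply runs out. The Python sketch below is illustrative only; it assumes constant usage and ignores replenishment already in transit, and the function names and unit figures are hypothetical.

```python
def days_of_cover(safety_stock_units: float, daily_usage_units: float) -> float:
    """Days of supply the safety stock provides at a constant usage rate."""
    if daily_usage_units <= 0:
        raise ValueError("daily_usage_units must be positive")
    return safety_stock_units / daily_usage_units


def outage_absorbed(safety_stock_units: float, daily_usage_units: float,
                    recovery_hours: float) -> bool:
    """True if systems are recovered before the safety stock runs out.

    Simplifying assumption: no replenishment arrives while systems are down.
    """
    return recovery_hours <= days_of_cover(safety_stock_units, daily_usage_units) * 24


# Hypothetical figures: a one-week buffer versus a just-in-time (zero) buffer.
print(outage_absorbed(700, 100, recovery_hours=72))  # True: 7 days of cover
print(outage_absorbed(0, 100, recovery_hours=4))     # False: no cover at all
```

With 700 units of stock and usage of 100 units a day, the buffer covers seven days, so a three-day recovery is absorbed; with zero safety stock, even a four-hour recovery is not.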

Architecting for Continuous Availability

I had the chance to participate in a web event hosted by DRJ with Forrester analyst Rachel Dines today. Rachel pointed to the differences between disaster recovery and the ITIL concept of “IT Service Continuity Management.” This increased focus on continuous availability – looking at any and all likely business disruptions – is a good way to examine whether your continuity program will effectively protect just-in-time inventory against both planned and unplanned downtime.

The DRJ webinar is available for replay; take the time to consider the technology options for an “always-on” organization (slides 21-26). From remote backup through asynchronous and synchronous server replication, options exist to improve operational availability. While advanced options may cost more, be sure to weigh that cost against the capital saved through just-in-time inventory; protecting continuous availability is a necessary – and cost-effective – business decision.

Let me hear your thoughts on supply chain risks and how your organization plans for resilience.

Q&A with Bob DiLossi – Director of Crisis Management

Bob DiLossi is the Director of the SunGard Availability Crisis Management Center and has managed this area for the past seven years. In that time, Bob has been directly involved in hundreds of disaster exercises and actual declarations. As we recognize September as National Preparedness Month, mark the ninth anniversary of the September 11th tragedy, and look ahead to the DRJ Fall World Conference (September 19-22, 2010), I had the chance to speak with Bob and get his perspective on crisis management today.

Q: From what you are seeing with customers, what has changed in recent years?

A: Customers today practice more scenarios, in greater depth of detail, than they have in the past. These scenarios reflect more everyday events, which leads to more realistic and more robust validation of their continuity programs. I see significantly more blending of the data center recovery process with the business processes, as evidenced by the increased number of mock disasters we have seen in the past few months that tie customers’ internal tabletop exercises to the SunGard Crisis Management Center.

Q: Bob, you’ve participated in literally hundreds of disasters and exercises; what do people forget most often that would help them become more effective and successful?

A: In the past, I would have said they neglected the people aspect: the detailed processes that surround end-user recovery. Fortunately, that trend has changed of late, perhaps driven by greater awareness of the impact on staff seen in the news following events like Hurricane Katrina. The biggest challenge now is change management. There continues to be a disconnect when an organization deploys new technology: too often, technology changes made to support daily production workloads are not reflected in recovery plans.

Q: Some customers are more effective than others in their test success; what sets them apart?

A: Probably the single most important factor is how thoroughly they exercise their plans. We advise new customers to follow a “crawl-walk-run” approach to improving their plan, but some never progress past the “walk” stage – testing individual components but never all their applications and procedures as an integrated exercise. We’ve seen some customers back off testing in the current economy, but mature processes and best practices deliver value only when you have verified that your plan will actually work within all the resource and time constraints you are tracking.

E-Commerce Failure and Operational Impact

Did you know that September is National Preparedness Month in the US? At SunGard we often speak with our customers about “Recovery Time Achievable,” or RTA. It’s fine – and necessary – to understand and define RTO and RPO, but each business needs to know what is achievable based on its plans and procedures, and whether a gap exists between RTO and RTA. That gap can add significantly to the cost of downtime, and what better time than National Preparedness Month to look more closely at what your resilience program can achieve?

Outage Costs and Resilience

Type “e-commerce outage” into any search engine, and you get an intriguing list of stories. Some of these outages, however, also affect physical store locations. This past week, CSO Online posted an item about the July outage experienced by American Eagle Outfitters. Their online site was down for eight days, and although they had recovery plans, several issues arose that undermined their recovery efforts. Their RTA fell short of expectations.

This highlights the need to verify the effectiveness of recovery plans and procedures. When was the last time you exercised your full plan? How do you monitor and report on timeline verification? Did the recovery timeline you actually achieved match your RTO/RPO expectations? If not, then you need to raise this discussion within your organization using business resiliency language.
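Timeline verification can start very simply: record how long each recovery step took during an exercise, add the steps up to get the recovery time actually demonstrated (evidence of your RTA), and compare it against the RTO. The Python sketch below assumes the steps run strictly one after another; steps that run in parallel would call for a critical-path calculation instead. The function names, step durations and 12-hour RTO are hypothetical.

```python
from datetime import timedelta


def measured_rta(step_durations: list[timedelta]) -> timedelta:
    """Recovery time demonstrated in an exercise, assuming steps run sequentially."""
    return sum(step_durations, timedelta())


def rto_gap(rto: timedelta, step_durations: list[timedelta]) -> timedelta:
    """How far the demonstrated recovery time overshoots the RTO (zero if met)."""
    return max(measured_rta(step_durations) - rto, timedelta())


# Hypothetical exercise log: hours spent on each sequential recovery step.
steps = [timedelta(hours=h) for h in (2, 6, 3, 5)]
rto = timedelta(hours=12)

print("RTA:", measured_rta(steps))  # RTA: 16:00:00
print("Gap:", rto_gap(rto, steps))  # Gap: 4:00:00
```

A non-zero gap is the number to bring into the business resiliency discussion: it shows how much faster the plan must run, or how much the RTO expectation must change, to close the distance between RTO and RTA.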

American Eagle has a policy that if an item is not available in the store, they will locate it through their web site and offer it with free delivery to your home. Borders Books & Music has a similar policy, which I have used, and I am certain many other retailers do as well, sometimes depending on the purchase amount. Online commerce sites have become part of “business as usual” for many brick-and-mortar retailers.

Potential Market Share Impact

A web outage brings with it a lost sales opportunity not only online, but potentially in the retailer’s stores as well. If I walk out of a store with my sought-after book, CD or jeans purchased and ready to be delivered to my home in two days, I won’t buy the item somewhere else. Take away that feature, and those who walk out of a store during a web outage are likely to complete their purchase either at another store or online with a competitor. Most won’t even try your web site later; after all, the store clerk probably told them the site was down, by way of explaining and apologizing for the inconvenience.

How do you measure the impact of an e-commerce outage on market share? What if I like the product from the second store better? What do you now need to do to get me back to your web site or retail location?

Let me know what you think about web site outages, how they may have impacted you, or your own story about changing purchase decisions due to an outage.

Electronic Medical Records, Quality of Care … And Resilience

EMR and Resilience

Have you considered how automation affects your own health care, particularly in a crisis or emergency situation? Not too long ago, I sat in an emergency room late in the evening, on yet another visit for one of my son’s sports injuries. At midnight, there was an announcement over the public address system advising hospital staff to go to “manual records” for the next two hours while system backups were completed.

Clearly, this message caused doctors, nurses and supporting health care technicians to change the way they provided patient care during those next few hours. Electronic Medical Records (EMR) would be unavailable, and at some point all the manually tracked details would need to be entered into the system. Emergency rooms are designed to respond to the unexpected and to deliver consistent, quality patient care at any time. Not having my son’s past records while waiting for an ankle x-ray wasn’t a big deal. For another patient, perhaps unresponsive or unconscious, the lack of supporting records could create an entirely different trauma response situation.

Washington Incentives

Current Washington funding support for electronic health records (EHR), starting at the family practitioner’s office, makes sense for continuity of care. But such EHR systems are not without new risks. The integrity of the resulting systems will demand that resilience be built into every deployed system and process.

Last month (July 13, 2010), Dr. David Blumenthal, National Coordinator for Health IT, commented on the progress being made and the challenging work ahead.

What Dr. Blumenthal didn’t really discuss were some of the adoption risks. A couple of anecdotal comments from health care professionals help us focus on these issues:

  • One general practitioner mentioned that when paper records get misfiled, they can typically be located within 5-10 minutes, because there are a few common mistakes that guide where to look, along with the simple numerical and color-coded files. But if an automated system is down, they have no records – for anyone. Not for those in the office, nor for the ER physician calling about an unconscious patient brought in by ambulance.
  • The mobile devices intended for health care workers demand a level of performance that might not be expected.  As one nurse put it to me: “I need my hands free to work with my patients. If someone is in a crisis, I’ll simply drop the laptop and respond.”

These examples reflect process issues that are integral to the delivery of quality health care, and each is independent of any specific automation component.

Let me hear your thoughts on how healthcare IT can help ensure that emerging EHR solutions meet the resilience demands we place on them.

The Business of Resilience – Introduction

The Business of Resilience
The world of business continuity and disaster recovery has evolved over the past few years into a world of resilience. This blog shares my thoughts and reactions to technology and business trends, and to news and events that raise questions – for me at least – about where managers, business leaders and tech professionals are guiding resilience.

Whose Idea Is This, Anyway?
SunGard Availability Services has a long tradition in the continuity space. As the first commercial provider of recovery services for mainframe customers, we’ve had the benefit, along with the many talented experts within our customer community, of guiding and participating in the changes that continuity professionals now use as best practices.

For more than 20 years at SunGard Availability Services, I have watched this BC/DR industry grow and mature. With my background in software development and technology operations in the healthcare, financial services and supply chain industries, I find it interesting to watch how the changing perspective on resilience has influenced different business segments. I hope to use this blog as a platform to exchange ideas and perspectives on this changing landscape.