Posts Tagged ‘resilience’
Posted: February 22, 2013, 9:39 am
By Bob DiLossi
The 2012 hurricane season has thankfully come to an end, and now is the time for businesses to prepare for winter storms. Although some parts of North America have been experiencing a milder winter, winter storms can and will still occur – take, for example, winter storm Nemo, which plagued the Northeast in early February.
On average, the United States has roughly four catastrophic winter storms annually, with storms occurring most commonly in the northeastern United States. Being prepared is key. In some ways, winter storms can be the most challenging weather systems because they spawn so many types of emergencies.
Blizzards, electrical storms, hail, high winds, ice, sleet, and snow can contribute to communications failures, power outages, and risks to your buildings. Storms also lead to many driving accidents and you can lose critical personnel to injuries from slips and falls.
You need to prepare for every event that may occur, whether it affects your buildings, your business or your people. All three need to be part of the business continuity plan and part of the testing of your plan. As companies strive to meet the demand for continuous service, they expect 24/7/365 availability. However, the average organization’s requirement for recovery time objective (RTO) from an outage now ranges between two and 24 hours.
To help better protect your organization from the impact of winter storms, below you will find a checklist to gauge where you stand on preparing for winter storms. As you read the list, consider the impact each of the items would have, if they occurred, on your operations.
Building:
- Building managers unable to get to the building to assess and mitigate damage
- Communications infrastructure failures
- Explosions
- Freezing and flooding of interior building areas that may result in ceiling collapse
- Gutter clogging with ice dams, leading to leaks
- Hazardous material accidents
- Power outages, causing building environmental controls to shut down
- Roof damage or collapse due to ice, snow, or fallen trees
- Structural damage or collapse
- Transportation accidents or closed roads that trap people in or out of your building
People:
- Communications issues
- Employee safety
- Lack of corporate presence during recovery
- Lack of lodging/logistics
- Not focused on recoveries
- Team players not available to travel
When it comes to the business itself, you need to consider a winter storm’s influence on several areas of operation. Run through this checklist and determine how you would satisfy these conditions if problems arose:
Business:
- Customers expect supplies and services to continue—or resume rapidly
- Employees expect both their lives and livelihoods to be protected
- Insurance companies expect due care to be exercised
- Regulatory agencies expect their requirements to be met, regardless of circumstances
- Shareholders expect management control to remain operational
- Suppliers expect their revenue streams to continue
After going through the checklists and developing ways to address all of these items, you then need a plan of action to use once a disaster strikes. To that end, there are three major steps to begin the process of managing the incident:
- Mobilize a central command center, activate a business recovery plan, identify exactly how long the organization will operate in a recovery state, and plan accordingly.
- Carefully document your processes, both in terms of how to recover and how to operate.
- Practice and refine those processes using a variety of scenarios.
To help with these preparations, a free business continuity toolkit is available from SunGard.
Recognizing the potential disruptive dangers from winter storms, in our next blog we discuss the importance of developing and practicing a suitable DR plan.
Tags: business continuity plan, disaster, Disaster Preparedness, disaster recovery, hurricane season, resilience, winter storm, workforce recovery, workforce resiliency
Posted in Business Continuity, Disaster Recovery
Posted: December 20, 2011, 2:56 pm
Somehow, a perception exists that a cloud provides a certain level of redundancy by default. However, make no mistake. Redundancy is not inherent.
Admittedly, individual hardware and software components have some redundancy built in. However, those capabilities do not eliminate the need for a redundant cloud any more than safe cars eliminate the need for speed limits, traffic lights, divided highways and the rules-of-the-road.
For many cloud providers, especially consumer cloud providers, the only redundancy offered is to make physical copies of the data, and many customers do not use even that minimal level of recovery. These clouds were not built with redundancy in mind. They lack the automation, monitoring and procedures to provide clients with an environment that can anticipate, react to and recover from component failures. Such clouds are cost effective only if your business, employees and/or customers can tolerate the occasional complete loss of service.
Redundant Redundancy
The hallmark of an enterprise cloud is the redundancy it offers. Redundancy exists throughout and between the infrastructure layers to ensure high availability. For example, a failover process detects application hangs and interruptions so corrective action takes place more quickly. Monitoring tools ensure no single point of failure develops, and specially built automation handles error conditions when a problem does occur, obviating the need for human intervention. This type of automation is particularly important because human interaction comes only after some level of damage is evident.
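As a rough illustration of the failover process described above, the following Python sketch detects a hung application through missed heartbeats and promotes a standby without human intervention. The names (`Node`, `FailoverMonitor`, `hang_timeout`) and the five-second threshold are assumptions made for this example; they do not describe any particular vendor's implementation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A hypothetical application instance that reports heartbeats."""
    name: str
    last_heartbeat: float = 0.0  # seconds on an arbitrary clock

    def beat(self, now: float) -> None:
        self.last_heartbeat = now


class FailoverMonitor:
    """Promotes the standby when the active node stops sending heartbeats."""

    def __init__(self, active: Node, standby: Node, hang_timeout: float = 5.0):
        self.active = active
        self.standby = standby
        self.hang_timeout = hang_timeout  # assumed threshold, in seconds
        self.events = []

    def check(self, now: float) -> Node:
        """Return the node that should serve traffic at time `now`."""
        silent_for = now - self.active.last_heartbeat
        if silent_for > self.hang_timeout:
            self.events.append(
                f"{self.active.name} silent for {silent_for:.1f}s; "
                f"failing over to {self.standby.name}"
            )
            # Swap roles automatically; no operator is involved.
            self.active, self.standby = self.standby, self.active
            # Count the promotion as a fresh heartbeat to avoid flapping.
            self.active.last_heartbeat = now
        return self.active


primary = Node("primary")
secondary = Node("secondary")
monitor = FailoverMonitor(primary, secondary, hang_timeout=5.0)

primary.beat(now=0.0)
serving_early = monitor.check(now=3.0).name  # heartbeat is still fresh
serving_late = monitor.check(now=9.0).name   # 9s of silence exceeds 5s
```

In this sketch the clock is passed in explicitly so the behavior is easy to follow; a real monitor would read a system clock and would also need to confirm that the standby itself is healthy before promoting it.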
Built-in Redundancy
It is the cloud vendor’s responsibility to design and build redundancy into the cloud, and the expertise, staff, time and investment it requires are substantial. Patches and piecemeal solutions added over time do not deliver the same strong results as redundancy baked in from the beginning.
Is recovery of stored data enough redundancy for your applications?
Download SunGard’s white paper, “The Real Value of Cloud Computing.”
Tags: automated cloud computing, built for redundancy, cloud computing, Cloud computing leader, cloud computing model, cloud computing risk, cloud redundancy, enterprise cloud computing, monitoring cloud computing, resilience, secure cloud computing, top cloud computing
Posted in Uncategorized
Posted: December 13, 2011, 12:58 pm
Business continuity focuses on the resiliency, restoration, disaster recovery and security needed to keep your system operating, performing, secure and, if an incident should occur, recoverable. Many cloud vendors have little experience with business continuity, preferring instead to offer consumer cloud services to clients that provide their own back-up procedures, intrusion protection, vulnerability alerts, firewalls, software upgrades and disaster recovery planning/testing.
Resiliency is the key
Without strong resiliency, redundancy and failover capabilities at each layer of the cloud stack, the failure of one component can cause the failure, in short order, of many subsequent processes. Some vendors have experienced such “cascading failures.” To be truly resilient, each component in the cloud must have failover logic and automation.
Enterprise clouds are built for overall resiliency. That means they have not only failover capabilities and integrated, multi-site storage locations but also multiple points “baked in” where the system can fail over within and between layers automatically. If a component fails, it needs to fail over without human interaction, so the workload moves automatically to alternative hardware to maintain availability.
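To make the idea of a cascading failure concrete, here is a minimal Python sketch that propagates one component failure through a dependency map, with and without automatic failover. The four-layer stack and the function name `failed_components` are hypothetical, chosen only for illustration.

```python
def failed_components(depends_on, standbys, initial_failure):
    """Return a sorted list of every component left unavailable.

    depends_on: maps each component to the components it needs.
    standbys: components that have an automatic failover target.
    initial_failure: the component that fails first.
    """
    down = set()
    # A component with a standby survives its own failure.
    if initial_failure not in standbys:
        down.add(initial_failure)

    # Keep propagating until no new component is affected.
    changed = True
    while changed:
        changed = False
        for component, needs in depends_on.items():
            if component not in down and any(dep in down for dep in needs):
                down.add(component)
                changed = True
    return sorted(down)


# Hypothetical cloud stack: each layer depends on the one beneath it.
stack = {
    "storage": [],
    "hypervisor": ["storage"],
    "database": ["hypervisor"],
    "application": ["database"],
}

without_failover = failed_components(stack, set(), "storage")
with_failover = failed_components(stack, {"storage"}, "storage")
mid_layer_failure = failed_components(stack, {"storage"}, "hypervisor")
```

With no standby, the storage failure takes down all four layers. With failover on storage alone, that failure is contained, yet a hypervisor failure still cascades upward, which is why failover logic is needed at each layer rather than at one.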
Ask the Tough Questions
If low latency, high performance, robust security and vigilant management are key requirements for your applications, it pays to question your potential cloud provider closely about their procedures and automation related to resilience, redundancy, security, governance and data recovery. Ask for their Service Level Agreement early in your conversations, since it spells out the level of responsibility the provider expects to assume.
Does your current data center have automatic failover?
Read “Five Considerations When Evaluating Cloud Computing Architectures” for more information.
Tags: automated cloud computing, built for redundancy, cloud computing, Cloud computing leader, cloud computing model, cloud computing risk, cloud redundancy, enterprise cloud computing, monitoring cloud computing, resilience, secure cloud computing, top cloud computing
Posted in Uncategorized
Posted: July 28, 2011, 8:51 am
Bob DiLossi is the Director of the SunGard Crisis Management Center. A long-time business continuity practitioner, Bob provides some commentary in this post concerning lessons gleaned from crisis management in the midst of severe weather events, such as hurricanes.
1. Does the Crisis Management Team do anything different once a hurricane has been named and a projected path is announced by the National Weather Service?
It’s important to recognize that we monitor all weather events, not just hurricanes in-season. What makes weather events unique is that sometimes, you have a warning period that allows for review of plans and preparation. Right now we are tracking a tropical storm over the Cayman Islands which may strike Texas or Louisiana, or may move in another direction. Our process is to consider potential storm direction, and contact customers who may be potentially affected. We begin by reviewing the human factors and anything that would affect the safety of employees and our ability to contact them during a crisis. Second, we discuss potential business impacts. We then put them on alert, not waiting for them to act. That act of placing them on alert often becomes an alarm for them to make sure they are taking the necessary precautions themselves. The SunGard portal then gives them visibility into our plans and status as the storm track develops.
2. What advice would you offer to SunGard customers as a storm approaches their location?
Of course, safety comes first. Immediately behind that is communications. We use our own NōtiFind product to manage calls and response tracking, as do many of our customers. Regional events such as storms can quickly become complicated from a communications perspective, both with the numbers of people to reach and the failures in communications channels that a storm can cause. NōtiFind, integrated with LDRPS, is how we manage the complexity. I also recommend that a customer never rely on just one means of communications. Land lines at home and work, cell phones, pagers and, increasingly, social media all serve to provide multiple channels to keep communications open.
3. Should customers do anything proactive with support vendors, such as maintenance vendors? What about with their trading partners?
Support vendors can be critical both to ongoing operations and, if needed, to a recovery operation. My first critical suggestion involves fuel providers. You need a contract in place before an event, or else you become just another name on a list in the middle of the crisis. Second, review both your backup schedule and off-site transit of backups; depending on the anticipated timing of a storm, make sure that backup tapes do not sit on-site longer than necessary, and that they are stored, ideally, outside the threat zone. Third, review any employee travel agreements. If you need to quickly send staff to a recovery facility, you need to be sure that everything is in place to make that as easy as possible on your employees, such as planning for emergency petty cash. In regard to third-party partners and vendors, make sure you involve them in tests and validation exercises, especially during “off hours”; responses at 10 AM on a Tuesday may be very different from those at 3 AM on a Sunday.
4. Is there anything different in the SunGard response to a hurricane when compared to, say, a fire or power outage?
The biggest difference is that with storms, you may get some warning. You might also get some warning with wildfires, such as we experienced this past year in the west. Most other events have no warning, and you are in a reactive mode. So, use the idea of “hurricane season” to do a periodic review of your plan, resources and capabilities. Hopefully you are not involved in a hurricane, but you will be better prepared for other unexpected events.
Tags: BC/DR, business continuity, continuity, crisis management, disaster recovery, Hurricane, resilience
Posted in Uncategorized
2 Comments
Posted: July 21, 2011, 11:56 am
Reading a Harvard Business Review Blog this week triggered this thought on resilience: when conducting any validation exercise, it is important to invite “outsiders” to participate.
John Baldoni, writing for HBR, noted that management coaching involves having an outsider suggest ways to improve your perspective on reality and decision making, with the suggestion to invite others into routine meetings from outside the normal attendee list. It adds energy and creates some fresh dialog. Baldoni writes: “A new perspective can allow a leader to make certain that what she sees is reality, not her perception of reality.” That statement applies equally well to resilience programs.
Over more than twenty years in the continuity business, two observations have remained true even though the industry has shifted from “event-driven” to “resilience” planning. The first is that if you test the same components each time you validate your continuity plan, you really are not testing anything challenging. Ask yourself if your business changed during this same period, and the answer will always be “yes.”
The second observation is that while we generally “know” what our peers do during normal times, we are likely mistaken about who is responsible for what in the midst of a crisis. Mistakes here lead to decisions we will likely regret once the crisis is over.
Supporting a number of company-wide simulations over the past few years has proven this to be the case in virtually every type of organization, large and small, governmental and private sector. We make some basic – and often reasonable – assumptions about who makes decisions during a disaster, but it is critical that these assumptions be tested. Don’t assume Department X takes care of task 123; ask them, because they may be assuming that you are responsible for that task.
Better still, schedule an annual validation exercise that involves those outsiders. It has the dual value of increasing organizational training, while energizing the validation process. Assumptions during any crisis management activity often lead to lost time or mistaken actions.
Tags: BC/DR, business continuity, continuity, crisis management, disaster recovery, HBR Blog, resilience, risk management, testing
Posted in Uncategorized
Posted: March 31, 2011, 10:15 am
With continuing concern surrounding the damaged nuclear plants, the global community continues to watch the turmoil unfolding in Japan. In the twenty days since the Sendai earthquake and the resulting tsunami brought unimagined devastation to the Japanese nation, we are seeing just how small planet earth really is.
Global Dependencies are Felt Locally
Moving beyond the destructive impact on whole communities and the human toll too quickly seems to trivialize the impact, but at the same time, it is important that organizations on a global level recognize our interdependence. These dependencies can be seen clearly in the examination of global supply chains. Companies such as Boeing, Sony, Caterpillar and John Deere have been referenced in the news as enterprises that are feeling the supply chain impact, or anticipating parts shortages within a very short time frame. General Motors has announced production impacts from Louisiana to Spain to Germany related to dwindling supplies of Japanese components.
Forrester Research mentioned yesterday that business continuity is “… back on the agenda …” for business executives. Today the Wall Street Journal reported that the disaster plan from Tokyo Electric Power was inadequate, especially for the combined impact of earthquake and tsunami.
Earlier this week, in a conversation with Gartner Research about testing recovery plans, the point was raised that planning for combinations of events, rather than just for single worst-case scenarios, raises maturity to a best-practice level.
While the Japanese continue their struggle to recover on a massive scale, much of the world has begun to consider “lessons learned.” We did this following the attacks of 9-11-01 and following Hurricane Katrina, and similar action is demanded now: plans must be reviewed to determine whether the assumptions made are grounded in the new reality unfolding in the news and within the lives of the Japanese people. Business processes and interdependencies have become more reliant on automation, are built around more complex trading partner and business models, and are subject to more rapid impacts from disruptions due to “just-in-time” processes and inventory levels.
Lesson #1: Acknowledge Increased Risk Levels
My point today is simple: resilience and risk managers in organizations of every size must acknowledge the increased risk, and adjust plans accordingly. The lessons gained from examining events in Japan should stir internal reviews by every organization with trading partners concerning risks, logistics, capitalization, insurance and diversification.
For most of us, it is difficult to fully comprehend the impact on the ground in Japan. But all businesses need to examine how complex supply relationships – from raw materials to manufacturing capacity to transportation and selling channels – would be impacted by disruptive events that threaten such relationships. The imperative becomes determining appropriate mitigating actions and procedures in light of what the natural disasters in Japan and other global regions have shown us.
Tags: BC/DR, business continuity, continuity, Continuity Planning, crisis management, disaster recovery, Forrester Research, gartner, InformationWeek, Japanese Earthquake, Lessons Learned, resilience, risk, risk management, supply chain, Wall Street Journal
Posted in Uncategorized
Posted: March 9, 2011, 5:39 pm
The more physically fit we are, the more resilient our muscles and bodies are to stress and strain. The same can be said for organizational resilience programs. They may need a “trainer” to help us get them in shape, but even without that expert resource, they certainly need regular exercises.
The risks companies face today are varied, and much like exercising different muscle groups, they call for different activities to examine and strengthen against these threats. In 2010, natural disasters had an estimated $109 billion impact, more than triple the previous year; that number doubles when you add the costs of man-made disasters, such as the Gulf Oil Spill, and we quickly see the cost justification for planning for worst case scenarios.
What Shape Is Your Resilience Program In Today?
Consider: data breaches become a violation of expectations of privacy by your employees or customers. When information is exposed to the outside world that should not have been revealed, both a technical and a communications response is needed; both factor into the estimated cost, which reached $214 per breached record in 2010 according to the Ponemon Institute. The same could be said for protected health information that needs to be kept confidential, and accessed only by authorized personnel. In a conversation this week with the president of a local hospital chain, she mentioned that they have dismissed employees over HIPAA rules violations. We operate in a world where transparency is demanded (SOX), and prohibited (HIPAA). Remaining resilient in the face of such risks calls for balance between privacy and authorized access in our highly connected world.
On another level, consider the recent WikiLeaks episodes. The public disclosure of confidential information gave a new meaning to transparency, and a caution to information security managers. I’ll not debate the layers of questions that these actions triggered concerning the breach of confidence, under the claim that the public had a right to know; what is clear to me is that all organizations, both public and private, need to make certain their information security programs are up to today’s challenges and threats.
Relevant in this blog space is the impact on organizations and their resiliency, and how best to mitigate such impacts. The cyber activity following the release of confidential information led to DoS cyberattacks and outages for major credit card networks, which had a subsequent disruptive impact on numerous businesses and their e-commerce. This risk is real, and calls for every organization to review the effectiveness of their information security programs in dealing with such incidents. GLB and HIPAA regulations call for the periodic assessment of electronic security against anticipated risks or hazards. Given the demonstrated impact to systems these past few months, this is now a risk that must be anticipated (GLB: 16 CFR 314; HIPAA: 45 CFR 160-164).
Different Risks, Different Training
Resilience and crisis management each depend on responses to risks, both actual and anticipated. Beyond the technical programs for information security and the capability to recover your operations at an alternate facility, resilience and crisis management call for effective emergency communications programs, something frequently overlooked. If your plans don’t include guidance on who should speak in the face of a disaster, what they will say and how you will preview any statements before release to the public, then it is time to update your plans. Consider drafting sample statements for the anticipated risks; the internal review of these sample statements not only better prepares your spokespeople, but also helps uncover additional elements of your plan that may need to be updated.
Continued Monitoring and Exercising
Ongoing monitoring of risks and mitigation programs is important – and required by regulations. As any fitness trainer will advise you, you need to keep at your exercise program, or you will quickly fall behind.
Tags: BC/DR, business continuity, compliance, continuity, Crisis Communication, crisis management, information security, regulations, resilience, risk, risk management, SunGard Availability
Posted in Uncategorized
14 Comments
Posted: November 16, 2010, 4:16 pm
… and water, and HVAC, and all related infrastructure components. Resilience is dependent on all these infrastructure components, along with network communications.
This became obvious this weekend with two distinct events:
- A friend shared a photo of a car that went through the side of a building while parking; in doing so, it broke water and sewage lines, which prevented the building from remaining open for business occupancy for several days.
- An underground explosion and fire in Philadelphia early Monday morning – just a few blocks from my office – caused local businesses to deal with power outages and street closures when they arrived back after the weekend.
These two incidents are a reminder that every business is dependent on utilities for power and water and on telecommunications carriers for their connections to the outside world. Not long ago, I directed a continuity drill for a brokerage company, where we simulated an underground fire. Within ten days of that simulation, underground utility fires or explosions occurred in both Philly and New York.
Scenario Planning
Continuity planning is never really finished. It is a cyclical process, including establishing policies for your organization, assessing capabilities to meet those policies, training staff and validating capabilities, and then maintaining that readiness. In parallel to maintaining the readiness and capability is the ongoing question of whether your organizational needs are changing.
Regular validation is recommended by most standards and may be mandatory under many regulations. This includes updating the scenarios which you follow when conducting any validation exercise.
Disaster Statistics Point to Risks
So what are the top causes of disasters? Of 2,367 disasters supported by SunGard over the years, 1,181 (49%) were caused by hardware failures (570), weather (349) or power failures (282). Fourth on the list is terrorism at 7.4%.
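The shares behind these figures can be checked with a few lines of Python. This is only a worked calculation on the numbers quoted above; the variable names are illustrative. Note that the three component counts as quoted add up to 1,201 rather than the 1,181 given for the combined total, and the sketch simply reports both without attempting to reconcile them.

```python
# Figures exactly as quoted in the post.
total_disasters = 2367
combined_quoted = 1181
causes = {
    "hardware failure": 570,
    "weather": 349,
    "power failure": 282,
}

# Each cause as a percentage of all supported disasters.
shares = {
    cause: round(100 * count / total_disasters, 1)
    for cause, count in causes.items()
}

# The quoted combined total versus the sum of the quoted components.
sum_of_causes = sum(causes.values())
combined_share = round(100 * combined_quoted / total_disasters, 1)

# Approximate event count implied by the 7.4% figure for terrorism.
terrorism_events = round(0.074 * total_disasters)

for cause, share in shares.items():
    print(f"{cause}: {share}%")
print(f"combined (as quoted): {combined_share}%")
print(f"sum of quoted components: {sum_of_causes}")
print(f"terrorism, approx. events: {terrorism_events}")
```

On these figures, hardware failures alone account for roughly a quarter of all events, which supports weighting validation scenarios toward equipment and power problems as well as weather.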
With these threats in mind, it becomes easier to ensure that any scenario planning you consider to maintain and validate your plan includes those elements that have been consistently a threat to continuity, in addition to any industry-specific threats you need to anticipate.
The SunGard statistics for 2009 (last full-year) show that the industries with the greatest number of declaration events are led by financial organizations, followed by manufacturing, government, technology, services, health care and insurance. When you consider the business drivers of regulations and supply chain dependency, these industry segments demonstrate the greatest maturity in their continuity planning programs.
When developing your business continuity program, be sure to consider a broad array of possible threats. The events that will actually occur will differ, but the guidance of your plan will still inform your decision process as you recover production capability.
________________________________________________________________
Please join me at the Continuity Insights webinar discussing alternate site selection on November 23, 2010.
Tags: BC/DR, business continuity, crisis management, disaster recovery, infrastructure, recovery planning, resilience, risk, risk management, SunGard Availability, utility outage
Posted in Uncategorized
Posted: November 6, 2010, 12:46 am
Earlier this week, a colleague asked whether cybersecurity was really different from information security, and if so, how was it to be managed: within or separate from IT security?
Cybersecurity has a focus on external electronic threats to your information or operation. No IT security program would be effective without considering such external threats, so it is fair to say that cybersecurity is a specialty area within the broad requirements of IT security. Internal security looks at passwords, access authorization, employee awareness and training, data protection and more. What makes cybersecurity unique is the complexity of a changing environment, and the need to constantly upgrade the monitoring and tools in order to stay ahead of increasingly sophisticated attacks. An additional cybersecurity dimension is that most external aspects of the Internet are not government owned, but are provided
The State of Federal Cybersecurity
Last month, the GAO released their audit report on federal initiatives to improve federal cybersecurity. Since the 2009 GAO report to Congress, the audit showed progress, but called for further improvement. The previous report to Congress showed a rise in cybersecurity breach-related incidents from 5,503 in 2006 to 16,843 in 2008. This increase in a three year period shows that cyber threats are increasing broadly; evidence of increased threat activity is reported by every monitoring agency and info security company, and such threats are not limited to the federal sector.
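For a sense of scale, the growth in reported incidents can be worked out directly from the two figures above. The Python sketch below is plain arithmetic on the quoted numbers; the annualized rate assumes two year-over-year intervals (2006 to 2007 and 2007 to 2008), which is an assumption of this example and not something stated in the GAO report.

```python
incidents_2006 = 5503
incidents_2008 = 16843

# Overall growth between the two reported years.
growth_factor = incidents_2008 / incidents_2006
percent_increase = 100 * (growth_factor - 1)

# Annualized growth, assuming two year-over-year intervals.
intervals = 2
annual_growth_pct = 100 * (growth_factor ** (1 / intervals) - 1)

print(f"growth factor: {growth_factor:.2f}x")
print(f"increase: {percent_increase:.1f}%")
print(f"annualized: {annual_growth_pct:.1f}% per year")
```

In other words, reported incidents roughly tripled over the period.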
In last week’s post, the suggestion was made that business decisions are at their core risk management decisions. A recent ISACA Journal article referenced the 2009 GAO report in a discussion of FISMA requirements, and raised an interesting challenge: cybersecurity programs need to shift from a compliance focus to a risk-based info security program.
Risk-based Information Security
The rationale for this shift is straightforward: implementing new compliance-oriented security programs takes time, but as soon as they are deployed, they may be obsolete given that the principal threat – cyber terrorists – keeps changing. Those behind the threats and rising numbers of attacks are creative, smart and dangerous. As soon as they detect an obstacle to their threat path, they will look for a new weakness.
Risk-based cybersecurity programs have a greater flexibility to respond to changing threats. Compliance-based programs monitor specific, known threat attributes. The advantage of a risk-based approach is the ability to adapt and respond to a wide variety of threats, including those that are constantly changing.
While the ISACA report specifically addresses federal systems and FISMA requirements, it can readily be applied to private sector cybersecurity programs, leading to more robust and secure systems, with increased resilience as threats continue to change.
Tags: BC/DR, business continuity, continuity, disaster recovery, FISMA, information security, ISACA, resilience, risk, risk management, SunGard Availability
Posted in Uncategorized
3 Comments
Posted: October 29, 2010, 10:46 am
Just having returned from the Gartner IT Symposium, Jim Grogan, senior director at SunGardAS, offered to share a few key points from the conference…CM
IT Management Shifting Viewpoint: from “Output” to “Outcome”
Gartner’s Andrea DiMaio and Mark McDonald pointed to a shift from “output” to “outcome” in how we need to make IT decisions for the future. The real value rests in the outcome, and this is a better way to measure any IT investment. When struggling to demonstrate clear measures of project success, this offers a new opportunity to speak to business leaders and differentiate performance.
Datacenter Expansion: “Buy” or “Build”
Analyst Eric Knipp noted that more than half of the organizations surveyed reported they will need to either structurally change or build a new datacenter by 2015. Given the lead time needed for such projects, this would suggest that activity will commence in 2011 for this prediction to materialize.
Choosing Technology
Research VP Hung LeHong offered a key approach for IT decision makers to view technology. Hung stated the first question to ask should be: “Is this technology valuable?”, not “How does it work?” This ties back to the perspective of outcome, not output. Whether the technology is valuable to you depends on the potential outcome in your organization.
The Changing Shape of Cloud Adoption
These points offer valuable insight on cloud-technology adoption. Costs are part of every business decision, but cost savings per se are not the first reason to look at cloud computing. As Daryl Plummer stated, cost savings are the side-benefit of cloud computing.
Considering the potential build-out of datacenters, cloud technology has a significant role, either to offset the potential space required within an organization’s own data center, or to ensure that internal cloud deployments consider resilience and the need to fail-over for critical application availability.
When you determine that a particular technology is valuable (the question Hung LeHong posed), and forge ahead with application deployment, resilience must be part of the architecture to support the new application. For these “valuable applications”, the cloud may become the preferred approach for deployments. The side-benefit of lower costs to deploy new applications on the cloud will help fuel growth strategies.
Marc Benioff, CEO of Salesforce.com, offered this observation during his keynote address: “We need to innovate as an industry; consolidation is not a growth strategy!” For all mission-critical applications, that growth will depend on resilience to be sustainable. Whether leveraging cloud technology, virtualization or other yet-to-be-seen technology, our on-demand, real-time world demands that successful applications remain highly available.
Tags: Cloud Adoption, cloud technology, data center, gartner, cloud, resilience
Posted in IT and the Cloud