Developing a Mishap Plan for Your Team Can Save You More than Money

Learn why and how to develop emergency recovery plans for your systems, processes and workflows.

Tristan Denyer
Oct 18, 2023
Photo by Piotr Chrobot on Unsplash

My first exposure to a mishap plan was when I was in the U.S. Navy. Mishap plans were posted at key points of entry and near complex systems so people could follow them in an emergency. Outside the military, they are often called contingency plans or emergency response plans, and they can be a valuable piece of documentation.

And these are not just for software, engineering, or the military. Every company and team can benefit greatly from creating one or more of these for their processes, workflows or systems.

What Is the Value of Having One?

I see two kinds of value. The often unseen one is that the plan shows the team was proactive in thinking through the ways a process or system could fail. Knowing how to run a system is one thing, but knowing how it can fail and having a procedure to get it back online (or perform a workaround, or simply recover from it) is the whole picture.

The other value is cost savings. Getting a system back online quickly can preserve sales and protect the brand's reputation. Enabling other team members to be self-reliant can free up resources and speed up recovery. A plan can also keep people from guessing and making a bigger mess of the situation.

What Are These Plans For?

Anything, really. But here are some broad use cases I've seen:

  • Conference room audio / video equipment not working
  • WordPress website going down
  • Salesforce integration with Marketing
  • Panel interviews, and also for when a candidate no-shows
  • When a large customer churns
  • A/C stops working (I don’t miss that office 😅)

How to Write a Plan for When Things Go Sideways

Keep in mind that you likely don't need 400 plans, nor one for everything down to the coffee machine. (Or maybe you do in your situation? ☕️) Start with the mishaps that happen most often, the ones with the biggest potential negative impact, or both.

  1. Identify potential mishaps: Begin by conducting a thorough assessment to identify potential outages or emergencies that could affect different aspects of your work. These may include natural disasters (e.g., earthquakes, floods), technological failures (e.g., IT system crashes), supply chain disruptions, security breaches, or public health crises (e.g., pandemics).
  2. Form a planning team: (Typically for larger responses) Create a dedicated team responsible for developing and maintaining the emergency plan. This team should include representatives from various departments and expertise areas to ensure comprehensive coverage.
  3. Define objectives and priorities: Clearly articulate the objectives of the plan. What are you trying to achieve with this plan? Prioritize the most critical aspects of your work that need immediate attention during a mishap.
  4. Risk assessment: Conduct a risk assessment for each potential outage or emergency. Evaluate the likelihood of occurrence, potential impact, and consequences on your organization. This will help you prioritize and allocate resources appropriately.
  5. Develop response procedures: Create step-by-step response procedures for each identified mishap. Outline the actions to be taken before, during, and after the mishap. Include details on roles and responsibilities, communication protocols, and resource allocation. Steps should be clear and concise enough that almost anyone can follow them.
  6. Communication plan: Develop a comprehensive communication plan that addresses internal and external communication during an outage. Define key contact points, methods of communication, and the chain of command.
  7. Resource allocation: Determine the resources needed to implement the recovery plan effectively. This includes personnel, equipment, financial resources, and any external support required.
  8. Training and drills: (Typically for larger responses) Train your employees on the recovery plan and conduct regular drills or simulations to ensure everyone knows their roles and responsibilities. This practice helps in smooth implementation during an actual mishap.
  9. Documentation and record keeping: Maintain detailed documentation of the recovery plan, including procedures, contact information, and resource allocation.
  10. Continuous improvement: Periodically review and update the recovery plan to incorporate lessons learned from drills or actual incidents. Ensure that it remains relevant and effective.
  11. Public relations and reputation management: Include a section in your emergency recovery plan for managing public relations and safeguarding your organization’s reputation during and after a mishap.
  12. Backup and redundancy: Implement backup systems, redundancy measures, and data backups where applicable to minimize disruptions and data loss.
  13. External partnerships: Establish relationships with external organizations, emergency services, and suppliers who can provide support during an emergency. Include their contact information in your plan.
  14. Reporting and evaluation: Define a process for reporting mishaps, incidents, or near-misses and for evaluating the effectiveness of your recovery plan.
  15. Crisis communication: Develop crisis communication templates and strategies for communicating with employees, customers, suppliers, and the public during an emergency.
  16. Distribution and accessibility: Ensure that all relevant employees have access to the recovery plan. Consider digital and physical copies, as well as remote access.
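For the risk assessment in step 4, and for deciding which plans to write first, a simple likelihood × impact score is a common way to rank candidates. The Python sketch below shows one way to do that. It is not the author's method. The 1 to 5 scales, the multiplication, the tie-break on impact, the `Mishap`/`prioritize` names, and the example ratings are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Mishap:
    """A candidate mishap to write a recovery plan for (hypothetical model)."""
    name: str
    likelihood: int  # assumed scale: 1 = rare ... 5 = happens often
    impact: int      # assumed scale: 1 = minor annoyance ... 5 = severe

    @property
    def score(self) -> int:
        # Common risk-matrix convention: risk = likelihood x impact
        return self.likelihood * self.impact


def prioritize(mishaps: list[Mishap]) -> list[Mishap]:
    """Order mishaps from highest to lowest risk score.

    Ties are broken by impact, so a rarer but more damaging mishap
    ranks above a frequent but minor one with the same score.
    """
    return sorted(mishaps, key=lambda m: (m.score, m.impact), reverse=True)


# Example ratings are made up for illustration only.
plans = [
    Mishap("Conference room A/V not working", likelihood=4, impact=1),
    Mishap("WordPress website going down", likelihood=2, impact=5),
    Mishap("Salesforce integration with Marketing", likelihood=3, impact=3),
    Mishap("A/C stops working", likelihood=2, impact=2),
]

# Print each mishap with its score, highest risk first
for m in prioritize(plans):
    print(f"{m.score:>2}  {m.name}")
```

With these made-up ratings, the website outage ranks first. Whatever scale your team settles on, the idea is the same: write the plans for the highest-scoring mishaps first.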

When You Should Write the Plan

Now. But really, you should start writing it before the system or process is in place. When my team is working on a proposal for a new system, part of that proposal is a risk assessment, and that assessment covers much of the above. How can this go sideways? Who will get it back online? How would we recover? What is the failover plan if it does not come back online? And more. That feeds into our decision making, especially for critical systems, and it becomes the starting point for the recovery plan.

When the documentation (called runbooks and playbooks in my world) comes out for that new system, so does the emergency recovery plan. Then we do a training session on the system and how to recover it.

And rewrite the plan…

Remember that emergency recovery plans should be adaptable and responsive to changing circumstances. Regularly revisit and refine your plans to stay prepared and resilient in the face of unforeseen challenges. A well-crafted plan can save you time, resources, money and brand reputation.

Tristan Denyer

I am that unique blend of engineer and designer, leader and manager, team builder and bridge builder.