No step is more critical to success in Information Technology than documentation. It sits squarely in the middle of the System Development Life Cycle, yet it is usually left until after testing and deployment. For a successful Disaster Recovery (DR) strategy, documentation matters even more than it does for standard IT functions.
While most companies have at least some plan for data backup and recovery, a DR strategy requires more than just being able to restore lost data.
A DR plan doesn’t just entail having a backup plan for the data in your servers. It is the overarching document that guides your employees and partners in restoring functionality to your business. Because your technology and systems are constantly being updated or changed, this plan has to be a “living document.” Changes to systems, teams, and architecture must be accounted for in your DR plan as they happen. This information must also be available to everyone responsible for any part of the recovery process. Let’s dive into this a little further and we’ll help you develop a roadmap for documentation of your disaster recovery plan.
The Elements of a Good Plan
A good DR plan document is detailed, kept up to date, and accessible to anyone who needs to refer to it during a disaster. The elements will vary according to your company’s structure, the type of business you run, and which services are supported by partners such as your MSP or Cloud provider.
The following is a short list of essential elements that you need to include at a minimum.
Communication and Roles
Who does what and how to reach people are the two most pressing questions in the immediate aftermath of a disastrous event. Contact information for all employees and providers essential to the recovery needs to be kept up to date and readily accessible. Each team’s and team member’s role in an emergency must also be clearly outlined.
Diagrams of the equipment, infrastructure, and data flow can be an essential part of any necessary restoration or rebuilding.
Systems and Asset Inventory
The Systems and Asset Inventory covers physical assets, like servers and laptops, as well as agreements with providers and vendors. If you outsource your primary IT and data to an MSP, you will have a shorter list of actual assets, but you will need to know exactly what your agreement provides.
Application Dependencies and Prioritization
Detailing which applications depend on others is essential to the plan. Your plan should list the applications you need to restart first, identify which apps are mission critical and which can wait, and assign each a recovery priority. Once determined, these priorities should be reflected in both your internal and external Service Level Agreements (SLAs). You will also need a step-by-step roadmap for your administrators to follow, so that systems like point-of-sale payment or customer-facing applications are restored quickly, while those that can be delayed slightly are moved lower down the list.
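To make the idea concrete, here is a minimal sketch of how a restart order could be derived from a dependency list and a set of priorities. The application names and priority numbers are hypothetical, chosen only for illustration; your own inventory will look different. It uses Python’s standard `graphlib` module (Python 3.9 or later) so that no application is restarted before the systems it depends on, and among the applications that are ready, the most urgent one goes first.

```python
import heapq
from graphlib import TopologicalSorter

# Hypothetical example data: each app maps to the set of apps it depends on.
DEPENDS_ON = {
    "database": set(),
    "auth-service": {"database"},
    "point-of-sale": {"database", "auth-service"},
    "reporting": {"database"},
}

# Hypothetical priorities: a lower number means more urgent to recover.
PRIORITY = {
    "database": 1,
    "auth-service": 1,
    "point-of-sale": 1,
    "reporting": 3,
}


def restart_order(depends_on, priority):
    """Return apps in an order that respects dependencies,
    picking the most urgent ready app at each step."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready = []        # min-heap of (priority, app) for apps whose deps are up
    order = []
    while sorter.is_active():
        for app in sorter.get_ready():
            heapq.heappush(ready, (priority[app], app))
        _, app = heapq.heappop(ready)
        order.append(app)
        sorter.done(app)  # marks app as restored, unblocking its dependents
    return order


if __name__ == "__main__":
    for step, app in enumerate(restart_order(DEPENDS_ON, PRIORITY), start=1):
        print(f"{step}. {app}")
```

With this sample data, the low-priority reporting system is pushed to the end even though its only dependency is available early. A circular dependency raises an error instead of producing a misleading list, which is itself worth discovering before a disaster rather than during one.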
RTO and RPO
Recovery Time Objective (RTO) is the “deadline” in a disaster recovery situation. It’s determined by evaluating how quickly your system needs to be back online when something goes wrong. Recovery Point Objective (RPO) is the maximum amount of recent data, measured in time, that you can afford to lose. Your backup/replication strategy and schedule determine how recent the data you restore will be, so make sure the latest backup is never older than the RPO you have set. Think about the potential re-work required when you determine RPO; it will be different for every business and sometimes different for individual applications.
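As a rough illustration of how these two objectives can be checked, the sketch below compares timestamps against an RPO and an RTO. The function names and the sample times are assumptions made up for this example, not part of any particular backup product.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup, incident_time, rpo):
    """True if the data lost by restoring last_backup stays within the RPO."""
    return incident_time - last_backup <= rpo


def meets_rto(outage_start, restored_at, rto):
    """True if service was restored within the RTO."""
    return restored_at - outage_start <= rto


if __name__ == "__main__":
    # Hypothetical objectives: lose at most 4 hours of data, recover within 2 hours.
    rpo = timedelta(hours=4)
    rto = timedelta(hours=2)

    last_backup = datetime(2024, 1, 10, 2, 0)   # nightly backup at 02:00
    incident = datetime(2024, 1, 10, 9, 30)     # outage begins at 09:30
    restored = datetime(2024, 1, 10, 11, 0)     # service back at 11:00

    print("RPO met:", meets_rpo(last_backup, incident, rpo))
    print("RTO met:", meets_rto(incident, restored, rto))
```

In this made-up scenario the team recovers within the two-hour RTO, but the nightly backup is seven and a half hours old, so the four-hour RPO is missed. A backup schedule that looks reasonable on paper can still fall short of the objective you set.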
After DR events, many industries have regulatory obligations regarding reporting, documentation, and protection against further incidents. Whether it is HIPAA, Sarbanes-Oxley, or PCI DSS, if your business is subject to regulations that require reporting after an outage or breach, those requirements must be included in your plan.
Different Degrees of Disaster
The worst-case scenario for a business disaster is obvious—some catastrophic event physically destroys your primary site. Whether a tornado, fire, or man-made disaster, most people can easily understand “what if everything at our primary location were suddenly wiped out?”
But there are varying degrees of failures that can occur with your business’s mission critical systems and data that might not be as immediately evident. Partial loss of or corruption of data, security breaches, temporary service outages, even loss of personnel who play key roles can constitute a disaster that can impact your day-to-day operations.
If you are using a Managed Services Provider for all or part of your solution, that can mitigate the impact of all of these. But proper documentation is an additional key component, especially when the disaster is the loss of key personnel.
Good Documentation Versus Tribal Knowledge
When key IT personnel are lost (whether they leave voluntarily or unexpectedly), whatever information they haven’t documented often goes with them. Undocumented information about systems, procedures, or necessary business information is often referred to as “Tribal Knowledge.”
“A set of unwritten rules or information known by a group of individuals within an organization but not common to others that often contributes significantly to overall quality. Tribal knowledge may be essential to the production of a product or performance of a service but may also be counterintuitive to the process.”—BusinessDictionary.com
When key employees leave, solid documentation of the systems and business processes they were responsible for enables you to onboard someone else into that position smoothly. In Disaster Recovery, where time is a crucial factor, you do not have the luxury of painstakingly recreating the Tribal Knowledge required to bring mission critical systems and data to a point of minimum viable recovery. It is imperative that clear, easy-to-access documentation of each person’s steps, operations, and responsibilities be maintained and updated regularly.
Not Once a Year, But All the Time
Again, any Disaster Recovery plan needs to be a “living document.” Your business needs, technology, and the infrastructure you rely on to succeed change at different paces. Effective documentation creates a roadmap for your employees to follow in the event something happens that affects the data or systems that are your business’s lifeblood. Every step in the process that is missing will result in lost time and require backtracking and recreation. Even if 100% of your IT is outsourced, someone needs to know who to call and whose job that is should the worst happen.