A Disaster Recovery Plan (DRP) prepares for the timely re-establishment of an IT infrastructure. It aims to enable the operational recovery of services in the event of a disaster.
A disaster recovery plan differs from a business continuity plan.
A disaster recovery plan must allow a switchover to an “alternative” IT infrastructure dedicated to the survival of the business or activity.
Disaster recovery plans are designed and updated according to business needs.
Any DRP must be based on the following two concepts:
RTO: Recovery Time Objective
RPO: Recovery Point Objective
Any entity wishing to develop a disaster recovery plan will initially need to define security goals based on these basic needs (see Risk Management).
The RTO defines the maximum acceptable time during which an IT resource may be down due to a disaster.
This downtime takes account of:
The RPO defines the maximum amount of data that can be lost as a result of a computer disaster. This value is the time elapsed between the last backup and the incident. It is usually expressed in minutes or hours.
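The two definitions above can be checked against a single incident with simple date arithmetic. The sketch below is only an illustration: the function name `assess_incident`, the timestamps and the four-hour objectives are hypothetical and do not come from any particular tool or from this guide.

```python
from datetime import datetime, timedelta


def assess_incident(last_backup, incident, service_restored, rpo, rto):
    """Compare one incident's data-loss window and downtime with the RPO and RTO."""
    data_loss_window = incident - last_backup   # changes made since the last backup are lost
    downtime = service_restored - incident      # time during which the resource was down
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical figures: backup at 02:00, incident at 10:30, service restored at 13:00.
result = assess_incident(
    last_backup=datetime(2024, 3, 4, 2, 0),
    incident=datetime(2024, 3, 4, 10, 30),
    service_restored=datetime(2024, 3, 4, 13, 0),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=4),
)
print(result)
```

With these figures the downtime of 2 h 30 min stays within the RTO, while the data-loss window of 8 h 30 min exceeds the RPO. The service came back in time, but more data was lost than the business had accepted.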
The diagram below shows how the service level changes during an incident. It models the concepts of RPO and RTO to show how they differ, but also how they complement each other.
Depending on the size of the disaster, a recovery plan must be able to take account of many recovery scenarios, ranging from simple actions to complex mechanisms.
In concrete terms, a company is exposed to numerous daily risks, which may lead to a disaster and justify the activation of a recovery plan.
The above examples vary in their RPO and RTO requirements and show that a recovery plan must draw on different technologies to respond to a multitude of disasters.
Overall, the implementation of a recovery plan is based on 12 key points.
Any asset that is part of the infrastructure’s IT system must be clearly identified and listed in a Configuration Management DataBase (CMDB).
It is essential that the database is kept up to date. We recommend adding a field that records the date on which the equipment was first put into service, which helps to identify wear and tear and obsolescence.
Each database and each application must be clearly identified in a database shared with the IT assets database to show the relationships between physical assets and logical assets.
At any given time, this database should allow you to answer the following question: “Which application is hosted on which server?” At this stage, it is just as important to map the IT infrastructure to model the links between server rooms and servers – storage – flows – applications – databases.
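As a rough illustration of the link between physical and logical assets, the sketch below keeps servers and applications in one small in-memory inventory and answers the question in both directions. It is a toy stand-in for a CMDB, not a real product's data model: the class names, fields and sample assets are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    room: str                # server room where the machine is located
    in_service_since: str    # ISO date on which the equipment was first used


@dataclass
class Application:
    name: str
    servers: list            # names of the servers hosting the application
    databases: list = field(default_factory=list)


class Inventory:
    """Toy in-memory stand-in for a CMDB linking physical and logical assets."""

    def __init__(self):
        self.servers = {}
        self.applications = {}

    def add_server(self, server):
        self.servers[server.name] = server

    def add_application(self, app):
        # Refuse links to servers that are not listed in the inventory.
        unknown = [s for s in app.servers if s not in self.servers]
        if unknown:
            raise ValueError(f"unknown servers for {app.name}: {unknown}")
        self.applications[app.name] = app

    def servers_hosting(self, app_name):
        """Which servers host this application?"""
        return [self.servers[s] for s in self.applications[app_name].servers]

    def applications_on(self, server_name):
        """Which applications are hosted on this server?"""
        return sorted(
            app.name
            for app in self.applications.values()
            if server_name in app.servers
        )


# Hypothetical sample data.
inventory = Inventory()
inventory.add_server(Server("srv-app-01", "room-A", "2019-06-01"))
inventory.add_server(Server("srv-db-01", "room-A", "2021-02-15"))
inventory.add_application(Application("crm", ["srv-app-01", "srv-db-01"], ["crm_db"]))
inventory.add_application(Application("accounting", ["srv-db-01"], ["ledger_db"]))

print(inventory.applications_on("srv-db-01"))
```

Rejecting an application that points at an unlisted server is one simple way to keep the two inventories consistent. A real CMDB would also track storage, network flows and room-level dependencies.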
The classification of assets entails a process involving business line managers (trade, accounting, etc.), the IT manager and a member of the management board.
The aim is to determine which applications are necessary for the optimal running of the “company”. We recommend using the following value scale:
Once each application has been classified, the IT manager must attribute the same classification level to each IT resource. In the case of shared resources, the most critical classification will always take priority.
Thanks to this classification, the IT manager is able to prepare the disaster recovery plan in accordance with the business priorities.
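The rule for shared resources can be sketched as follows. The three-level scale used here ("vital", "important", "standard") is a hypothetical placeholder, as are the application and resource names. Substitute the value scale actually adopted by the company.

```python
# Hypothetical scale: a higher number means more critical.
CRITICALITY = {"vital": 3, "important": 2, "standard": 1}


def classify_resources(app_levels, hosting):
    """Give each IT resource the level of the most critical application it serves.

    app_levels: application name -> classification level
    hosting:    application name -> list of resources the application relies on
    """
    resource_levels = {}
    for app, resources in hosting.items():
        level = app_levels[app]
        for resource in resources:
            current = resource_levels.get(resource)
            # For shared resources, the most critical classification takes priority.
            if current is None or CRITICALITY[level] > CRITICALITY[current]:
                resource_levels[resource] = level
    return resource_levels


# Hypothetical example: the storage array "san-01" is shared by both applications.
app_levels = {"billing": "vital", "intranet": "standard"}
hosting = {
    "billing": ["srv-db-01", "san-01"],
    "intranet": ["srv-web-02", "san-01"],
}
resource_levels = classify_resources(app_levels, hosting)
print(resource_levels)
```

In this example the shared array inherits the "vital" level from the billing application, even though it also serves an application classified as "standard".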
Based on the inventories and asset classifications, the IT manager draws up a document establishing the rules of priority for restarting services.
This document must be approved by a member of the management board.
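One simple way to derive a restart order from the classification is to sort services by criticality. The sketch below assumes the same kind of hypothetical three-level scale and breaks ties alphabetically, which is an arbitrary choice. A real priority document would also take technical dependencies between services into account.

```python
# Hypothetical scale: a lower rank means the service is restarted earlier.
RESTART_RANK = {"vital": 0, "important": 1, "standard": 2}


def restart_order(service_levels):
    """Return service names ordered by restart priority, then alphabetically."""
    return sorted(
        service_levels,
        key=lambda name: (RESTART_RANK[service_levels[name]], name),
    )


# Hypothetical services and classifications.
order = restart_order({
    "intranet": "standard",
    "billing": "vital",
    "email": "important",
    "erp": "vital",
})
print(order)
```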
For its part, the board of directors must appoint an expert to handle any risks to which the “company” is exposed (risk analysis).
The success criterion for stage 4 is exhaustive knowledge of the company’s IT environment and the risks to which it is exposed.
Once the previous step has been validated, the IT manager, along with the business line managers and a member of the management board, all meet to determine the RPOs and RTOs.
This is an important stage, because the participants assess the IT system’s ability to recover from a disaster in accordance with the company’s needs and risks.
Based on the results of the negotiations in stage 5, the IT manager must consider and suggest technical solutions that meet the business’s requirements.
The IT manager’s work must focus on two issues:
A number of solutions currently exist to meet the business’s requirements, but the costs can vary significantly. The lower the RTO/RPO thresholds, the higher the costs.
Once the IT manager has finished looking into technical feasibility, they present a report to the members of the management board containing the choices that best meet the business’s requirements and the financial and technical limitations.
Insofar as investments must be made to develop or maintain the disaster recovery plan, this need must be mentioned in the report.
Once this stage has been approved, the IT manager implements the technical solutions specific to the disaster recovery plan.
Depending on the budget, resources and objectives, the IT manager initiates the implementation of the technical solutions, taking into account the deadlines set by the management board members. A schedule providing the deadlines must have received prior approval from the management board and IT manager.
Once all of the technical solutions are up and running, those in charge of their maintenance must write up technical implementation procedures. These must be tested by a third party.
For security reasons, these procedures must not be stored in the same physical environment as the systems they describe. They must be safeguarded against any modifications or alterations and protected in such a way that they can only be accessed by those who need to read them.
The IT manager, who is the de facto manager of disaster recovery plan maintenance, must ensure that the technical procedures are protected, available and up to date.
At the end of the process, they start drafting the disaster recovery plan based on various scenarios and taking care to include, for each situation, the activation conditions and the related technical processes.
In concrete terms, a scenario stages a risk to which the “company” has been exposed and presents the solution to resolve the situation in accordance with the RPOs and RTOs.
A scenario must be put together “simply”. Here are two examples of a scenario:
Scenario 1: Exploitation of a WiFi weakness by a malicious entity => The risk analysis has previously detected a vulnerability in the WiFi infrastructure that gives access to the internal network. The WPA key has not been changed for 2 years, the password is weak and it is known by all staff members, including people who no longer work for the company.
General theme: A hacker is hired by competitors to corrupt customer data.
A hacker, motivated by financial gain, has the IT resources and expert knowledge required to exploit the technical vulnerabilities of the IT network. More specifically, they take advantage of the poorly protected WiFi connection and corrupt the server on which the customers’ financial data is hosted, affecting the integrity of the data. They deliberately change the customers’ financial information (debits/credits/pending transactions).
Key ingredients in this scenario:
Elements to test:
Can we meet the RPO? Countermeasures / technical solutions to be tested:
Scenario 2: Major fire in the server room => The risk analysis has previously detected several vulnerabilities in the room containing the servers that host the company’s vital data. There is no smoke detector, no automatic fire extinguishing system and the backup system is hosted in the same room as the servers.
General theme: A short circuit causes the total destruction of the server room.
The company director has decided to start electrical renovation works. On Friday evening, a tradesman working on the electrical board in the server room forgets to connect the circuit breakers. In the early hours of Saturday morning, overheating caused by a faulty contact causes a fire that destroys all of the IT equipment. To make things worse, the backup systems are kept in the same server room. However, a copy of the backups is placed in a safe in an adjoining building on a monthly basis.
Key ingredients in this scenario:
Elements to test:
Using the backup stored in the safe in the adjoining building:
Countermeasures / technical solutions to be tested:
The disaster recovery plan must be tested regularly. Using the document describing the different scenarios, the IT manager creates an incident scenario to test the technological and organisational capabilities of business recovery.
A report must be written at the end of each exercise. This report is sent to the management board and details both the positive and negative points of the disaster recovery test. It concludes with the IT manager’s suggested improvements or recommendations if the RPO and RTO cannot be met.
The IT manager is responsible for ensuring that the disaster recovery plan and procedures are up to date. In the event of a major change to the IT infrastructure, the cycle of twelve key points must be restarted.