Disaster Recovery Testing Guidelines


Application


The University Disaster Recovery (DR) standard and its supporting guidelines apply to all ITS managed computer systems.

Purpose


These guidelines are designed to support the operation of the DR standard. The DR standard is designed to ensure the DR requirements have been formally considered for all new and existing ITS managed computer systems.

Disaster recovery test frequency


Recovery Priority   Recovery Time      Testing Frequency
1                   1.0 hour           6 monthly
2                   1.0 – 2.0 hours    6 monthly
3                   2.0 – 4.0 hours    6 monthly
4                   4.0 hours          12 monthly

Good practice stipulates that the frequency of disaster recovery tests is determined by the system’s recovery time objective (RTO). The table above recommends DR testing frequency based on RTO.
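The mapping in the table above can be sketched in code to work out when the next DR test falls due. This is a minimal illustration only: it assumes "6 monthly" and "12 monthly" mean an interval of 6 and 12 calendar months from the last test, and the names `TEST_INTERVAL_MONTHS`, `add_months` and `next_test_due` are hypothetical, not part of any University system.

```python
import calendar
from datetime import date

# Testing interval in months for each recovery priority, taken from the table.
# Assumption: "6 monthly" means every 6 calendar months, "12 monthly" every 12.
TEST_INTERVAL_MONTHS = {1: 6, 2: 6, 3: 6, 4: 12}


def add_months(start: date, months: int) -> date:
    """Return the date `months` later, clamped to the last day of the month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def next_test_due(priority: int, last_test: date) -> date:
    """Date by which the next DR test falls due for a recovery priority."""
    if priority not in TEST_INTERVAL_MONTHS:
        raise ValueError(f"unknown recovery priority: {priority}")
    return add_months(last_test, TEST_INTERVAL_MONTHS[priority])
```

For example, `next_test_due(4, date(2013, 11, 15))` gives 15 November 2014 under these assumptions.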

Waiving disaster recovery tests


DR tests are performed to confirm a system or application can be resumed within the specified timeframe. They also serve as training for staff.

The dates for DR Tests are recorded in the DR testing calendar. To allow flexibility, minimise Production system downtime and ensure efficient use of Staff time, DR tests may be waived, or combined with maintenance (e.g. software patching) if:

  • Maintenance on an active-active or active-stand-by system has been completed within the past 2 months
  • Maintenance on an active-active or active-stand-by system is scheduled to be completed in the following 2 months
  • The system’s DR procedures have been activated during a live DR incident within the past 4 months

To determine whether a scheduled DR test can be waived, the following points must be considered:

  • If maintenance was performed, were servers disabled at one site and then, at a separate time, at the other, so that the system operated solely from each site in turn?
  • For scheduled maintenance, will servers be disabled in the same way?
  • Have the DR procedures for that system/service been activated within the past 4 months?
  • Does the failover include 75% of the intended scope of the DR test?

If it is decided the scheduled DR test is not required, the DR system owner must agree, and a DR test report (using the DR test report template) must be written retrospectively, including a note specifying that the DR test was waived and the reasons why.
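The waiver conditions in this section can be expressed as a simple screening check. The sketch below is illustrative only and rests on several assumptions that these guidelines do not state: the 2 and 4 month windows are measured from the scheduled DR test date, 75% is treated as a minimum, and the scope check applies to every waiver. `WaiverEvidence`, `shift_months` and `may_waive` are hypothetical names. A positive result does not replace the DR system owner's agreement or the retrospective DR test report.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


def shift_months(start: date, months: int) -> date:
    """Shift a date by whole months (negative allowed), clamping the day."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


@dataclass
class WaiverEvidence:
    maintenance_date: Optional[date] = None      # completed or scheduled maintenance
    ran_solely_from_each_site: bool = False      # each site carried the system alone, in turn
    live_activation_date: Optional[date] = None  # DR procedures used in a live incident
    scope_coverage: float = 0.0                  # fraction of intended DR test scope exercised


def may_waive(evidence: WaiverEvidence, test_date: date) -> bool:
    """Screen whether a scheduled DR test is a candidate for waiving."""
    maintenance_ok = (
        evidence.maintenance_date is not None
        # Assumption: within 2 months before or after the scheduled test date.
        and shift_months(test_date, -2) <= evidence.maintenance_date <= shift_months(test_date, 2)
        and evidence.ran_solely_from_each_site
    )
    incident_ok = (
        evidence.live_activation_date is not None
        # Assumption: within the 4 months before the scheduled test date.
        and shift_months(test_date, -4) <= evidence.live_activation_date <= test_date
    )
    # Assumption: 75% of the intended scope is a minimum, not an exact figure.
    return (maintenance_ok or incident_ok) and evidence.scope_coverage >= 0.75
```

The function returns only a yes/no candidate flag; the reasons for any waiver still need to be recorded in the DR test report.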

Disaster recovery test set-up


The types of DR test performed include:

  • Technical/activation
  • Specific component
  • Full

The DR testing programme for each system will largely determine the type of DR test to be performed and the test scope.

Process:

  1. Set test date
  2. Develop and circulate DR test scope
  3. Assemble test team, confirm test scope and identify preparation tasks
  4. Review the DR test report from the previous DR test
  5. Progress the DR test through change management
  6. Complete preparation tasks
  7. Perform DR test
  8. Complete post DR test activities

Test team


Secure ITS and business personnel to assist in preparing for and performing the DR test.

The test team is the group who will prepare for and perform the DR test. Typically the test team includes the following:

  • The DR Coordinator
  • Representatives of the system owners (Technical Services or Infrastructure Services)
  • Testers from business users

 

Agreement with Business owner


Contact the business owner to confirm planning for the DR test has started and discuss:

  • Suitable dates for the DR test (based on the DR testing and maintenance calendar, processing cycles and staff availability/preference)
  • The type of DR test to be performed (if applicable) and the DR test scope
  • Assistance from the business – e.g. testers
  • Appointing a business representative to assist in the planning of the DR test

Test scope


Review the previous DR test report to confirm all action items have been completed and identify any specific tests to be included in this DR test. Some considerations may be:

  • To test the connectivity to other systems (DR or Production)
  • User testing
  • Recover to a set point or to the current point

Risks


DR tests carry some risks. Common risks are:

  • There is no DR during the DR test
  • There is no DR until the DR environment has been restored. This is particularly relevant if the stand-by database is taken out of stand-by and has to be restored.
  • If connectivity tests are included, care must be given to ensure live/Production data does not get passed to the DR systems and test data does not get passed to Production systems

During the planning stage all risks associated with this DR test must be identified and included in the DR test scope. Procedures must also be developed to manage these risks.

DR test approach


Set the start time for the DR test. Where possible, schedule preparation tasks to be performed slightly ahead of time. The duration of the DR test will depend on the type and complexity of test being performed. Ensure sufficient time to restore the Production and DR environments after the DR test is complete.

DR test progress updates to owners


Offer to text business and system owners at strategic stages during the DR test. Usually they only want to know when the DR test is completed, or if issues arise.

Change management


Once the DR test scope, date, time and personnel are confirmed, progress the DR test through the change management process.

User testing


User testing verifies the functionality and connections of the DR system and also the integrity of the DR data. A simple user test may be for a single user to log on to the DR system and navigate screens, checking data. This would confirm the security, screen functionality and data of the DR system. A more complex user test may involve:

  • Multiple users from multiple sites logging on
  • Comparison of data between the production and DR systems via screen shots
  • Entering data into the DR system
  • Running reports
  • Checking links/data flow to other systems

For more complex user tests, it is worth noting the following:

  • The test scope will define how much user testing is required
  • Testers are arranged by the business owner. Allow sufficient time for the testing and any regression/tidy up activities
  • As part of the preparation for the DR test, testers need to develop a user test plan. User test plans from previous DR tests, or test plans from any system development projects, may be used
  • Arrange a session to review the test routine to ensure all objectives are tested and data integrity is maintained

DR Test Preparation


Review the DR plan and technical/supporting documentation to ensure they are current and complete, and review the DR system to ensure it is aligned with Production. If significant changes to the DR system are required, note this in the test scope and the DR test report.

Communications


Once the DR test has been approved by the CAB, ensure the business owner has advised all Stakeholders and Users of the impending DR test, especially the time and date. Care must be given to ensuring live data is not entered into the DR system by mistake and test data does not find its way into Production environments.

Perform DR test


Perform the DR test as documented in the DR test scope, DR plan and technical documentation. All issues and anomalies must be documented for review and action.

Post-Mortem review


All participants meet to review the DR test. The first item is to compare the test results (what went right and what went wrong) with the DR test scope to determine and agree the success (or otherwise) of the DR test.

Next, the issues encountered are reviewed. Actions are agreed and assigned to staff for resolution. If possible, a resolution date is also agreed to ensure timely completion of action items. If the DR test was not successful, or only partially successful, it is likely a follow-up (full or partial) DR test will be required once action items are completed.

DR test report


The DR test report is written and circulated to stakeholders. If successful, the DR test is signed off by the business owner. The DR test report must then be posted on the SharePoint site.

DR test setup checklist


Complete   Tasks                                                   Comment
           Agree test date
           Confirm and document DR test scope
           Change Management
           Review and update DR plan and technical documentation
           Scope user tests and arrange testers
           Communications – advise stakeholders
           Perform DR test
           Hold test post-mortem review meeting
           Confirm and assign action items
           Write test report and distribute
           Obtain DR test sign off from business owner
           Co-ordinate completion of action items
           Test tidy up and follow up tasks
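Where the checklist is tracked electronically rather than on paper, it could be held as a simple ordered list. The sketch below is one possible shape, not an existing tool: the task names are copied from the checklist above, while `ChecklistItem`, `new_checklist`, `mark_complete` and `outstanding` are hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Task names copied from the DR test setup checklist, in order.
CHECKLIST_TASKS = [
    "Agree test date",
    "Confirm and document DR test scope",
    "Change Management",
    "Review and update DR plan and technical documentation",
    "Scope user tests and arrange testers",
    "Communications – advise stakeholders",
    "Perform DR test",
    "Hold test post-mortem review meeting",
    "Confirm and assign action items",
    "Write test report and distribute",
    "Obtain DR test sign off from business owner",
    "Co-ordinate completion of action items",
    "Test tidy up and follow up tasks",
]


@dataclass
class ChecklistItem:
    task: str
    complete: bool = False
    comment: str = ""


def new_checklist() -> list[ChecklistItem]:
    """Create a fresh checklist with every task outstanding."""
    return [ChecklistItem(task) for task in CHECKLIST_TASKS]


def mark_complete(checklist: list[ChecklistItem], task: str, comment: str = "") -> None:
    """Tick off a task by name and record an optional comment."""
    for item in checklist:
        if item.task == task:
            item.complete = True
            item.comment = comment
            return
    raise KeyError(task)


def outstanding(checklist: list[ChecklistItem]) -> list[str]:
    """Return the names of tasks not yet completed, in checklist order."""
    return [item.task for item in checklist if not item.complete]
```

Calling `outstanding(checklist)` before sign off gives a quick view of what remains, for example to include in a progress update to the business owner.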

 

Definitions


The following definitions apply to these guidelines:

Computer System describes a group of applications which, combined, provide an IT service. E.g. a typical computer system may include a business application, database and web access application.

Disaster recovery (DR) test refers to

Failover refers to transferring processing from the Production (or primary) computer system to the disaster recovery computer system

Incident is a disruptive event that causes an unacceptable interruption to business operations and/or IT services

Recovery time objective (RTO) is the length of time, following the decision to invoke the DR Plan, within which an IT service must be restored

University means the University of Auckland and includes all subsidiaries

Key relevant documents


Include the following:

Document management and control


Prepared by: DR Programme Manager
Owned by: CIO
Date approved: November 2013
Review date: November 2016