Disaster Recovery Plan v20230706

Disaster Recovery and Business Continuity Plan

1. Critical Systems and Services Evaluation

Our top priority is to maintain the continuity and efficiency of our services. As part of our disaster recovery (DR) plan, we have identified all systems, services, and data crucial to our operations. We have taken careful measures to ensure that these critical components are safeguarded and can be promptly restored after a catastrophic event.

Our business continuity planning begins with an in-depth understanding of our most essential functions and processes. These have been identified so that, in the event of a disruption, critical operations can continue with minimal interruption. (Everything is open source in our public repositories: https://github.com/goose-wrappers/ )

2. Disaster Eventuality Planning

Our DR plan is designed to safeguard us against a wide spectrum of disaster scenarios, including but not limited to:

natural disasters,
human-caused mishaps, and
technical system failures.

By conducting a comprehensive Business Impact Analysis (BIA), we have outlined how each type of disaster could affect our operations. This analysis led us to define our recovery time objectives (RTO) and recovery point objectives (RPO).

3. Incident Response Planning

Our business continuity strategy incorporates a detailed incident response plan.
This protocol outlines the immediate steps to be taken post-incident to protect our staff, safeguard our technology infrastructure, and initiate the process of service restoration.

4. Continuity Procedures and Protocols

At the heart of our strategy are our detailed business continuity procedures, which include data redundancy measures, system failover mechanisms, and manual workarounds where necessary.

This is achieved through the services and mechanisms that AWS and GitHub already provide, since we rely on both platforms.

Our protocol is designed to ensure that critical operations can continue throughout a disruption and that normal service is resumed through these providers as quickly as possible.

5. Data Backup and Redundancy

Our data backup strategy is multi-pronged to ensure maximum protection and immediate accessibility. It includes AWS Backup for the automation and centralization of backups, Amazon S3 for reliable object storage, AWS Snapshots for point-in-time recovery of data, and Cross-region Replication for safeguarding against region-wide outages.

AWS Backup - We use the AWS Backup service to automate and centralize backups across AWS services. We manage backup policies, monitor backup activity, and perform recovery all in one place. More details can be found in the AWS Backup documentation.
Amazon S3 - We use Amazon S3 as the application's object storage. It is designed to provide 99.999999999% (11 9's) durability and stores data for millions of applications used by market leaders in every industry. More details can be found in the Amazon S3 documentation.
AWS Snapshots - We create snapshots of our EBS volumes, RDS databases, and other AWS resources. They provide point-in-time recovery for the application's resources. More details can be found in the AWS documentation on snapshots.
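
As an illustration only, the backup approach described in this section could be written down as an AWS Backup plan definition. The sketch below builds a dictionary in the shape accepted by the AWS Backup `create_backup_plan` API; the plan name, vault names, schedule, retention period, and destination ARN are hypothetical placeholders and do not describe our actual configuration.

```python
def build_backup_plan(vault_name, copy_vault_arn, retention_days=35):
    """Return a BackupPlan dict in the shape used by AWS Backup.

    Every value below is an illustrative assumption, not a real setting.
    """
    lifecycle = {"DeleteAfterDays": retention_days}
    return {
        "BackupPlanName": "daily-with-cross-region-copy",  # hypothetical name
        "Rules": [
            {
                "RuleName": "daily",
                "TargetBackupVaultName": vault_name,
                # AWS cron syntax: run every day at 05:00 UTC.
                "ScheduleExpression": "cron(0 5 ? * * *)",
                "Lifecycle": dict(lifecycle),
                # Copy each recovery point to a vault in another region.
                "CopyActions": [
                    {
                        "DestinationBackupVaultArn": copy_vault_arn,
                        "Lifecycle": dict(lifecycle),
                    }
                ],
            }
        ],
    }


plan = build_backup_plan(
    "primary-vault",  # hypothetical vault in the primary region
    "arn:aws:backup:us-west-2:123456789012:backup-vault:dr-vault",  # placeholder ARN
)
# With boto3 installed and credentials configured, the plan could be submitted as:
#   boto3.client("backup").create_backup_plan(BackupPlan=plan)
```

Keeping the plan as plain data makes it easy to review and version-control alongside the rest of the repository before anything is applied to an AWS account.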

6. Disaster Recovery Strategies and Objectives

Our DR approach has been designed for swift recovery and employs several robust mechanisms. These include the Pilot Light strategy for replicating core system components, the Warm Standby solution for services that are always running, and the Multi-site approach for an active-active configuration with our existing infrastructure.

Pilot Light - This approach involves the replication of data and system components of our core systems in AWS, but the full-scale environment doesn't get provisioned until disaster recovery is initiated.
Warm Standby - A warm standby solution helps us to extend the pilot light elements and further decreases the recovery time because some services are always running.

To minimize the impact of an incident, we have clearly defined our recovery time objectives (RTO) and recovery point objectives (RPO). These goals drive our recovery efforts, ensuring minimal downtime and data loss.
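
To make the two objectives concrete, the following minimal sketch compares an incident timeline against an RTO and an RPO. The 4-hour RTO, 1-hour RPO, and all timestamps are hypothetical values chosen for the example; this plan does not state our actual targets here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable window of lost data


def evaluate_incident(objectives, last_backup, outage_start, service_restored):
    """Compare one incident's downtime and data-loss window with the objectives."""
    downtime = service_restored - outage_start
    data_loss_window = outage_start - last_backup
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
    }


# Hypothetical targets and timeline, for illustration only.
objectives = Objectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
result = evaluate_incident(
    objectives,
    last_backup=datetime(2023, 7, 6, 9, 30),
    outage_start=datetime(2023, 7, 6, 10, 0),
    service_restored=datetime(2023, 7, 6, 12, 15),
)
print(result["rto_met"], result["rpo_met"])
```

In this example the service is down for 2 hours 15 minutes and at most 30 minutes of data is lost, so both hypothetical objectives are met.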

7. System and Data Restoration

A vital part of our DR plan is the documented protocol for system restoration and data recovery.

This ensures a methodical and rapid recovery of AWS resources, data recovery from backups, and a seamless transition to the recovery site when required.

8. GitHub Actions Protection

Along with our primary data, we take special care of the GitHub Actions workflows that are integral to our operations. All workflows essential for deployment or maintenance are version-controlled and securely preserved. They can be reviewed in our public-facing organization and its repositories: https://github.com/organizations/goose-wrappers/

9. Periodic Plan Testing

We take the reliability of our DR plan very seriously, which is why we conduct regular tests to identify potential gaps and validate its effectiveness. This includes stringent checks for successful backups and systems recovery within the defined RTO.
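
One part of such a test, checking that recent backups exist, can be sketched as below. This is a hedged example rather than our actual test procedure: the resource names, timestamps, and the 1-hour RPO are hypothetical, and in practice the timestamps would come from the backup service rather than a hard-coded dictionary.

```python
from datetime import datetime, timedelta


def find_stale_backups(latest_recovery_points, rpo, now):
    """Return the names of resources whose newest recovery point is older than the RPO."""
    return sorted(
        name
        for name, taken_at in latest_recovery_points.items()
        if now - taken_at > rpo
    )


# Hypothetical inventory: resource name -> time of its most recent recovery point.
now = datetime(2023, 7, 6, 12, 0)
recovery_points = {
    "app-database": datetime(2023, 7, 6, 11, 30),
    "app-volume": datetime(2023, 7, 5, 23, 0),
}
stale = find_stale_backups(recovery_points, rpo=timedelta(hours=1), now=now)
print(stale)
```

Here only `app-volume` is reported, because its latest recovery point is 13 hours old while the database was backed up 30 minutes before the check.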

10. Team Training

Our team is thoroughly familiar with the DR plan and is trained in its implementation. This training is continually updated to keep pace with the evolving technology landscape and to ensure our team is ready to respond quickly and effectively in any scenario.

11. Plan Updates

We operate in a fast-changing digital space, so our DR plan is continually evolving. As we grow and as new threats emerge, our DR and business continuity plans are updated to ensure maximum protection and resilience.

12. Communication Strategy

In the event of a disaster, our commitment to transparency and clear communication remains paramount.

Our first priority is to notify the Atlassian teams through the proper channels: the service desk portal at https://ecosystem.atlassian.net/servicedesk/customer/portal/34 and email.

We have a well-defined communication plan to keep our employees, partners, and clients informed about the situation and our recovery steps.

This is an outline of our commitment to maintaining the continuity and reliability of our services. We understand the critical role we play in your digital journey, and this comprehensive DR plan is an affirmation of our promise to keep your systems running smoothly and efficiently, no matter what.

We are in the process of setting up a status page through Atlassian Statuspage (https://www.atlassian.com/software/statuspage/features) to give our customers more visibility.