PostgreSQL: Disaster Recovery Plan

This document describes the disaster recovery procedures for the PostgreSQL cluster running as a DigitalOcean Managed Database. The setup (a primary node, a read-only node, and Snapshooter backups) is intended to provide high availability, protect data integrity, and allow quick recovery after an incident.

Common Failure Scenarios and Recovery Steps

1: Primary Node Failure

Impact

  • The primary node becomes unavailable, so the cluster cannot accept write operations.

Recovery Steps

  1. Promote an existing read-only node to primary. On DigitalOcean, promotion detaches the read-only node from the original cluster and turns it into a standalone primary cluster.

    doctl databases replica promote <database-cluster-id> <replica-name>

  2. Update the DNS record in the prod.rockengroup.com domain so that it points to the promoted node's connection host.

  3. Provision a new read-only node on the promoted cluster to restore redundancy.
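The three steps above can be scripted with doctl. The shell functions below are a minimal sketch and have not been tested as a runbook script. The function names are hypothetical. The region and size slugs for the new read-only node have to be taken from the current production setup. cluster_id_by_name assumes the promoted cluster keeps the read-only node's name. DNS is not automated because this runbook does not say where prod.rockengroup.com is hosted, so cluster_host only prints the value the record should point to.

```shell
#!/usr/bin/env bash
# Sketch of the primary-node failover steps with doctl.
# All function names are hypothetical helpers, not part of doctl.
set -euo pipefail

# Step 1: promote a read-only node of the given cluster.
promote_replica() {
  local cluster_id="$1" replica_name="$2"
  doctl databases replica promote "$cluster_id" "$replica_name"
}

# Look up a cluster ID by name. Assumes the promoted cluster keeps the
# read-only node's name; verify with "doctl databases list" if unsure.
cluster_id_by_name() {
  local name="$1" id
  id="$(doctl databases list --format ID,Name --no-header |
    awk -v n="$name" '$2 == n { print $1 }')"
  if [ -z "$id" ]; then
    echo "no cluster named '$name'" >&2
    return 1
  fi
  echo "$id"
}

# Step 2 helper: print the connection host to use as the DNS target.
cluster_host() {
  doctl databases connection "$1" --format Host --no-header
}

# Step 3: add a new read-only node to the given cluster. The region and
# size slugs are supplied by the caller and must match production.
create_replica() {
  local cluster_id="$1" replica_name="$2" region="$3" size="$4"
  doctl databases replica create "$cluster_id" "$replica_name" \
    --region "$region" --size "$size"
}
```

To use it, run promote_replica "$CLUSTER_ID" "$REPLICA_NAME". Then run cluster_host "$(cluster_id_by_name "$REPLICA_NAME")" to get the new DNS target, and finally call create_replica with the promoted cluster's ID. CLUSTER_ID and REPLICA_NAME are placeholders.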

2: Cluster Failure

Impact

  • Both the primary and read-only nodes are unavailable, resulting in a complete loss of database access.

Recovery Steps

  1. Start the restore from a backup in the DigitalOcean control panel, following the "PostgreSQL: Digital Ocean restore DB" document.

  2. Ensure the restored databases are consistent and free of corruption.

  3. Update the DNS record in the prod.rockengroup.com domain to point to the newly restored cluster.

  4. Set up a new read-only node for redundancy.
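Before pointing DNS at the restored cluster, it helps to confirm that the cluster is online and that no database is missing. Below is a minimal sketch with doctl. It assumes the restored cluster's ID is known and that a text file of expected database names exists; that file is hypothetical, since this runbook does not define one. It checks completeness only and does not detect corruption inside a database.

```shell
#!/usr/bin/env bash
# Sketch of post-restore checks for a DigitalOcean managed cluster.
# Function names and the expected-databases file are hypothetical.
set -euo pipefail

# Poll until doctl reports the cluster status as "online".
# Args: cluster ID, max attempts (default 60), seconds between polls (default 30).
wait_until_online() {
  local cluster_id="$1" attempts="${2:-60}" delay="${3:-30}" status i
  for ((i = 1; i <= attempts; i++)); do
    status="$(doctl databases get "$cluster_id" --format Status --no-header)"
    if [ "$status" = "online" ]; then
      return 0
    fi
    echo "cluster $cluster_id is '$status' (attempt $i/$attempts)" >&2
    sleep "$delay"
  done
  return 1
}

# Print the database names in a cluster, one per line, sorted.
list_databases() {
  doctl databases db list "$1" --format Name --no-header | sort
}

# Print expected database names (from a text file, one per line)
# that are missing on the restored cluster.
missing_databases() {
  local cluster_id="$1" expected_file="$2"
  comm -23 <(sort "$expected_file") <(list_databases "$cluster_id")
}
```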

3: Data Failure

Impact

  • Data corruption or accidental deletion affects database integrity.

Recovery Steps

  1. Use Snapshooter (see the "PostgreSQL: Snapshooter restore DB" document) to restore the database to a point in time before the data failure occurred:

    • Go to the backup-snapshooter-prod Space and find the required database archive.

    • Download the MANUAL_RESTORE.txt file and follow its instructions to restore the database.

  2. Check the restored data for consistency and completeness.
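One way to check completeness after a Snapshooter restore is to compare exact per-table row counts between the restored database and a reference copy. One possible reference is the affected production database when only some rows were lost. Below is a minimal sketch, assuming psql is installed and both databases are reachable through connection URIs; the function names are hypothetical. Differences are expected in the tables hit by the incident, and the output shows which tables differ and by how much.

```shell
#!/usr/bin/env bash
# Sketch: compare exact per-table row counts between two PostgreSQL
# databases given as connection URIs. Function names are hypothetical.
set -euo pipefail

# Print "schema.table<TAB>row_count" for every user table, sorted by name.
table_row_counts() {
  local uri="$1" tbl count
  psql "$uri" -v ON_ERROR_STOP=1 -At -c \
    "SELECT format('%I.%I', schemaname, tablename) FROM pg_tables
     WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
     ORDER BY 1" |
    while IFS= read -r tbl; do
      # count(*) scans the whole table; expect this to be slow on large tables.
      count="$(psql "$uri" -v ON_ERROR_STOP=1 -At -c "SELECT count(*) FROM $tbl" </dev/null)"
      printf '%s\t%s\n' "$tbl" "$count"
    done
}

# Show differing counts; returns 0 only if all tables and counts match.
compare_row_counts() {
  diff <(table_row_counts "$1") <(table_row_counts "$2")
}
```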
