This document outlines the disaster recovery setup for the PostgreSQL cluster deployed as a managed database on DigitalOcean. The configuration ensures high availability, data integrity, and quick recovery in the event of an incident.
Common Failure Scenarios and Recovery Steps
1: Primary Node Failure
Impact
- The primary node becomes unavailable, affecting write operations.
Recovery Steps
- Promote an existing read-only node to become the primary node of the database cluster:
  doctl databases replica promote <database-cluster-id> <replica-name> [flags]
- Update the DNS record in the prod.rockengroup.com domain with the new connection string if the primary node changes due to failover.
- Provision a new read-only node to maintain redundancy.
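The three recovery steps above could be wrapped in small shell functions, sketched below. This is a minimal sketch under stated assumptions, not the team's actual tooling: the function names are hypothetical, the DNS record is assumed to be a CNAME whose ID has already been looked up with `doctl compute domain records list`, and the region and size values must match the existing cluster. Check which cluster ID applies after promotion before running the later steps.

```shell
# Sketch only: wraps the doctl calls for the three failover steps.
# All IDs, names, region, and size values passed in are placeholders.

# Step 1: promote an existing read-only node to primary.
promote_replica() {
  local cluster_id="$1" replica_name="$2"
  doctl databases replica promote "$cluster_id" "$replica_name"
}

# Step 2: point an existing DNS record in the domain at the new host.
# Assumption: the record is a CNAME, so the target gets a trailing dot.
update_db_dns() {
  local domain="$1" record_id="$2" cluster_id="$3" host
  host="$(doctl databases connection "$cluster_id" --format Host --no-header)" || return 1
  [ -n "$host" ] || { echo "no host returned for $cluster_id" >&2; return 1; }
  doctl compute domain records update "$domain" \
    --record-id "$record_id" --record-data "${host}."
}

# Step 3: provision a replacement read-only node.
create_replica() {
  local cluster_id="$1" replica_name="$2" region="$3" size="$4"
  doctl databases replica create "$cluster_id" "$replica_name" \
    --region "$region" --size "$size"
}
```

A typical sequence would call `promote_replica`, then `update_db_dns prod.rockengroup.com <record-id> <cluster-id>`, then `create_replica` with the cluster's region and size.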
2: Cluster Failure
Impact
- Both the primary and read-only nodes are unavailable, resulting in a complete loss of database access.
Recovery Steps
- Initiate the restoration process through the DigitalOcean interface, following the "PostgreSQL: Digital Ocean restore DB" document.
- Verify that the restored databases are consistent and free of corruption.
- Update the connection string in the prod.rockengroup.com domain to point to the newly restored cluster.
- Set up a new read-only node for redundancy.
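One possible way to approach the corruption check in the steps above is to force a full read of the restored database by dumping it to /dev/null. The sketch below assumes `pg_dump` is installed on the operator's machine and that the connection URI is supplied by the caller; the function name `check_db_readable` is hypothetical. A successful dump shows only that every table can be read, not that the data is logically correct.

```shell
# Sketch: read every table of a restored database once by dumping it to
# /dev/null. pg_dump exits non-zero if it fails to read the data.
# The connection URI is a placeholder supplied by the caller.
check_db_readable() {
  local uri="$1"
  if pg_dump --dbname="$uri" --file=/dev/null; then
    echo "OK: full read succeeded"
  else
    echo "FAIL: pg_dump could not read the database" >&2
    return 1
  fi
}
```

It could be run once per restored database, for example `check_db_readable "<connection-uri>"`, before the connection string is switched over.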
3: Data Failure
Impact
- Data corruption or accidental deletion affects database integrity.
Recovery Steps
- Use Snapshooter (see "PostgreSQL: Snapshooter restore DB") to restore the database to a point in time before the data failure occurred:
  - Go to the backup-snapshooter-prod Space and find the required DB archive.
  - Download the MANUAL_RESTORE.txt file and follow its instructions to restore the DB.
- Check the restored data for consistency and completeness.
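As a rough illustration of the completeness check, a row count from the restored database can be compared against a known lower bound. This is a sketch under assumptions: `psql` is available locally, the function name `check_min_rows` is hypothetical, and the table name and threshold come from the operator's own knowledge of the data. The table name is interpolated into the query, so only trusted values should be passed.

```shell
# Sketch: compare a table's row count in the restored database with an
# expected minimum. URI, table name, and threshold are placeholders.
check_min_rows() {
  local uri="$1" table="$2" min_rows="$3" count
  # -A: unaligned output, -t: tuples only, so the result is a bare number.
  count="$(psql "$uri" -At -c "SELECT count(*) FROM ${table}")" || return 1
  if [ "$count" -ge "$min_rows" ]; then
    echo "OK: ${table} has ${count} rows"
  else
    echo "FAIL: ${table} has ${count} rows, expected at least ${min_rows}" >&2
    return 1
  fi
}
```

Running it for a few key tables gives a quick signal that the restore is not missing large amounts of data; it does not replace application-level validation.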
