Back to Skills

planning-disaster-recovery

verified

Design and implement disaster recovery strategies with RTO/RPO planning, database backups, Kubernetes DR, cross-region replication, and chaos engineering testing. Use when implementing backup systems, configuring point-in-time recovery, setting up multi-region failover, or validating DR procedures.

View on GitHub

Marketplace

ai-design-components

ancoleman/ai-design-components

Plugin

backend-ai-skills

Repository

ancoleman/ai-design-components
153stars

skills/planning-disaster-recovery/SKILL.md

Last Verified

February 1, 2026

Install Skill

Select agents to install to:

Scope:
npx add-skill https://github.com/ancoleman/ai-design-components/blob/main/skills/planning-disaster-recovery/SKILL.md -a claude-code --skill planning-disaster-recovery

Installation paths:

Claude
.claude/skills/planning-disaster-recovery/
Powered by add-skill CLI

Instructions

# Disaster Recovery

## Purpose

Provide comprehensive guidance for designing disaster recovery (DR) strategies, implementing backup systems, and validating recovery procedures across databases, Kubernetes clusters, and cloud infrastructure. Enable teams to define RTO/RPO objectives, select appropriate backup tools, configure automated failover, and test DR capabilities through chaos engineering.

## When to Use This Skill

Invoke this skill when:
- Defining recovery time objectives (RTO) and recovery point objectives (RPO)
- Implementing database backups with point-in-time recovery (PITR)
- Setting up Kubernetes cluster backup and restore workflows
- Configuring cross-region replication for high availability
- Testing disaster recovery procedures through chaos experiments
- Meeting compliance requirements (GDPR, SOC 2, HIPAA)
- Automating backup monitoring and alerting
- Designing multi-cloud disaster recovery architectures

## Core Concepts

### RTO and RPO Fundamentals

**Recovery Time Objective (RTO):** Maximum acceptable downtime after a disaster before business impact becomes unacceptable.

**Recovery Point Objective (RPO):** Maximum acceptable data loss measured in time. Defines how far back in time recovery must reach.

**Criticality Tiers:**
- **Tier 0 (Mission-Critical):** RTO < 1 hour, RPO < 5 minutes
- **Tier 1 (Production):** RTO 1-4 hours, RPO 15-60 minutes
- **Tier 2 (Important):** RTO 4-24 hours, RPO 1-6 hours
- **Tier 3 (Standard):** RTO > 24 hours, RPO > 6 hours

### 3-2-1 Backup Rule

Maintain **3 copies** of data on **2 different media** types with **1 copy offsite**.

Example implementation:
- Primary: Production database
- Secondary: Local backup storage
- Tertiary: Cloud backup (S3/GCS/Azure)

### Backup Types

**Full Backup:** Complete copy of all data. Slowest to create, fastest to restore.

**Incremental Backup:** Only changes since last backup. Fastest to create, requires full + all incrementals to restore.

**Differential Backup:** Change

Validation Details

Front Matter
Required Fields
Valid Name Format
Valid Description
Has Sections
Allowed Tools
Instruction Length:
11756 chars