
incident-response


Use when designing incident management processes, creating runbooks, or establishing on-call practices. Covers incident lifecycle, communication, and postmortems.


Marketplace: melodic-software

Plugin: systems-design

Repository: melodic-software/claude-code-plugins (Verified Org, 13 stars)

Path: plugins/systems-design/skills/incident-response/SKILL.md

Last Verified: January 21, 2026

Install Skill

npx add-skill https://github.com/melodic-software/claude-code-plugins/blob/main/plugins/systems-design/skills/incident-response/SKILL.md -a claude-code --skill incident-response

Installation path (Claude): .claude/skills/incident-response/


# Incident Response

Patterns and practices for effective incident management, from detection through postmortem.

## When to Use This Skill

- Designing incident response processes
- Creating incident runbooks
- Establishing on-call rotations
- Running effective postmortems
- Improving mean time to recovery (MTTR)

## Incident Lifecycle

```text
┌─────────────────────────────────────────────────────────┐
│                   INCIDENT LIFECYCLE                    │
│                                                         │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐     │
│  │ Detect  │─►│ Respond │─►│ Recover │─►│ Learn   │     │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘     │
│       │            │            │            │          │
│       ▼            ▼            ▼            ▼          │
│   Alerting     Triage &     Mitigation   Postmortem     │
│   Monitoring   Diagnosis    Remediation  Action Items   │
└─────────────────────────────────────────────────────────┘
```
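The lifecycle above is strictly ordered, which makes it natural to model as a small state machine. The sketch below is a minimal Python illustration, assuming an incident moves forward through the four phases without skipping or revisiting any; the `Phase` enum, `Incident` class, `advance()` method, and the sample incident title are hypothetical names, not part of any incident-management tool.

```python
from enum import Enum


class Phase(Enum):
    """The four lifecycle phases from the diagram, in order."""
    DETECT = "detect"
    RESPOND = "respond"
    RECOVER = "recover"
    LEARN = "learn"


# Hypothetical transition table: each phase may only advance to the next one.
# LEARN has no entry, so it is terminal.
NEXT_PHASE = {
    Phase.DETECT: Phase.RESPOND,
    Phase.RESPOND: Phase.RECOVER,
    Phase.RECOVER: Phase.LEARN,
}


class Incident:
    """Hypothetical incident record that enforces the lifecycle ordering."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.phase = Phase.DETECT
        self.history = [Phase.DETECT]

    def advance(self) -> Phase:
        """Move to the next phase; raise if the incident is already in LEARN."""
        if self.phase not in NEXT_PHASE:
            raise ValueError(f"{self.title!r} is already in the final phase")
        self.phase = NEXT_PHASE[self.phase]
        self.history.append(self.phase)
        return self.phase


incident = Incident("Checkout latency spike")  # illustrative title
while incident.phase is not Phase.LEARN:
    incident.advance()
print([p.value for p in incident.history])
# → ['detect', 'respond', 'recover', 'learn']
```

A real process may need backward transitions (for example, returning to Respond when a mitigation fails); mapping each phase to a set of allowed targets in `NEXT_PHASE` would accommodate that.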

### Key Metrics

```text
MTTD - Mean Time to Detect
└── Time from incident start to detection

MTTA - Mean Time to Acknowledge
└── Time from alert to human acknowledgment

MTTR - Mean Time to Recover
└── Time from detection to resolution

MTTF - Mean Time to Failure
└── Time from recovery to the next failure (reliability; higher is better)

Goal: Minimize MTTD, MTTA, and MTTR; maximize MTTF
```
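As a worked example, the metrics can be computed from per-incident timestamps. This sketch assumes each incident records when it started, was detected (treated as the alert time), was acknowledged, and was resolved; the `IncidentTimes` fields, the `key_metrics()` helper, and the sample timestamps are all hypothetical. MTTF is measured here as uptime from one incident's recovery to the next incident's start, so at least two incidents are required.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimes:
    """Hypothetical per-incident timestamps; field names are illustrative."""
    started: datetime       # when the failure actually began
    detected: datetime      # when monitoring alerted (treated as the alert time)
    acknowledged: datetime  # when an on-call human acknowledged the alert
    resolved: datetime      # when service was restored


def mean(deltas: list[timedelta]) -> timedelta:
    """Average timedeltas; sum() needs a timedelta start value."""
    return sum(deltas, timedelta()) / len(deltas)


def key_metrics(incidents: list[IncidentTimes]) -> dict[str, timedelta]:
    """Compute MTTD, MTTA, MTTR, and MTTF over a set of incidents."""
    if len(incidents) < 2:
        raise ValueError("MTTF needs at least two incidents")
    ordered = sorted(incidents, key=lambda i: i.started)
    return {
        "MTTD": mean([i.detected - i.started for i in ordered]),
        "MTTA": mean([i.acknowledged - i.detected for i in ordered]),
        # Detection to resolution, matching the MTTR definition above.
        "MTTR": mean([i.resolved - i.detected for i in ordered]),
        # Uptime between consecutive incidents: previous recovery -> next failure.
        "MTTF": mean([b.started - a.resolved for a, b in zip(ordered, ordered[1:])]),
    }


# Illustrative sample data, not real incidents.
metrics = key_metrics([
    IncidentTimes(datetime(2026, 1, 5, 9, 0), datetime(2026, 1, 5, 9, 4),
                  datetime(2026, 1, 5, 9, 6), datetime(2026, 1, 5, 9, 40)),
    IncidentTimes(datetime(2026, 1, 7, 14, 0), datetime(2026, 1, 7, 14, 10),
                  datetime(2026, 1, 7, 14, 12), datetime(2026, 1, 7, 15, 0)),
])
for name, value in metrics.items():
    print(name, value)
# MTTD 0:07:00
# MTTA 0:02:00
# MTTR 0:43:00
# MTTF 2 days, 4:20:00
```

Keeping the four intervals separate, rather than summing them, avoids double counting: with these definitions MTTA falls inside MTTR.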

## Incident Severity

### Severity Levels

```text
SEV 1 - Critical
├── Complete outage
├── Data loss or security breach
├── All/most users affected
├── Response: Immediate (24/7)
└── Example: Production database down

SEV 2 - High
├── Major functionality impaired
├── Significant user impact
├── Workaround may exist
├── Response: Urgent (business hours and beyond)
└── Example: Payment processing degraded

SEV 3 - Medium
├── Partial functionality affected
├── Limited user impact
├── Workaround available
├── Response: Normal priority
└── Example: Report generation slow

SEV 4 - Low
├── Minor issue
├── Minimal user impact
└── Response: Best effort
```
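The severity criteria can also be encoded as an ordered set of checks so that triage applies them consistently. This is a minimal sketch, assuming a responder records a few yes/no observations plus a coarse user-impact bucket; `Assessment`, `UserImpact`, `Severity`, `classify()`, and `RESPONSE` are hypothetical names, and workaround availability is treated as descriptive rather than decisive. Checks run from SEV 1 downward, so the most severe matching level wins.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class UserImpact(Enum):
    """Coarse user-impact buckets taken from the severity descriptions."""
    ALL_OR_MOST = "all/most users"
    SIGNIFICANT = "significant"
    LIMITED = "limited"
    MINIMAL = "minimal"


class Severity(IntEnum):
    SEV1 = 1  # Critical
    SEV2 = 2  # High
    SEV3 = 3  # Medium
    SEV4 = 4  # Low


@dataclass
class Assessment:
    """Hypothetical triage inputs mirroring the criteria in the severity tree."""
    complete_outage: bool = False
    data_loss_or_breach: bool = False
    major_functionality_impaired: bool = False
    partial_functionality_affected: bool = False
    user_impact: UserImpact = UserImpact.MINIMAL


# Response expectations, abbreviated from the severity levels above.
RESPONSE = {
    Severity.SEV1: "Immediate (24/7)",
    Severity.SEV2: "Urgent",
    Severity.SEV3: "Normal priority",
    Severity.SEV4: "Best effort",
}


def classify(a: Assessment) -> Severity:
    """Return the most severe level whose criteria match, checking SEV1 first."""
    if a.complete_outage or a.data_loss_or_breach or a.user_impact is UserImpact.ALL_OR_MOST:
        return Severity.SEV1
    if a.major_functionality_impaired or a.user_impact is UserImpact.SIGNIFICANT:
        return Severity.SEV2
    if a.partial_functionality_affected or a.user_impact is UserImpact.LIMITED:
        return Severity.SEV3
    return Severity.SEV4


# Example: degraded payment processing, as in the SEV 2 example.
sev = classify(Assessment(major_functionality_impaired=True,
                          user_impact=UserImpact.SIGNIFICANT))
print(sev.name, RESPONSE[sev])
# → SEV2 Urgent
```

Because `Severity` is an `IntEnum`, levels compare numerically (`Severity.SEV1 < Severity.SEV2`), which is convenient when an incident is re-classified and only escalations should trigger new pages.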

Validation Details

Instruction Length: 9585 chars