runbooks-incident-response

# Runbooks - Incident Response

Creating effective incident response procedures for handling production incidents and on-call scenarios.

## Incident Response Framework

### Incident Severity Levels

**SEV-1 (Critical)**

- Complete service outage
- Data loss or security breach
- Major customer impact (>50% of users)
- **Response Time:** Immediate
- **Escalation:** Page on-call + manager

**SEV-2 (High)**

- Partial service degradation
- Affecting significant users (10-50%)
- Performance issues (>50% slower)
- **Response Time:** Within 15 minutes
- **Escalation:** Page on-call

**SEV-3 (Medium)**

- Minor degradation
- Affecting few users (<10%)
- Non-critical features broken
- **Response Time:** Within 1 hour
- **Escalation:** On-call handles during business hours

**SEV-4 (Low)**

- Cosmetic issues
- Internal tools affected
- No customer impact
- **Response Time:** Next business day
- **Escalation:** Create ticket, no page

## Incident Response Template

```markdown
# Incident Response: [Alert/Issue Name]

**Severity:** SEV-1/SEV-2/SEV-3/SEV-4
**Response Time:** Immediate / 15 min / 1 hour / Next day
**Owner:** On-call Engineer

## Incident Detection

**This runbook is triggered by:**
- PagerDuty alert: `api_error_rate_high`
- Customer report in #support
- Monitoring dashboard showing anomaly

## Initial Response (First 5 Minutes)

### 1. Acknowledge & Assess

```bash
# Check current status
curl https://api.example.com/health
kubectl get pods -n production
```

**Determine severity:**

- All requests failing → SEV-1
- Partial failures → SEV-2
- Performance degraded → SEV-3

### 2. Notify Stakeholders

**SEV-1:**

- Create Slack incident channel: `/incident create SEV-1 API Outage`
- Page engineering manager
- Notify customer success team

**SEV-2:**

- Post in #incidents channel
- Tag on-call team

**SEV-3:**

- Post in #engineering channel
- No pages needed

### 3. Start Incident Timeline

Create incident doc (copy template):

```
Incident: API Outage
Started: 20
runbooks-incident-response

Marketplace

Plugin

Repository

Last Verified

Install Skill

Instructions

Validation Details