Use when creating incident response procedures and on-call playbooks. Covers incident management, communication protocols, and post-mortem documentation.
View on GitHub: TheBushidoCollective/han
jutsu-runbooks
January 24, 2026
npx add-skill https://github.com/TheBushidoCollective/han/blob/main/jutsu/jutsu-runbooks/skills/incident-response/SKILL.md -a claude-code --skill runbooks-incident-response
Installation paths:
.claude/skills/runbooks-incident-response/

# Runbooks - Incident Response

Creating effective incident response procedures for handling production incidents and on-call scenarios.

## Incident Response Framework

### Incident Severity Levels

**SEV-1 (Critical)**

- Complete service outage
- Data loss or security breach
- Major customer impact (>50% of users)
- **Response Time:** Immediate
- **Escalation:** Page on-call + manager

**SEV-2 (High)**

- Partial service degradation
- Affecting significant users (10-50%)
- Performance issues (>50% slower)
- **Response Time:** Within 15 minutes
- **Escalation:** Page on-call

**SEV-3 (Medium)**

- Minor degradation
- Affecting few users (<10%)
- Non-critical features broken
- **Response Time:** Within 1 hour
- **Escalation:** On-call handles during business hours

**SEV-4 (Low)**

- Cosmetic issues
- Internal tools affected
- No customer impact
- **Response Time:** Next business day
- **Escalation:** Create ticket, no page

## Incident Response Template

````markdown
# Incident Response: [Alert/Issue Name]

**Severity:** SEV-1/SEV-2/SEV-3/SEV-4
**Response Time:** Immediate / 15 min / 1 hour / Next day
**Owner:** On-call Engineer

## Incident Detection

**This runbook is triggered by:**

- PagerDuty alert: `api_error_rate_high`
- Customer report in #support
- Monitoring dashboard showing anomaly

## Initial Response (First 5 Minutes)

### 1. Acknowledge & Assess

```bash
# Check current status
curl https://api.example.com/health
kubectl get pods -n production
```

**Determine severity:**

- All requests failing → SEV-1
- Partial failures → SEV-2
- Performance degraded → SEV-3

### 2. Notify Stakeholders

**SEV-1:**

- Create Slack incident channel: `/incident create SEV-1 API Outage`
- Page engineering manager
- Notify customer success team

**SEV-2:**

- Post in #incidents channel
- Tag on-call team

**SEV-3:**

- Post in #engineering channel
- No pages needed

### 3. Start Incident Timeline

Create incident doc (copy template):

```
Incident: API Outage
Started: 20
```
````
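The user-impact thresholds under Incident Severity Levels could be encoded so that alerting or triage tooling applies them consistently. The following is a minimal sketch, not part of the original skill: the `Severity` dataclass, the `classify` function, and its parameters are hypothetical names, and the handling of the exact boundaries (10% and 50% both map to SEV-2) is an assumption, since the source ranges do not say which side the endpoints fall on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Severity:
    level: str
    response_time: str
    escalation: str


# Response times and escalation paths copied from the severity table above.
SEVERITIES = {
    "SEV-1": Severity("SEV-1", "Immediate", "Page on-call + manager"),
    "SEV-2": Severity("SEV-2", "Within 15 minutes", "Page on-call"),
    "SEV-3": Severity("SEV-3", "Within 1 hour", "On-call handles during business hours"),
    "SEV-4": Severity("SEV-4", "Next business day", "Create ticket, no page"),
}


def classify(users_affected_pct: float, data_loss_or_breach: bool = False) -> Severity:
    """Map user impact to a severity level (hypothetical helper)."""
    if not 0 <= users_affected_pct <= 100:
        raise ValueError("users_affected_pct must be between 0 and 100")
    # Data loss or a security breach is SEV-1 regardless of user count.
    if data_loss_or_breach or users_affected_pct > 50:
        return SEVERITIES["SEV-1"]
    # Assumption: the 10-50% range is inclusive at both ends.
    if users_affected_pct >= 10:
        return SEVERITIES["SEV-2"]
    if users_affected_pct > 0:
        return SEVERITIES["SEV-3"]
    # No customer impact.
    return SEVERITIES["SEV-4"]
```

For example, `classify(30).response_time` returns `"Within 15 minutes"`. A real implementation would probably also weigh the other criteria in the table, such as latency degradation or which features are broken, which this sketch leaves out.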