Use when implementing circuit breakers, retries, bulkheads, or other resilience patterns. Covers failure handling strategies for distributed systems.
melodic-software/claude-code-plugins
systems-design
plugins/systems-design/skills/resilience-patterns/SKILL.md
January 21, 2026
# Resilience Patterns
Patterns for building systems that handle failures, degrade gracefully, and recover automatically.
## When to Use This Skill
- Implementing circuit breakers
- Designing retry strategies
- Isolating failures with bulkheads
- Building fault-tolerant systems
- Handling cascading failures
## Why Resilience Matters
```text
In distributed systems, failure is not exceptional—it's normal.
Networks fail. Services crash. Databases time out.
The question isn't IF but WHEN.
Resilience = The ability to handle failures gracefully
Goals:
- Prevent cascading failures
- Degrade gracefully
- Recover automatically
- Maintain availability
```
## Core Resilience Patterns
### 1. Retry Pattern
```text
What: Automatically retry failed operations
When: Transient failures (network blips, temporary unavailability)
Simple retry:
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Request │────►│ Failure │────►│ Retry │───► Success
└─────────┘ └─────────┘ └─────────┘
With backoff:
Request → Fail → Wait 100ms → Retry
          Fail → Wait 200ms → Retry
          Fail → Wait 400ms → Retry
          Fail → Give up
Backoff strategies:
- Fixed: Wait the same time before each retry
- Linear: 100ms, 200ms, 300ms...
- Exponential: 100ms, 200ms, 400ms, 800ms...
- Exponential + Jitter: Add randomness to prevent thundering herd
```
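The backoff strategies listed above can be sketched as a single delay function. This is a minimal illustration, not a reference implementation: the name `backoff_delay_ms`, its parameters, the 10-second cap, and the choice of "full jitter" (a uniform draw between 0 and the exponential delay) are all assumptions made for the example. The pattern itself only calls for adding randomness.

```python
import random


def backoff_delay_ms(attempt, strategy="exponential", base_ms=100,
                     cap_ms=10_000, rng=random):
    """Return the wait in milliseconds before retry number `attempt` (1-based).

    Hypothetical helper: names, defaults, and the cap are illustrative.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")

    if strategy == "fixed":
        delay = base_ms                       # 100, 100, 100, ...
    elif strategy == "linear":
        delay = base_ms * attempt             # 100, 200, 300, ...
    elif strategy == "exponential":
        delay = base_ms * 2 ** (attempt - 1)  # 100, 200, 400, 800, ...
    elif strategy == "exponential_jitter":
        # Assumed variant ("full jitter"): draw uniformly between 0 and the
        # capped exponential delay so clients do not retry in lockstep.
        ceiling = min(cap_ms, base_ms * 2 ** (attempt - 1))
        return rng.uniform(0, ceiling)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    # Cap the delay so late retries do not wait unboundedly long.
    return min(delay, cap_ms)
```

Passing `rng` as a parameter is a design choice for testability: a seeded `random.Random` instance can be supplied to make the jittered delays reproducible.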
#### Retry Best Practices
```text
Do:
- Add jitter to prevent thundering herd
- Set maximum retry count
- Use exponential backoff
- Only retry transient failures
- Log retries for visibility
Don't:
- Retry non-idempotent operations blindly
- Retry client errors (4xx responses)
- Retry indefinitely
- Use the same delay for every retry
```
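A rough sketch of how these practices might combine in one wrapper: a bounded retry count, exponential backoff with jitter, retries limited to transient failures, and a log line per retry. `TransientError` and `call_with_retry` are hypothetical names introduced for this example; a real system would map its own timeout and unavailability errors onto the retryable category.

```python
import logging
import random
import time

logger = logging.getLogger("retry")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, brief outages)."""


def call_with_retry(operation, max_retries=3, base_delay=0.1, sleep=time.sleep):
    """Call `operation`, retrying only TransientError, at most `max_retries` times.

    Any other exception (for example a client error) propagates immediately.
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except TransientError as exc:
            if attempt == max_retries:
                raise  # retry budget exhausted: give up and surface the error
            # Exponential backoff with jitter: uniform in [0, base * 2^attempt].
            delay = random.uniform(0, base_delay * 2 ** attempt)
            logger.warning("attempt %d failed (%s); retrying in %.3fs",
                           attempt + 1, exc, delay)
            sleep(delay)
```

The wrapper assumes `operation` is safe to repeat. For non-idempotent calls, the caller would need something extra, such as an idempotency key, before retrying is safe.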
### 2. Circuit Breaker Pattern
```text
What: Stop calling a failing service temporarily
When: Service is consistently failing
States:
┌──────────────────────────────────────────────────────────┐
│ │
│ ┌────────┐