resilience-patterns

verified

Use when implementing circuit breakers, retries, bulkheads, or other resilience patterns. Covers failure handling strategies for distributed systems.

Marketplace: melodic-software (melodic-software/claude-code-plugins)

Plugin: systems-design

Repository: melodic-software/claude-code-plugins (Verified Org, 13 stars)

plugins/systems-design/skills/resilience-patterns/SKILL.md

Last Verified: January 21, 2026

Install Skill

npx add-skill https://github.com/melodic-software/claude-code-plugins/blob/main/plugins/systems-design/skills/resilience-patterns/SKILL.md -a claude-code --skill resilience-patterns

Installation paths:

Claude: .claude/skills/resilience-patterns/
Instructions

# Resilience Patterns

Patterns for building systems that tolerate failures, degrade gracefully, and recover automatically.

## When to Use This Skill

- Implementing circuit breakers
- Designing retry strategies
- Isolating failures with bulkheads
- Building fault-tolerant systems
- Handling cascading failures

## Why Resilience Matters

```text
In distributed systems, failure is not exceptional—it's normal.

Networks fail. Services crash. Databases time out.
The question isn't IF but WHEN.

Resilience = The ability to handle failures gracefully

Goals:
- Prevent cascading failures
- Degrade gracefully
- Recover automatically
- Maintain availability
```

## Core Resilience Patterns

### 1. Retry Pattern

```text
What: Automatically retry failed operations
When: Transient failures (network blips, temporary unavailability)

Simple retry:
┌─────────┐     ┌─────────┐     ┌─────────┐
│ Request │────►│ Failure │────►│  Retry  │───► Success
└─────────┘     └─────────┘     └─────────┘

With backoff:
Request → Fail → Wait 100ms → Retry
                 Fail → Wait 200ms → Retry
                        Fail → Wait 400ms → Retry
                               Fail → Give up

Backoff strategies:
- Fixed: Wait the same time before each retry
- Linear: 100ms, 200ms, 300ms...
- Exponential: 100ms, 200ms, 400ms, 800ms...
- Exponential + Jitter: Add randomness to prevent thundering herd
```
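
The backoff strategies above can be sketched as a single delay function. This is a minimal illustration, not a reference implementation: the function name `backoff_delay`, the `cap_ms` ceiling, and the choice of "full jitter" (a uniform draw between zero and the exponential delay) are assumptions made for the example. Other jitter variants exist.

```python
import random


def backoff_delay(attempt, strategy="exponential", base_ms=100, cap_ms=10_000):
    """Return the wait in milliseconds before retry number `attempt` (1-based).

    `cap_ms` is an assumed upper bound so delays cannot grow without limit.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")

    exponential = base_ms * 2 ** (attempt - 1)  # 100, 200, 400, 800, ...

    if strategy == "fixed":
        delay = base_ms                  # 100, 100, 100, ...
    elif strategy == "linear":
        delay = base_ms * attempt        # 100, 200, 300, ...
    elif strategy == "exponential":
        delay = exponential
    elif strategy == "jitter":
        # Full jitter (one common variant): pick a random point below the
        # capped exponential delay so clients do not retry in lockstep.
        return random.uniform(0, min(cap_ms, exponential))
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    return min(cap_ms, delay)
```

With the default 100 ms base, the linear and exponential branches reproduce the sequences listed above; the jitter branch returns a different value on each call, which is the point.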

#### Retry Best Practices

```text
Do:
- Add jitter to prevent thundering herd
- Set maximum retry count
- Use exponential backoff
- Only retry transient failures
- Log retries for visibility

Don't:
- Retry non-idempotent operations blindly
- Retry client errors (4xx); 408 and 429 are the usual exceptions
- Retry indefinitely
- Use the same delay for all retries
```
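
One way these practices might fit together is a small retry helper. The sketch below is illustrative only: `TransientError`, `retry`, and the parameter defaults are hypothetical names and values chosen for the example, and the caller is assumed to decide which failures count as transient by raising `TransientError`.

```python
import logging
import random
import time

logger = logging.getLogger("retry")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 503s, ...)."""


def retry(operation, max_attempts=4, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call `operation()` until it succeeds or `max_attempts` calls have failed.

    Only TransientError is retried; any other exception propagates at once.
    Delays (in seconds) use exponential backoff with full jitter.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as exc:
            if attempt == max_attempts:
                # Bounded retries: surface the last error instead of looping forever.
                logger.error("giving up after %d attempts: %s", attempt, exc)
                raise
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(0, ceiling)  # jitter spreads out concurrent retries
            logger.warning(
                "attempt %d failed (%s); retrying in %.3fs", attempt, exc, delay
            )
            sleep(delay)
```

A caller would wrap only idempotent operations, for example `retry(lambda: fetch_profile(user_id))` where `fetch_profile` is a hypothetical function that raises `TransientError` on timeouts. The `sleep` parameter exists so tests can substitute a fake and avoid real waiting.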

### 2. Circuit Breaker Pattern

```text
What: Stop calling a failing service temporarily
When: Service is consistently failing

States:
┌──────────────────────────────────────────────────────────┐
│                                                          │
│   ┌────────┐  
