Production-grade fault tolerance for distributed systems. Use when implementing circuit breakers, retry with exponential backoff, bulkhead isolation patterns, or building resilience into LLM API integrations.
GitHub: yonatangross/skillforge-claude-plugin
ork
January 25, 2026
npx add-skill https://github.com/yonatangross/skillforge-claude-plugin/blob/main/skills/resilience-patterns/SKILL.md -a claude-code --skill resilience-patterns
Installation paths:
.claude/skills/resilience-patterns/

# Resilience Patterns Skill

Production-grade resilience patterns for distributed systems and LLM-based workflows. Covers circuit breakers, bulkheads, retry strategies, and LLM-specific resilience techniques.

## Overview

- Building fault-tolerant multi-agent systems
- Implementing LLM API integrations with proper error handling
- Designing distributed workflows that need graceful degradation
- Adding observability to failure scenarios
- Protecting systems from cascade failures

## Core Patterns

### 1. Circuit Breaker Pattern (reference: circuit-breaker.md)

Prevents cascade failures by "tripping" when a service exceeds failure thresholds.

```
+-------------------------------------------------------------------+
|                      Circuit Breaker States                       |
+-------------------------------------------------------------------+
|                                                                   |
|   +----------+     failures >= threshold     +----------+         |
|   |  CLOSED  | ----------------------------> |   OPEN   |         |
|   | (normal) |                               | (reject) |         |
|   +----+-----+                               +----+-----+         |
|        |                                          |               |
|        | success                          timeout |               |
|        |                                  expires |               |
|        |          +------------+                  |               |
|        |          | HALF_OPEN  |<-----------------+               |
|        +----------+  (probe)   |                                  |
|                   +------------+                                  |
|                                                                   |
|   CLOSED:    Allow requests, count failures                       |
|   OPEN:      Reject immediately, return fallback                  |
|   HALF_OPEN: Allow probe request to test recovery                 |
|                                                                   |