Production-grade fault tolerance for distributed systems. Use when implementing circuit breakers, retry with exponential backoff, bulkhead isolation patterns, or building resilience into LLM API integrations.
February 4, 2026
npx add-skill https://github.com/yonatangross/orchestkit/blob/main/plugins/ork/skills/resilience-patterns/SKILL.md -a claude-code --skill resilience-patterns

Installation paths:
.claude/skills/resilience-patterns/

# Resilience Patterns Skill

Production-grade resilience patterns for distributed systems and LLM-based workflows. Covers circuit breakers, bulkheads, retry strategies, and LLM-specific resilience techniques.

## Overview

- Building fault-tolerant multi-agent systems
- Implementing LLM API integrations with proper error handling
- Designing distributed workflows that need graceful degradation
- Adding observability to failure scenarios
- Protecting systems from cascade failures

## Core Patterns

### 1. Circuit Breaker Pattern (reference: circuit-breaker.md)

Prevents cascade failures by "tripping" when a service exceeds failure thresholds.

```
+-------------------------------------------------------------------+
|                    Circuit Breaker States                         |
+-------------------------------------------------------------------+
|                                                                   |
|    +----------+     failures >= threshold     +----------+        |
|    | CLOSED   | ----------------------------> |   OPEN   |        |
|    | (normal) |                               | (reject) |        |
|    +----+-----+                               +----+-----+        |
|         |                                          |              |
|         | success                          timeout |              |
|         |                                  expires |              |
|         |         +------------+                   |              |
|         |         | HALF_OPEN  |<------------------+              |
|         +---------+  (probe)   |                                  |
|                   +------------+                                  |
|                                                                   |
|   CLOSED:    Allow requests, count failures                       |
|   OPEN:      Reject immediately, return fallback                  |
|   HALF_OPEN: Allow probe request to test recovery                 |
|                                                                   |