Use when designing distributed systems for scalability, reliability, and consistency. Covers CAP/PACELC theorems, consistency models (strong, eventual, causal), replication patterns (leader-follower, multi-leader, leaderless), partitioning strategies (hash, range, geographic), transaction patterns (saga, event sourcing, CQRS), resilience patterns (circuit breaker, bulkhead), service discovery, and caching strategies for building fault-tolerant distributed architectures.
GitHub: ancoleman/ai-design-components
backend-ai-skills
February 1, 2026
Install: `npx add-skill https://github.com/ancoleman/ai-design-components/blob/main/skills/designing-distributed-systems/SKILL.md -a claude-code --skill designing-distributed-systems`

Installation paths:
.claude/skills/designing-distributed-systems/

# Designing Distributed Systems

Design scalable, reliable, and fault-tolerant distributed systems using proven patterns and consistency models.

## Purpose

Distributed systems are the foundation of modern cloud-native applications. Building systems that scale globally while staying correct and available requires understanding several things:

- the fundamental trade-offs (CAP theorem, PACELC)
- consistency models
- replication patterns
- resilience strategies

## When to Use This Skill

Apply when:

- Designing microservices architectures with multiple services
- Building systems that must scale across multiple datacenters or regions
- Choosing between consistency and availability during network partitions
- Selecting replication strategies (single-leader, multi-leader, leaderless)
- Implementing distributed transactions (saga pattern, event sourcing, CQRS)
- Designing partition-tolerant systems with proper consistency guarantees
- Building resilient services with circuit breakers, bulkheads, and retries
- Implementing service discovery and inter-service communication

## Core Concepts

### CAP Theorem Fundamentals

**CAP Theorem:** When a network partition occurs, a distributed system must choose between Consistency (C) and Availability (A). Partition tolerance (P) is mandatory, because partitions cannot be prevented.

```
Network partitions WILL occur → Always design for P

During partition:
├─ CP (Consistency + Partition Tolerance)
│  Use when: Financial transactions, inventory, seat booking
│  Trade-off: Some requests are refused during the partition
│  Examples: HBase, MongoDB (default), etcd
│
└─ AP (Availability + Partition Tolerance)
   Use when: Social media, caching, analytics, shopping carts
   Trade-off: Stale reads possible, conflicts need resolution
   Examples: Cassandra, DynamoDB, Riak
```

**PACELC:** Extends CAP to cover normal operation (no partition):
- **If Partition:** Choose Availability (A) or Consistency (C)
- **Else (normal operation):** Choose Latency (L) or Consistency (C)

### Consistency Models

Spect