designing-distributed-systems

verified

When designing distributed systems for scalability, reliability, and consistency. Covers CAP/PACELC theorems, consistency models (strong, eventual, causal), replication patterns (leader-follower, multi-leader, leaderless), partitioning strategies (hash, range, geographic), transaction patterns (saga, event sourcing, CQRS), resilience patterns (circuit breaker, bulkhead), service discovery, and caching strategies for building fault-tolerant distributed architectures.

Marketplace: ai-design-components (ancoleman/ai-design-components)

Plugin: backend-ai-skills

Repository: ancoleman/ai-design-components (153 stars), skills/designing-distributed-systems/SKILL.md

Last Verified: February 1, 2026

Install Skill

npx add-skill https://github.com/ancoleman/ai-design-components/blob/main/skills/designing-distributed-systems/SKILL.md -a claude-code --skill designing-distributed-systems

Installation paths:

Claude
.claude/skills/designing-distributed-systems/

Instructions

# Designing Distributed Systems

Design scalable, reliable, and fault-tolerant distributed systems using proven patterns and consistency models.

## Purpose

Distributed systems are the foundation of modern cloud-native applications. Understanding fundamental trade-offs (CAP theorem, PACELC), consistency models, replication patterns, and resilience strategies is essential for building systems that scale globally while maintaining correctness and availability.

## When to Use This Skill

Apply when:
- Designing microservices architectures with multiple services
- Building systems that must scale across multiple datacenters or regions
- Choosing between consistency and availability during network partitions
- Selecting replication strategies (single-leader, multi-leader, leaderless)
- Implementing distributed transactions (saga pattern, event sourcing, CQRS)
- Designing partition-tolerant systems with proper consistency guarantees
- Building resilient services with circuit breakers, bulkheads, retries
- Implementing service discovery and inter-service communication

## Core Concepts

### CAP Theorem Fundamentals

**CAP Theorem:** When a network partition occurs, a distributed system must choose between Consistency (C) and Availability (A). Partition tolerance (P) is not optional, because partitions cannot be ruled out.

```
Network partitions WILL occur → Always design for P

During partition:
├─ CP (Consistency + Partition Tolerance)
│  Use when: Financial transactions, inventory, seat booking
│  Trade-off: Requests that cannot be served consistently fail or block during partition
│  Examples: HBase, MongoDB (default), etcd
│
└─ AP (Availability + Partition Tolerance)
   Use when: Social media, caching, analytics, shopping carts
   Trade-off: Stale reads possible, conflicts need resolution
   Examples: Cassandra, DynamoDB (eventually consistent reads by default), Riak
```
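
The CP/AP split above can be made concrete with a toy model. The following is a minimal sketch, not how any of the listed databases is implemented: `Node`, `Mode`, `Unavailable`, and the `reachable_peers` counter are hypothetical names introduced for illustration, and the majority check stands in for a real consensus or quorum protocol.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    CP = "cp"  # prefer consistency during a partition
    AP = "ap"  # prefer availability during a partition


class Unavailable(Exception):
    """Raised when a CP node refuses a write it cannot safely commit."""


@dataclass
class Node:
    # All fields are illustrative assumptions, not a real database API.
    mode: Mode
    cluster_size: int
    reachable_peers: int  # peers this node can currently contact
    data: dict = field(default_factory=dict)
    pending_sync: list = field(default_factory=list)

    def has_majority(self) -> bool:
        # Count this node plus the peers it can still reach.
        return (self.reachable_peers + 1) > self.cluster_size // 2

    def write(self, key, value) -> str:
        if self.has_majority():
            # No partition, or we are on the majority side: commit normally.
            self.data[key] = value
            return "committed"
        if self.mode is Mode.CP:
            # CP: give up availability rather than risk divergent state.
            raise Unavailable("cannot reach a majority of the cluster")
        # AP: accept the write locally and reconcile after the partition heals.
        self.data[key] = value
        self.pending_sync.append((key, value))
        return "accepted-locally"


if __name__ == "__main__":
    # A 5-node cluster where this node can reach only 1 peer (minority side).
    ap_node = Node(Mode.AP, cluster_size=5, reachable_peers=1)
    print(ap_node.write("cart:42", ["book"]))

    cp_node = Node(Mode.CP, cluster_size=5, reachable_peers=1)
    try:
        cp_node.write("seat:12A", "alice")
    except Unavailable as exc:
        print(f"write rejected: {exc}")
```

On the minority side of a partition, the AP node keeps accepting writes and queues them for later conflict resolution, while the CP node rejects them. That is the trade-off each branch of the tree describes.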

**PACELC:** Extends CAP to consider normal operations (no partition).
- **If Partition:** Choose Availability (A) or Consistency (C)
- **Else (normal):** Choose Latency (L) or Consistency (C)
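
The "Else" branch of PACELC can be illustrated with a small read-path sketch. This is a simplified model under stated assumptions: `Replica`, `read_low_latency`, and `read_consistent` are hypothetical names, latencies and versions are hard-coded rather than measured, and the quorum read only returns the latest value if writes were also acknowledged by a majority of replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    # Illustrative fields; a real system would discover these at runtime.
    name: str
    latency_ms: float  # round-trip time from the client to this replica
    version: int       # monotonically increasing write version
    value: str


def read_low_latency(replicas):
    """EL choice: ask only the nearest replica; the answer may be stale."""
    nearest = min(replicas, key=lambda r: r.latency_ms)
    return nearest.value, nearest.latency_ms


def read_consistent(replicas):
    """EC choice: wait for a majority and return the newest version seen."""
    quorum = len(replicas) // 2 + 1
    # The read completes once the `quorum` fastest replicas have answered.
    responders = sorted(replicas, key=lambda r: r.latency_ms)[:quorum]
    newest = max(responders, key=lambda r: r.version)
    latency = max(r.latency_ms for r in responders)
    return newest.value, latency


if __name__ == "__main__":
    # Version 2 was written to a majority (eu-west, ap-south); the local
    # replica has not caught up yet.
    replicas = [
        Replica("local", latency_ms=2.0, version=1, value="old"),
        Replica("eu-west", latency_ms=40.0, version=2, value="new"),
        Replica("ap-south", latency_ms=90.0, version=2, value="new"),
    ]
    print(read_low_latency(replicas))
    print(read_consistent(replicas))
```

Even with no partition, the consistent read pays the latency of the slowest replica in the quorum, while the low-latency read answers from the nearest replica and may return a stale value.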

### Consistency Models Spect

Instruction Length: 16031 chars