Strategic guidance for designing modern data platforms, covering storage paradigms (data lake, warehouse, lakehouse), modeling approaches (dimensional, normalized, data vault, wide tables), data mesh principles, and medallion architecture patterns. Use when architecting data platforms, choosing between centralized vs decentralized patterns, selecting table formats (Iceberg, Delta Lake), or designing data governance frameworks.
View on GitHub: ancoleman/ai-design-components
backend-ai-skills
February 1, 2026
npx add-skill https://github.com/ancoleman/ai-design-components/blob/main/skills/architecting-data/SKILL.md -a claude-code --skill architecting-data

Installation paths:
.claude/skills/architecting-data/

# Data Architecture

## Purpose

Guide architects and platform engineers through strategic data architecture decisions for modern cloud-native data platforms.

## When to Use This Skill

Invoke this skill when:

- Designing a new data platform or modernizing legacy systems
- Choosing between data lake, data warehouse, or data lakehouse
- Deciding on data modeling approaches (dimensional, normalized, data vault, wide tables)
- Evaluating centralized vs data mesh architecture
- Selecting open table formats (Apache Iceberg, Delta Lake, Apache Hudi)
- Designing medallion architecture (bronze, silver, gold layers)
- Implementing data governance and cataloging

## Core Concepts

### 1. Storage Paradigms

Three primary patterns for analytical data storage:

**Data Lake:** Centralized repository for raw data at scale

- Schema-on-read, cost-optimized ($0.02-0.03/GB/month)
- Use when: Diverse data sources, exploratory analytics, ML/AI training data

**Data Warehouse:** Structured repository optimized for BI

- Schema-on-write, ACID transactions, fast queries
- Use when: Known BI requirements, strong governance needed

**Data Lakehouse:** Hybrid combining lake flexibility with warehouse reliability

- Open table formats (Iceberg, Delta Lake), ACID on object storage
- Use when: Mixed BI + ML workloads, cost optimization (60-80% cheaper than warehouse)

**Decision Framework:**

- BI/Reporting only + Known queries → Data Warehouse
- ML/AI primary + Raw data needed → Data Lake or Lakehouse
- Mixed BI + ML + Cost optimization → Data Lakehouse (recommended)
- Exploratory/Unknown use cases → Data Lake

For detailed comparison, see [references/storage-paradigms.md](references/storage-paradigms.md).

### 2. Data Modeling Approaches

Four primary modeling patterns:

**Dimensional (Kimball):** Star/snowflake schemas for BI

- Use when: Known query patterns, BI dashboards, trend analysis

**Normalized (3NF):** Eliminate redundancy for transactional systems

- Use when: OLTP systems, frequent up