architecting-data (verified)

Strategic guidance for designing modern data platforms, covering storage paradigms (data lake, warehouse, lakehouse), modeling approaches (dimensional, normalized, data vault, wide tables), data mesh principles, and medallion architecture patterns. Use when architecting data platforms, choosing between centralized vs decentralized patterns, selecting table formats (Iceberg, Delta Lake), or designing data governance frameworks.

Marketplace: ai-design-components (ancoleman/ai-design-components)
Plugin: backend-ai-skills
Repository: ancoleman/ai-design-components (153 stars)
Source: skills/architecting-data/SKILL.md
Last Verified: February 1, 2026

Install Skill

npx add-skill https://github.com/ancoleman/ai-design-components/blob/main/skills/architecting-data/SKILL.md -a claude-code --skill architecting-data

Installation paths:

Claude: .claude/skills/architecting-data/

Instructions

# Data Architecture

## Purpose

Guide architects and platform engineers through strategic data architecture decisions for modern cloud-native data platforms.

## When to Use This Skill

Invoke this skill when:
- Designing a new data platform or modernizing legacy systems
- Choosing between data lake, data warehouse, or data lakehouse
- Deciding on data modeling approaches (dimensional, normalized, data vault, wide tables)
- Evaluating centralized vs data mesh architecture
- Selecting open table formats (Apache Iceberg, Delta Lake, Apache Hudi)
- Designing medallion architecture (bronze, silver, gold layers)
- Implementing data governance and cataloging

## Core Concepts

### 1. Storage Paradigms

Three primary patterns for analytical data storage:

**Data Lake:** Centralized repository for raw data at scale
- Schema-on-read, cost-optimized ($0.02-0.03/GB/month)
- Use when: Diverse data sources, exploratory analytics, ML/AI training data

**Data Warehouse:** Structured repository optimized for BI
- Schema-on-write, ACID transactions, fast queries
- Use when: Known BI requirements, strong governance needed

**Data Lakehouse:** Hybrid combining lake flexibility with warehouse reliability
- Open table formats (Iceberg, Delta Lake), ACID on object storage
- Use when: Mixed BI + ML workloads, cost optimization (60-80% cheaper than warehouse)
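
Open table formats are what give a lakehouse its warehouse-like reliability. The sketch below is a minimal illustration (not part of the skill) that writes and reads a Delta table with the open-source `deltalake` Python package; the local `./lakehouse/events` path and columns are made up, and Iceberg or Hudi would serve the same role.

```python
import pandas as pd
from deltalake import DeltaTable, write_deltalake

# Append a batch to a Delta table on local disk; the same call works
# against object-storage URIs (e.g. s3://...), which is the lakehouse case.
events = pd.DataFrame({"user_id": [1, 2], "amount": [9.99, 24.50]})
write_deltalake("./lakehouse/events", events, mode="append")

# Readers always see a consistent snapshot; each commit bumps the version.
table = DeltaTable("./lakehouse/events")
print(table.version())
print(table.to_pandas())
```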

**Decision Framework:**
- BI/Reporting only + Known queries → Data Warehouse
- ML/AI primary + Raw data needed → Data Lake or Lakehouse
- Mixed BI + ML + Cost optimization → Data Lakehouse (recommended)
- Exploratory/Unknown use cases → Data Lake

For detailed comparison, see [references/storage-paradigms.md](references/storage-paradigms.md).
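
The decision framework above can also be captured as a small rule function. A minimal sketch in Python; the function name, parameters, and returned labels are illustrative assumptions, not part of the skill:

```python
def recommend_storage(bi_reporting: bool, ml_training: bool) -> str:
    """Walk the storage-paradigm decision framework, top to bottom."""
    if bi_reporting and ml_training:
        return "Data Lakehouse"            # mixed BI + ML, cost optimization
    if bi_reporting:
        return "Data Warehouse"            # BI/reporting only, known queries
    if ml_training:
        return "Data Lake or Lakehouse"    # ML/AI primary, raw data needed
    return "Data Lake"                     # exploratory / unknown use cases


# Example: a platform serving both dashboards and model training
print(recommend_storage(bi_reporting=True, ml_training=True))  # Data Lakehouse
```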

### 2. Data Modeling Approaches

Four primary modeling patterns:

**Dimensional (Kimball):** Star/snowflake schemas for BI
- Use when: Known query patterns, BI dashboards, trend analysis

**Normalized (3NF):** Eliminate redundancy for transactional systems
- Use when: OLTP systems, frequent updates

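As a concrete illustration of the dimensional (Kimball) pattern, the sketch below assembles a tiny star schema in pandas and runs a typical BI-style aggregation; all table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical star schema: one fact table plus two dimension tables.
dim_date = pd.DataFrame({
    "date_key": [20240101, 20240102],
    "month": ["2024-01", "2024-01"],
})
dim_product = pd.DataFrame({
    "product_key": [1, 2],
    "category": ["books", "games"],
})
fact_sales = pd.DataFrame({
    "date_key": [20240101, 20240102, 20240102],
    "product_key": [1, 2, 2],
    "revenue": [120.0, 80.0, 45.0],
})

# Typical BI query shape: join facts to dimensions, then aggregate.
report = (
    fact_sales
    .merge(dim_date, on="date_key")
    .merge(dim_product, on="product_key")
    .groupby(["month", "category"], as_index=False)["revenue"]
    .sum()
)
print(report)
```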