ai-data-engineering (verified)

Data pipelines, feature stores, and embedding generation for AI/ML systems. Use when building RAG pipelines, ML feature serving, or data transformations. Covers feature stores (Feast, Tecton), embedding pipelines, chunking strategies, orchestration (Dagster, Prefect, Airflow), dbt transformations, data versioning (LakeFS), and experiment tracking (MLflow, W&B).

Plugin: backend-ai-skills
Repository: ancoleman/ai-design-components (153 stars)
Skill file: skills/ai-data-engineering/SKILL.md
Last verified: February 1, 2026

Install with the add-skill CLI:

npx add-skill https://github.com/ancoleman/ai-design-components/blob/main/skills/ai-data-engineering/SKILL.md -a claude-code --skill ai-data-engineering

Installation path (Claude): .claude/skills/ai-data-engineering/

Instructions

# AI Data Engineering

## Purpose

Build data infrastructure for AI/ML systems including RAG pipelines, feature stores, and embedding generation. Provides architecture patterns, orchestration workflows, and evaluation metrics for production AI applications.

## When to Use

**Use this skill when:**
- Building RAG (Retrieval-Augmented Generation) pipelines
- Implementing semantic search or vector databases
- Setting up ML feature stores for real-time serving
- Creating embedding generation pipelines
- Evaluating RAG quality with RAGAS metrics
- Orchestrating data workflows for AI systems
- Integrating with frontend skills (ai-chat, search-filter)

**Skip this skill if:**
- Building traditional CRUD applications (use databases-relational)
- Simple key-value storage (use databases-nosql)
- No AI/ML components in the application

## RAG Pipeline Architecture

RAG pipelines have 5 distinct stages. Understanding this architecture is critical for production implementations.

```
┌─────────────────────────────────────────────────────────────┐
│                    RAG Pipeline (5 Stages)                   │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  1. INGESTION → Load documents (PDF, DOCX, Markdown)        │
│  2. INDEXING → Chunk (512 tokens) + Embed + Store           │
│  3. RETRIEVAL → Query embedding + Vector search + Filters   │
│  4. GENERATION → Context injection + LLM streaming          │
│  5. EVALUATION → RAGAS metrics (faithfulness, relevancy)    │
│                                                              │
└─────────────────────────────────────────────────────────────┘
```

**For complete RAG architecture with implementation patterns, see:**
- `references/rag-architecture.md` - Detailed 5-stage breakdown
- `examples/langchain-rag/basic_rag.py` - Working implementation
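
The five stages above can be sketched end to end. The snippet below is a toy illustration, not a production pipeline: it substitutes a bag-of-words vector for a real embedding model, and all function and variable names are illustrative.

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().replace(".", "").split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. INGESTION: load raw documents (inline strings stand in for PDF/DOCX loaders)
docs = [
    "Feature stores serve precomputed ML features at low latency.",
    "Vector databases index embeddings for semantic search.",
]

# 2. INDEXING: chunk (trivially, whole documents here), embed, and store
index = [(doc, embed(doc)) for doc in docs]

# 3. RETRIEVAL: embed the query and rank stored chunks by similarity
query = "how does semantic search work"
q_vec = embed(query)
ranked = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)
context = ranked[0][0]

# 4. GENERATION: inject the retrieved context into the LLM prompt
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"

# 5. EVALUATION would score the generated answer (e.g. RAGAS faithfulness/relevancy)
```

In a real pipeline, `embed` is replaced by an embedding model, the list `index` by a vector database, and stage 5 by RAGAS scoring over generated answers.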

## Chunking Strategies

Chunking is the most critical decision for RAG quality. Poor chunking breaks retrieval.
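
As a concrete baseline, fixed-size chunking with overlap is the simplest strategy behind the 512-token figure in the pipeline diagram. The sketch below counts whitespace-separated words as a stand-in for tokens; the function name and parameters are illustrative, and production code would use the embedding model's own tokenizer.

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into fixed-size chunks of whitespace tokens with overlap.

    Overlapping chunks reduce the chance that a sentence is split across
    a chunk boundary and lost to retrieval.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Each chunk shares its first `overlap` words with the tail of the previous chunk, so context that straddles a boundary appears intact in at least one chunk.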
