Data engineering, machine learning, AI, and MLOps. From data pipelines to production ML systems and LLM applications.
View on GitHub: pluginagentmarketplace/custom-plugin-cloudflare
skills/data-engineering/SKILL.md
February 1, 2026
Install: `npx add-skill https://github.com/pluginagentmarketplace/custom-plugin-cloudflare/blob/main/skills/data-engineering/SKILL.md -a claude-code --skill data-engineering`
Installation path: `.claude/skills/data-engineering/`
# Data Engineering Skill
## Quick Reference
| Role | Focus | Timeline | Entry From |
|------|-------|----------|------------|
| **Data Engineer** | Pipelines, Infra | 12-24 mo | Backend Dev |
| **ML Engineer** | Models, Features | 12-24 mo | Data Scientist |
| **AI Engineer** | LLMs, Agents | 6-12 mo | Any Developer |
---
## Learning Paths
### Data Engineer
```
[1] SQL Mastery (4-6 wk)
│ └─ Window functions, CTEs, optimization
│
▼
[2] Python for Data (4-6 wk)
│ └─ Pandas, file formats, scripting
│
▼
[3] ETL/ELT Pipelines (6-8 wk)
│ └─ Extract, transform, load patterns
│
▼
[4] Big Data: Spark (8-12 wk)
│ └─ PySpark, DataFrames, partitioning
│
▼
[5] Data Warehouse (4-6 wk)
│ └─ Star schema, dbt, Snowflake/BQ
│
▼
[6] Orchestration (4-6 wk)
└─ Airflow/Prefect, scheduling, monitoring
```
**2025 Stack:** Python + Spark + Airflow + dbt + Snowflake/BigQuery
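
To make steps [1] and [3] concrete, here is a minimal sketch of an extract-transform-load flow that ends with a CTE and a window function. It uses only Python's standard library (`csv`, `sqlite3`) as a stand-in for the warehouse tools named above. The sample data, table name, and function names are hypothetical and chosen for illustration only. It also assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """order_id,customer,amount
1,alice,30.0
2,bob,12.5
3,alice,20.0
4,bob,
5,carol,45.0
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and cast column types."""
    return [
        (int(r["order_id"]), r["customer"], float(r["amount"]))
        for r in rows
        if r["amount"]
    ]


def load(conn, records):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def top_order_per_customer(conn):
    """Rank orders within each customer (window function inside a CTE)."""
    query = """
        WITH ranked AS (
            SELECT customer, order_id, amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY customer ORDER BY amount DESC
                   ) AS rn
            FROM orders
        )
        SELECT customer, order_id, amount
        FROM ranked
        WHERE rn = 1
        ORDER BY customer
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
result = top_order_per_customer(conn)
print(result)
```

The same three-stage shape carries over to larger stacks: the `transform` step is roughly what dbt or Spark would own, and an orchestrator such as Airflow would schedule the calls.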
---
### ML Engineer
```
[1] Python + NumPy (4-6 wk)
│
▼
[2] Math Foundations (6-8 wk)
│ └─ Linear algebra, calculus, statistics
│
▼
[3] Classical ML (8-12 wk)
│ └─ scikit-learn, XGBoost, evaluation
│
▼
[4] Deep Learning (8-12 wk)
│ └─ PyTorch, CNNs, Transformers
│
▼
[5] MLOps (6-8 wk)
└─ MLflow, model serving, monitoring
```
**2025 Stack:** Python + PyTorch + scikit-learn + MLflow + W&B
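
As a rough illustration of how steps [2] and [3] connect, the sketch below fits a one-variable linear model with hand-written gradient descent and evaluates it on held-out points. It deliberately uses plain Python instead of scikit-learn or PyTorch so that it runs without dependencies. The synthetic data (`y = 2x + 1`), the learning rate, and the function names are assumptions made for this example.

```python
# Synthetic data that follows y = 2x + 1 exactly (an assumption for illustration).
xs = [i / 10 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

# Hold out every fourth point so evaluation uses data the model never saw.
pairs = list(zip(xs, ys))
train = [p for i, p in enumerate(pairs) if i % 4 != 0]
test = [p for i, p in enumerate(pairs) if i % 4 == 0]


def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on the given data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fit(data, lr=0.1, epochs=2000):
    """Batch gradient descent on the MSE loss for a single-feature linear model."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = fit(train)
test_mse = mse(w, b, test)
print(f"w={w:.3f} b={b:.3f} test_mse={test_mse:.2e}")
```

In practice a library estimator would replace `fit`, but the split-train-evaluate pattern stays the same.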
---
### AI Engineer (2025 Hot Path)
```
[1] LLM Fundamentals (2-3 wk)
│ └─ Tokens, embeddings, context windows
│
▼
[2] Prompt Engineering (2-3 wk)
│ └─ Few-shot, CoT, structured output
│
▼
[3] RAG Systems (3-4 wk)
│ └─ Embeddings, vector DBs, retrieval
│
▼
[4] AI Agents (4-6 wk)
│ └─ Tool calling, agent loops, memory
│
▼
[5] Production Deploy (ongoing)
└─ Evaluation, guardrails, monitoring
```
**2025 Stack:** Python + LangChain/LlamaIndex + OpenAI/Anthropic + ChromaDB
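
The retrieval half of step [3] can be sketched without any external service. The example below uses a toy bag-of-words "embedding" and cosine similarity in place of a real embedding model and vector database such as ChromaDB. The documents, function names, and prompt wording are all hypothetical. It stops at building the prompt and makes no LLM call.

```python
import math
from collections import Counter

# Hypothetical knowledge base; a real system would chunk and index documents.
DOCS = [
    "Airflow schedules and monitors data pipelines",
    "Vector databases store embeddings for similarity search",
    "dbt transforms data inside the warehouse",
]


def embed(text):
    """Toy embedding: term counts. A stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


question = "where are embeddings stored for search"
top = retrieve(question, DOCS)
prompt = build_prompt(question, DOCS)
print(prompt)
```

Swapping `embed` for a model-backed embedding and `retrieve` for a vector database query keeps the overall flow unchanged: embed the query, fetch the nearest documents, then pass them to the model as context.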
---
## 2025 Tool Matrix
### Data Processing
| Tool | Scale | Use Case |
|------|-------|----------|
| **Pandas** | <10GB | Prototyping, small data |
| **Polars** | <100GB | Fast local processing |
|