Use when: "data pipeline", "ETL", "Spark", "dbt", "Airflow", "data warehouse", "analytics", "machine learning", "ML", "model", "PyTorch", "TensorFlow", "MLOps", "MLflow", "Kubeflow", "feature engineering", "A/B testing".
View on GitHub: boshu2/agentops
domain-kit
January 24, 2026
npx add-skill https://github.com/boshu2/agentops/blob/main/plugins/domain-kit/skills/data/SKILL.md -a claude-code --skill data
Installation paths:
.claude/skills/data/
# Data Skill
Data engineering, data science, ML engineering, and MLOps patterns.
## Quick Reference
| Area | Key Patterns | When to Use |
|------|--------------|-------------|
| **Data Engineering** | Pipelines, Spark, dbt, Airflow | Data infrastructure |
| **Data Science** | Analytics, ML, statistics | Analysis & modeling |
| **ML Engineering** | PyTorch, TensorFlow, serving | Production ML |
| **MLOps** | MLflow, Kubeflow, pipelines | ML lifecycle |
---
## Data Engineering
### Pipeline Architecture
```
[Sources] → [Ingestion] → [Transform] → [Storage] → [Serving]
    │            │             │            │           │
    │            │             │            │           └─ APIs, Dashboards
    │            │             │            └─ Data Warehouse
    │            │             └─ Spark, dbt
    │            └─ Kafka, Airbyte
    └─ Databases, APIs, Files
```
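The five stages above can be sketched as plain functions chained together. This is a minimal, in-memory illustration only: the sample CSV, the function names, and the dictionary standing in for a warehouse are all assumptions, not part of any real tool named in the diagram.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory "source"; a real pipeline would read from a database, API, or file.
RAW_CSV = """order_id,category,status,amount
1,books,active,12.50
2,books,cancelled,8.00
3,games,active,30.00
4,books,active,7.50
"""


def ingest(raw_text):
    """Ingestion: parse raw records from the source into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: keep active rows and total the amount per category."""
    totals = defaultdict(float)
    for row in rows:
        if row["status"] == "active":
            totals[row["category"]] += float(row["amount"])
    return dict(totals)


def store(totals, warehouse):
    """Storage: write the result into a stand-in warehouse (a plain dict here)."""
    warehouse["category_totals"] = totals
    return warehouse


def serve(warehouse, category):
    """Serving: answer a lookup the way an API or dashboard query might."""
    return warehouse["category_totals"].get(category, 0.0)


warehouse = store(transform(ingest(RAW_CSV)), {})
print(serve(warehouse, "books"))  # 20.0
```

Keeping each stage a separate function mirrors the diagram: any one stage can be swapped (for example, a different ingestion source) without touching the others.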
### ETL vs ELT
| Pattern | When to Use |
|---------|-------------|
| **ETL** | Transform before loading (legacy, complex transforms) |
| **ELT** | Load then transform (modern warehouses, dbt) |
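
The difference is mostly about where the transform runs. A small sketch, using the standard-library `sqlite3` module as a stand-in for a warehouse; the table names, view name, and sample rows are hypothetical and chosen only for illustration:

```python
import sqlite3

# Hypothetical raw records: (id, status, amount). Values are illustrative only.
raw_orders = [
    (1, "active", 10.0),
    (2, "cancelled", 5.0),
    (3, "active", 2.5),
]


def run_etl(rows):
    """ETL: filter in application code, then load only the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, amount REAL)")
    cleaned = [r for r in rows if r[1] != "cancelled"]  # transform before load
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
    return conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]


def run_elt(rows):
    """ELT: load everything raw, then transform with SQL inside the store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_orders (id INTEGER, status TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)  # load first
    conn.execute(
        "CREATE VIEW stg_orders AS "
        "SELECT * FROM raw_orders WHERE status != 'cancelled'"
    )
    return conn.execute("SELECT SUM(amount) FROM stg_orders").fetchone()[0]


etl_total = run_etl(raw_orders)
elt_total = run_elt(raw_orders)
print(etl_total, elt_total)  # 12.5 12.5
```

Both paths give the same total, but only the ELT path keeps the cancelled row in storage, so the filter can be changed later without re-ingesting.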
### Spark Patterns
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum as spark_sum

spark = SparkSession.builder.appName("pipeline").getOrCreate()

# Read the source Parquet dataset
df = spark.read.parquet("s3://bucket/data/")

# Transform: keep active rows, total amount per category, largest total first
result = (
    df.filter(col("status") == "active")
    .groupBy("category")
    .agg(spark_sum("amount").alias("total"))
    .orderBy(col("total").desc())
)

# Write, replacing any existing output at the target path
result.write.mode("overwrite").parquet("s3://bucket/output/")
```
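To sanity-check the aggregation logic without a cluster, the same filter, group, sum, and sort can be reproduced in plain Python on a few rows. This is only a sketch for working out expected values in a test. The sample rows and the `expected_totals` helper are hypothetical, and it does not exercise Spark itself.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for the Parquet input.
rows = [
    {"status": "active", "category": "a", "amount": 5},
    {"status": "inactive", "category": "a", "amount": 100},
    {"status": "active", "category": "b", "amount": 7},
    {"status": "active", "category": "a", "amount": 1},
]


def expected_totals(records):
    """Mirror the Spark job: filter active, sum amount per category, sort descending."""
    totals = defaultdict(int)
    for r in records:
        if r["status"] == "active":
            totals[r["category"]] += r["amount"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


print(expected_totals(rows))  # [('b', 7), ('a', 6)]
```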
### dbt Patterns
```sql
-- models/staging/stg_orders.sql
{{ config(materialized='view') }}

select
    id as order_id,
    customer_id,
    order_date,
    status,
    total_amount
from {{ source('raw', 'orders') }}
where status != 'cancelled'

-- models/marts/fct_daily_revenue.sql
{{ config(materialized='table') }}

select
    date_trunc('day', order_date) as date,
    count(*) as order_count,
    sum(total_amount) as r