data (verified)

Use when: "data pipeline", "ETL", "Spark", "dbt", "Airflow", "data warehouse", "analytics", "machine learning", "ML", "model", "PyTorch", "TensorFlow", "MLOps", "MLflow", "Kubeflow", "feature engineering", "A/B testing".

Marketplace: agentops-marketplace (boshu2/agentops)
Plugin: domain-kit (development)
Repository: boshu2/agentops (6 stars)
Path: plugins/domain-kit/skills/data/SKILL.md

Last verified: January 24, 2026

Install (add-skill CLI):

npx add-skill https://github.com/boshu2/agentops/blob/main/plugins/domain-kit/skills/data/SKILL.md -a claude-code --skill data

Installation path (Claude): .claude/skills/data/

Instructions

# Data Skill

Data engineering, data science, ML engineering, and MLOps patterns.

## Quick Reference

| Area | Key Patterns | When to Use |
|------|--------------|-------------|
| **Data Engineering** | Pipelines, Spark, dbt, Airflow | Data infrastructure |
| **Data Science** | Analytics, ML, statistics | Analysis & modeling |
| **ML Engineering** | PyTorch, TensorFlow, serving | Production ML |
| **MLOps** | MLflow, Kubeflow, pipelines | ML lifecycle |

---

## Data Engineering

### Pipeline Architecture

```
[Sources] → [Ingestion] → [Transform] → [Storage] → [Serving]
   │            │             │            │            │
   │            │             │            │            └─ APIs, Dashboards
   │            │             │            └─ Data Warehouse
   │            │             └─ Spark, dbt
   │            └─ Kafka, Airbyte
   └─ Databases, APIs, Files
```
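
As code, the same topology can be sketched as a minimal Airflow 2.x DAG; the DAG id, task names, and callables below are illustrative placeholders, not part of this skill:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest():
    """Pull from sources (Kafka, Airbyte, APIs) into raw storage."""


def transform():
    """Run Spark/dbt transforms over the ingested data."""


with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Dependencies mirror the diagram: ingestion feeds transformation
    ingest_task >> transform_task
```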

### ETL vs ELT

| Pattern | When to Use |
|---------|-------------|
| **ETL** | Transform before loading (legacy, complex transforms) |
| **ELT** | Load then transform (modern warehouses, dbt) |
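
As a toy ELT sketch (hypothetical file and table names, with SQLite standing in for a warehouse): raw rows are landed as-is, then transformed with SQL inside the database, which is the step dbt automates.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect("warehouse.db")  # stand-in for a real warehouse

# Extract + Load: land the raw data untouched
pd.read_csv("orders.csv").to_sql("raw_orders", con, if_exists="replace", index=False)

# Transform: reshape inside the warehouse, after loading
con.execute("drop table if exists orders_clean")
con.execute(
    """
    create table orders_clean as
    select id, customer_id, order_date, total_amount
    from raw_orders
    where status != 'cancelled'
    """
)
con.commit()
```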

### Spark Patterns

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when, sum as spark_sum

spark = SparkSession.builder.appName("pipeline").getOrCreate()

# Read
df = spark.read.parquet("s3://bucket/data/")

# Transform
result = (df
    .filter(col("status") == "active")
    .groupBy("category")
    .agg(spark_sum("amount").alias("total"))
    .orderBy(col("total").desc())
)

# Write
result.write.mode("overwrite").parquet("s3://bucket/output/")
```
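
A common variant of the write step, sketched under the assumption that the partition column exists in the result (as `category` does here): partition the output so downstream readers can prune files.

```python
# Partitioned write: one subdirectory per category value under the output path
(result.write
    .mode("overwrite")
    .partitionBy("category")
    .parquet("s3://bucket/output/"))
```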

### dbt Patterns

```sql
-- models/staging/stg_orders.sql
{{ config(materialized='view') }}

select
    id as order_id,
    customer_id,
    order_date,
    status,
    total_amount
from {{ source('raw', 'orders') }}
where status != 'cancelled'

-- models/marts/fct_daily_revenue.sql
{{ config(materialized='table') }}

select
    date_trunc('day', order_date) as date,
    count(*) as order_count,
    sum(total_amount) as revenue
from {{ ref('stg_orders') }}
group by 1
```
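
These models are normally executed with the dbt CLI (`dbt run`); dbt-core 1.5+ also exposes a programmatic entry point, sketched below with an illustrative selector.

```python
from dbt.cli.main import dbtRunner

# Programmatic equivalent of `dbt run --select stg_orders+`
result = dbtRunner().invoke(["run", "--select", "stg_orders+"])
print(result.success)
```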
