data (verified)

Use when: "data pipeline", "ETL", "Spark", "dbt", "Airflow", "data warehouse", "analytics", "machine learning", "ML", "model", "PyTorch", "TensorFlow", "MLOps", "MLflow", "Kubeflow", "feature engineering", "A/B testing".

Marketplace: agentops-marketplace (boshu2/agentops)
Plugin: domain-kit (development)
Repository: boshu2/agentops (6 stars)
Path: plugins/domain-kit/skills/data/SKILL.md

Last verified: January 24, 2026

Install (add-skill CLI):

npx add-skill https://github.com/boshu2/agentops/blob/main/plugins/domain-kit/skills/data/SKILL.md -a claude-code --skill data

Installation path (Claude): .claude/skills/data/

Instructions

# Data Skill

Data engineering, data science, ML engineering, and MLOps patterns.

## Quick Reference

| Area | Key Patterns | When to Use |
|------|--------------|-------------|
| **Data Engineering** | Pipelines, Spark, dbt, Airflow | Data infrastructure |
| **Data Science** | Analytics, ML, statistics | Analysis & modeling |
| **ML Engineering** | PyTorch, TensorFlow, serving | Production ML |
| **MLOps** | MLflow, Kubeflow, pipelines | ML lifecycle |

---

## Data Engineering

### Pipeline Architecture

```
[Sources] → [Ingestion] → [Transform] → [Storage] → [Serving]
   │            │             │            │            │
   │            │             │            │            └─ APIs, Dashboards
   │            │             │            └─ Data Warehouse
   │            │             └─ Spark, dbt
   │            └─ Kafka, Airbyte
   └─ Databases, APIs, Files
```
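
As code, the same topology can be sketched as a minimal Airflow 2.x DAG; the DAG id, task names, and callables below are illustrative placeholders, not part of this skill:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest():
    """Pull from sources (Kafka, Airbyte, APIs) into raw storage."""


def transform():
    """Run Spark/dbt transforms over the ingested data."""


with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Dependencies mirror the diagram: ingestion feeds transformation
    ingest_task >> transform_task
```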

### ETL vs ELT

| Pattern | When to Use |
|---------|-------------|
| **ETL** | Transform before loading (legacy, complex transforms) |
| **ELT** | Load then transform (modern warehouses, dbt) |
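
As a toy ELT sketch (hypothetical file and table names, with SQLite standing in for a warehouse): raw rows are landed as-is, then transformed with SQL inside the database, which is the step dbt automates.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect("warehouse.db")  # stand-in for a real warehouse

# Extract + Load: land the raw data untouched
pd.read_csv("orders.csv").to_sql("raw_orders", con, if_exists="replace", index=False)

# Transform: reshape inside the warehouse, after loading
con.execute("drop table if exists orders_clean")
con.execute(
    """
    create table orders_clean as
    select id, customer_id, order_date, total_amount
    from raw_orders
    where status != 'cancelled'
    """
)
con.commit()
```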

### Spark Patterns

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, when, sum as spark_sum

spark = SparkSession.builder.appName("pipeline").getOrCreate()

# Read
df = spark.read.parquet("s3://bucket/data/")

# Transform
result = (df
    .filter(col("status") == "active")
    .groupBy("category")
    .agg(spark_sum("amount").alias("total"))
    .orderBy(col("total").desc())
)

# Write
result.write.mode("overwrite").parquet("s3://bucket/output/")
```
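
A common variant of the write step, sketched under the assumption that the partition column exists in the result (as `category` does here): partition the output so downstream readers can prune files.

```python
# Partitioned write: one subdirectory per category value under the output path
(result.write
    .mode("overwrite")
    .partitionBy("category")
    .parquet("s3://bucket/output/"))
```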

### dbt Patterns

```sql
-- models/staging/stg_orders.sql
{{ config(materialized='view') }}

select
    id as order_id,
    customer_id,
    order_date,
    status,
    total_amount
from {{ source('raw', 'orders') }}
where status != 'cancelled'

-- models/marts/fct_daily_revenue.sql
{{ config(materialized='table') }}

select
    date_trunc('day', order_date) as date,
    count(*) as order_count,
    sum(total_amount) as revenue
from {{ ref('stg_orders') }}
group by 1
```
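
These models are normally executed with the dbt CLI (`dbt run`); dbt-core 1.5+ also exposes a programmatic entry point, sketched below with an illustrative selector.

```python
from dbt.cli.main import dbtRunner

# Programmatic equivalent of `dbt run --select stg_orders+`
result = dbtRunner().invoke(["run", "--select", "stg_orders+"])
print(result.success)
```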
