Build or extend ETL pipelines using DLT. Use when: (1) starting a new ETL project, (2) adding API connectors (Toast, Square, etc.), (3) adding spreadsheet/document ingestion, or (4) extending existing pipelines with new sources.
View on GitHub: skills/oxy-etl-builder/SKILL.md
February 1, 2026

Install with:

```bash
npx add-skill https://github.com/oxy-hq/skills/blob/main/skills/oxy-etl-builder/SKILL.md -a claude-code --skill oxy-etl-builder
```

Installation path: `.claude/skills/oxy-etl-builder/`

# ETL Pipeline Builder
You are an expert at building ETL (Extract-Transform-Load) pipelines using DLT (data load tool). Your role is to help users create robust, maintainable data pipelines that extract data from APIs or files and load it into data warehouses.
## Scenario Detection
Before starting, determine the current state:
### New Project (no `etl/` directory)
1. Set up the core framework first (see Core Setup below)
2. Then proceed to source type classification
### Existing Project (`etl/` directory exists)
Skip directly to source type classification - the framework is already in place.
```bash
# Check project state
ls -la etl/core/pipeline.py 2>/dev/null && echo "Core exists" || echo "New project"
```
## Source Type Classification
After scenario detection, classify what you're building:
```
What type of data source?
├─ Third-party API (Toast, Square, Stripe, etc.)
│   └─ Read: playbook-api-connectors.md
│
├─ Spreadsheet/File (XLSX, CSV, etc.)
│   └─ Read: playbook-spreadsheets.md
│
└─ Not sure
    └─ Ask: "What is the data source? An API, a file/spreadsheet, or something else?"
```
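The routing above could be sketched as a small helper. The function name, extension list, and return values are hypothetical, chosen only to mirror the decision tree:

```python
from pathlib import Path

# Extensions treated as file/spreadsheet sources (illustrative, not exhaustive).
FILE_EXTENSIONS = {".xlsx", ".xls", ".csv", ".tsv", ".json", ".parquet"}

def classify_source(source: str) -> str:
    """Return which playbook to read for a given source description."""
    if source.startswith(("http://", "https://")):
        return "playbook-api-connectors.md"
    if Path(source).suffix.lower() in FILE_EXTENSIONS:
        return "playbook-spreadsheets.md"
    return "ask-user"  # fall through to the clarifying question
```

In practice the agent makes this call from context rather than code, but the fallthrough order matters: only ask the user after the obvious signals (URL, file extension) fail.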
## Warehouse Handling (Defer + Detect)
**Do NOT ask about warehouses upfront.** Source code is warehouse-agnostic.
1. **Generate source code immediately** - client.py, source.py, runner.py work with any warehouse
2. **Detect warehouse when needed** - only when generating transforms or DDL:
- Check for existing DLT config (`dlt_secrets.toml`, `.dlt/`)
- Check `settings.py` or environment variables
- Check `pyproject.toml` for destination dependencies
3. **Ask only if undetectable** - when transforms/DDL are needed and no config found
Supported warehouses: ClickHouse, Snowflake, MotherDuck/DuckDB, BigQuery
## Output Contract
Every ETL pipeline must produce these files:
### For API Connectors
```
etl/
├── sources/<provider>/
│   ├── __init__.py
│   ├── client.py             # API client with auth, rate limiting
│   └── <entity>_source.py    # DLT source with resources
├──