Production ETL patterns orchestrator. Routes to core reliability patterns and incremental load strategies.
View on GitHub: majesticlabs-dev/majestic-marketplace
majestic-data
January 24, 2026
`npx add-skill https://github.com/majesticlabs-dev/majestic-marketplace/blob/main/plugins/majestic-data/skills/etl-patterns/SKILL.md -a claude-code --skill etl-patterns`

Installation paths: `.claude/skills/etl-patterns/`

# ETL Patterns
Orchestrator for production-grade Extract-Transform-Load patterns.
## Skill Routing
| Need | Skill | Content |
|------|-------|---------|
| Reliability patterns | `etl-core-patterns` | Idempotency, checkpointing, error handling, chunking, retry, logging |
| Load strategies | `etl-incremental-patterns` | Backfill, timestamp-based, CDC, pipeline orchestration |
## Pattern Selection Guide
### By Reliability Need
| Need | Pattern | Skill |
|------|---------|-------|
| Repeatable runs | Idempotency | `etl-core-patterns` |
| Resume after failure | Checkpointing | `etl-core-patterns` |
| Handle bad records | Error handling + DLQ | `etl-core-patterns` |
| Memory management | Chunked processing | `etl-core-patterns` |
| Network resilience | Retry with backoff | `etl-core-patterns` |
| Observability | Structured logging | `etl-core-patterns` |
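
To make the "Retry with backoff" row concrete, here is a minimal sketch of the idea. It is not taken from `etl-core-patterns`: the function name `retry_with_backoff`, its parameters, and the choice of exponential delays with full jitter are all assumptions made for illustration.

```python
import random
import time


def retry_with_backoff(func, max_attempts=4, base_delay=0.5, max_delay=8.0,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call func(); on a retryable error, wait and try again.

    Hypothetical helper for illustration, not part of any skill's API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads out retries from concurrent workers.
            sleep(random.uniform(0, delay))
```

Only exceptions listed in `retry_on` are retried; anything else (for example a `ValueError` from a malformed record) propagates immediately, which keeps bad-data failures out of the retry loop.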
### By Load Strategy
| Scenario | Pattern | Skill |
|----------|---------|-------|
| Small tables (<100K rows) | Full refresh | `etl-incremental-patterns` |
| Large tables | Timestamp incremental | `etl-incremental-patterns` |
| Real-time sync | CDC events | `etl-incremental-patterns` |
| Historical migration | Parallel backfill | `etl-incremental-patterns` |
| Zero-downtime refresh | Swap pattern | `etl-incremental-patterns` |
| Multi-step pipelines | Pipeline orchestration | `etl-incremental-patterns` |
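
As one illustration of the table above, the swap pattern can be sketched with the standard-library `sqlite3` module. The table names `customers`, `customers_staging`, and `customers_old`, the two-column schema, and the function name `swap_refresh` are hypothetical; a production database would likely use its own rename or partition-exchange mechanism.

```python
import sqlite3


def swap_refresh(conn, rows):
    """Rebuild `customers` in a staging table, then rename it into place.

    Hypothetical sketch: table names and schema are assumptions.
    """
    # Build the replacement off to the side; readers still see the old table.
    conn.execute("DROP TABLE IF EXISTS customers_staging")
    conn.execute(
        "CREATE TABLE customers_staging (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers_staging (id, name) VALUES (?, ?)", rows
    )
    conn.commit()

    # The swap itself is two quick renames followed by cleanup.
    conn.execute("DROP TABLE IF EXISTS customers_old")
    conn.execute("ALTER TABLE customers RENAME TO customers_old")
    conn.execute("ALTER TABLE customers_staging RENAME TO customers")
    conn.execute("DROP TABLE customers_old")
    conn.commit()
```

The slow part (loading the staging table) happens before the live table is touched, so the window in which `customers` is unavailable is limited to the renames.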
## Quick Reference
### Idempotency Options
```python
# Small datasets: Delete-then-insert
# Large datasets: UPSERT on conflict
# Change detection: Row hash comparison
```
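
Two of the options above, UPSERT on conflict and row hash comparison, can be combined in a short sketch. It assumes SQLite 3.24 or later (for `ON CONFLICT ... DO UPDATE`) and a hypothetical `customers` table with an `id` primary key and a `row_hash` column; `row_hash` and `upsert_customers` are illustrative names, not functions from the skill.

```python
import hashlib
import sqlite3


def row_hash(row):
    """Stable fingerprint of a row's values, used for change detection."""
    # Join with a separator unlikely to appear in data (ASCII unit separator).
    joined = "\x1f".join(str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def upsert_customers(conn, rows):
    """Insert new rows and update changed ones; re-running is harmless.

    Hypothetical sketch: expects rows shaped as (id, name, email).
    """
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO customers (id, name, email, row_hash) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET "
            "name = excluded.name, email = excluded.email, "
            "row_hash = excluded.row_hash "
            # Skip the write entirely when the stored hash already matches.
            "WHERE customers.row_hash != excluded.row_hash",
            [(*row, row_hash(row)) for row in rows],
        )
```

Loading the same batch twice leaves the table unchanged, which is the property that makes a failed run safe to repeat.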
### Load Strategy Decision
```
Is table < 100K rows?
→ Full refresh
Has reliable timestamp column?
→ Timestamp incremental
Source supports CDC?
→ CDC event processing
Need zero downtime?
→ Swap pattern (temp table → rename)
One-time historical load?
→ Parallel backfill with date ranges
```
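
The decision list above can be written as a small function. This sketch assumes the questions are checked top to bottom and the first "yes" wins, which the list suggests but does not state; the function name, parameter names, and returned labels are illustrative.

```python
def choose_load_strategy(row_count, has_reliable_timestamp=False,
                         supports_cdc=False, needs_zero_downtime=False,
                         one_time_backfill=False):
    """Return the first matching strategy, or None if nothing applies.

    Hypothetical helper; first-match ordering is an assumption.
    """
    if row_count < 100_000:
        return "full refresh"
    if has_reliable_timestamp:
        return "timestamp incremental"
    if supports_cdc:
        return "CDC event processing"
    if needs_zero_downtime:
        return "swap pattern"
    if one_time_backfill:
        return "parallel backfill"
    return None  # no rule matched: requirements need a closer look
```

For example, `choose_load_strategy(5_000_000, has_reliable_timestamp=True)` returns `"timestamp incremental"`. In practice the strategies are not mutually exclusive (a one-time backfill may precede ongoing incremental loads), so treat the result as a starting point.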
## Common Pipeline Structure
```python
# 1. Setup
checkpoint = Checkpoint('.etl_checkpoint.json')
processor = ETLProcessor()
# 2. Extract (wi