Back to Skills

datafusion-query-advisor

verified

Reviews SQL queries and DataFrame operations for optimization opportunities including predicate pushdown, partition pruning, column projection, and join ordering. Activates when users write DataFusion queries or experience slow query performance.

View on GitHub

Marketplace

lf-marketplace

EmilLindfors/claude-marketplace

Plugin

rust-data-engineering

development

Repository

EmilLindfors/claude-marketplace
2stars

plugins/rust-data-engineering/skills/datafusion-query-advisor/SKILL.md

Last Verified

January 20, 2026

Install Skill

Select agents to install to:

Scope:
npx add-skill https://github.com/EmilLindfors/claude-marketplace/blob/main/plugins/rust-data-engineering/skills/datafusion-query-advisor/SKILL.md -a claude-code --skill datafusion-query-advisor

Installation paths:

Claude
.claude/skills/datafusion-query-advisor/
Powered by add-skill CLI

Instructions

# DataFusion Query Advisor Skill

You are an expert at optimizing DataFusion SQL queries and DataFrame operations. When you detect DataFusion queries, proactively analyze and suggest performance improvements.

## When to Activate

Activate this skill when you notice:
- SQL queries using `ctx.sql(...)` or DataFrame API
- Discussion about slow DataFusion query performance
- Code registering tables or data sources
- Questions about query optimization or EXPLAIN plans
- Mentions of partition pruning, predicate pushdown, or column projection

## Query Optimization Checklist

### 1. Predicate Pushdown

**What to Look For**:
- WHERE clauses that can be pushed to storage layer
- Filters applied after data is loaded

**Good Pattern**:
```sql
SELECT * FROM events
WHERE date = '2024-01-01' AND event_type = 'click'
```

**Bad Pattern**:
```rust
// Reading all data then filtering
let df = ctx.table("events").await?;
let batches = df.collect().await?;
let filtered = batches.filter(/* ... */);  // Too late!
```

**Suggestion**:
```
Your filter is being applied after reading all data. Move filters to SQL for predicate pushdown:

// Good: Filter pushed to Parquet reader
let df = ctx.sql("
    SELECT * FROM events
    WHERE date = '2024-01-01' AND event_type = 'click'
").await?;

This reads only matching row groups based on statistics.
```

### 2. Partition Pruning

**What to Look For**:
- Queries on partitioned tables without partition filters
- Filters on non-partition columns only

**Good Pattern**:
```sql
-- Filters on partition columns (year, month, day)
SELECT * FROM events
WHERE year = 2024 AND month = 1 AND day >= 15
```

**Bad Pattern**:
```sql
-- Scans all partitions
SELECT * FROM events
WHERE timestamp >= '2024-01-15'
```

**Suggestion**:
```
Your query scans all partitions. For Hive-style partitioned data, filter on partition columns:

SELECT * FROM events
WHERE year = 2024 AND month = 1 AND day >= 15
  AND timestamp >= '2024-01-15'

Include both partition column filters 

Validation Details

Front Matter
Required Fields
Valid Name Format
Valid Description
Has Sections
Allowed Tools
Instruction Length:
10496 chars