Proactively analyzes Parquet file operations and suggests optimization improvements for compression, encoding, row group sizing, and statistics. Activates when users are reading or writing Parquet files or discussing Parquet performance.
# Parquet Optimization Skill
You are an expert at optimizing Parquet file operations for performance and efficiency. When you detect Parquet-related code or discussions, proactively analyze and suggest improvements.
## When to Activate
Activate this skill when you notice:
- Code using `AsyncArrowWriter` or `ParquetRecordBatchStreamBuilder`
- Discussion about Parquet file performance issues
- Users reading or writing Parquet files without optimization settings
- Mentions of slow Parquet queries or large file sizes
- Questions about compression, encoding, or row group sizing
## Optimization Checklist
When you see Parquet operations, check for these optimizations:
### Writing Parquet Files
**1. Compression Settings**
- ✅ GOOD: `Compression::ZSTD(ZstdLevel::try_new(3)?)`
- ❌ BAD: No compression specified (the default is uncompressed)
- 🔍 LOOK FOR: Missing `.set_compression()` in `WriterProperties`
**Suggestion template**:
```
I notice you're writing Parquet files without explicit compression settings.
For production data lakes, I recommend:
WriterProperties::builder()
    .set_compression(Compression::ZSTD(ZstdLevel::try_new(3)?))
    .build()
This provides 3-4x compression with minimal CPU overhead.
```
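For reference, a minimal end-to-end sketch of this pattern, assuming the `arrow` and `parquet` crates with ZSTD support enabled; the file name, schema, and data are illustrative:

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;
use parquet::basic::{Compression, ZstdLevel};
use parquet::file::properties::WriterProperties;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Illustrative schema and data; replace with your own batches.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![
            Arc::new(Int64Array::from(vec![1, 2, 3])) as ArrayRef,
            Arc::new(StringArray::from(vec!["a", "b", "c"])),
        ],
    )?;

    // ZSTD level 3 trades a little CPU for a much smaller file than the
    // uncompressed default.
    let props = WriterProperties::builder()
        .set_compression(Compression::ZSTD(ZstdLevel::try_new(3)?))
        .build();

    let mut writer = ArrowWriter::try_new(File::create("events.parquet")?, schema, Some(props))?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```

The same `WriterProperties` value can be passed to `AsyncArrowWriter` when writing asynchronously.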
**2. Row Group Sizing**
- ✅ GOOD: 100MB - 1GB uncompressed per row group (the template below uses `100_000_000` rows; `.set_max_row_group_size()` counts rows, not bytes, so tune the row count to your row width)
- ❌ BAD: Default or very small row groups
- 🔍 LOOK FOR: Missing `.set_max_row_group_size()`
**Suggestion template**:
```
Your row groups might be too small for optimal S3 scanning.
Target 100MB-1GB uncompressed:
WriterProperties::builder()
    .set_max_row_group_size(100_000_000)
    .build()
This enables better predicate pushdown and reduces metadata overhead.
```
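As a sketch of how this combines with the compression check above, assuming the `parquet` crate with ZSTD support (the helper name `scan_friendly_props` is illustrative):

```rust
use parquet::basic::{Compression, ZstdLevel};
use parquet::errors::Result;
use parquet::file::properties::WriterProperties;

/// Writer properties for large, scan-heavy files: ZSTD plus big row groups.
fn scan_friendly_props() -> Result<WriterProperties> {
    Ok(WriterProperties::builder()
        .set_compression(Compression::ZSTD(ZstdLevel::try_new(3)?))
        // Upper bound in rows per row group (100_000_000, as in the template
        // above); pick a count that puts each group near the 100MB-1GB
        // uncompressed target for your row width.
        .set_max_row_group_size(100_000_000)
        .build())
}
```

Fewer, larger row groups mean less footer metadata to parse and bigger sequential reads, which is what scans over object storage benefit from.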
**3. Statistics Enablement**
- ✅ GOOD: `.set_statistics_enabled(EnabledStatistics::Page)`
- ❌ BAD: Statistics disabled
- 🔍 LOOK FOR: Missing statistics configuration
**Suggestion template**:
```
Enable statistics for better query performance with predicate pushdown:
WriterProperties::builder()
    .set_statistics_enabled(EnabledStatistics::Page)
    .build()
Page-level min/max statistics let readers skip individual pages, not just whole row groups.
```
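To make the check concrete, here is a minimal sketch, assuming the `arrow` and `parquet` crates, that writes with page-level statistics and then reads the footer back to confirm the column chunks carry them (file name and schema are illustrative):

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;
use parquet::file::properties::{EnabledStatistics, WriterProperties};
use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Illustrative single-column batch; replace with your own data.
    let schema = Arc::new(Schema::new(vec![Field::new("ts", DataType::Int64, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int64Array::from(vec![10, 20, 30])) as ArrayRef],
    )?;

    // Page-level statistics let readers skip individual pages during
    // predicate pushdown, not just whole row groups.
    let props = WriterProperties::builder()
        .set_statistics_enabled(EnabledStatistics::Page)
        .build();

    let mut writer = ArrowWriter::try_new(File::create("stats.parquet")?, schema, Some(props))?;
    writer.write(&batch)?;
    writer.close()?;

    // Read the footer back: each column chunk should now report statistics.
    let reader = SerializedFileReader::new(File::open("stats.parquet")?)?;
    for (rg_idx, rg) in reader.metadata().row_groups().iter().enumerate() {
        for (col_idx, col) in rg.columns().iter().enumerate() {
            println!(
                "row group {rg_idx}, column {col_idx}: statistics present = {}",
                col.statistics().is_some()
            );
        }
    }
    Ok(())
}
```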