parquet-optimization

verified

Proactively analyzes Parquet file operations and suggests optimization improvements for compression, encoding, row group sizing, and statistics. Activates when users are reading or writing Parquet files or discussing Parquet performance.

Marketplace: lf-marketplace (EmilLindfors/claude-marketplace)
Plugin: rust-data-engineering (development)
Repository: EmilLindfors/claude-marketplace (2 stars)
Path: plugins/rust-data-engineering/skills/parquet-optimization/SKILL.md
Last Verified: January 20, 2026

Install Skill

npx add-skill https://github.com/EmilLindfors/claude-marketplace/blob/main/plugins/rust-data-engineering/skills/parquet-optimization/SKILL.md -a claude-code --skill parquet-optimization

Installation paths:

Claude: .claude/skills/parquet-optimization/

Instructions

# Parquet Optimization Skill

You are an expert at optimizing Parquet file operations for performance and efficiency. When you detect Parquet-related code or discussions, proactively analyze and suggest improvements.

## When to Activate

Activate this skill when you notice:
- Code using `AsyncArrowWriter` or `ParquetRecordBatchStreamBuilder`
- Discussion about Parquet file performance issues
- Users reading or writing Parquet files without optimization settings (see the sketch after this list)
- Mentions of slow Parquet queries or large file sizes
- Questions about compression, encoding, or row group sizing
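
For example, a write that sets no WriterProperties at all is the kind of pattern that should trigger the checklist below. A minimal sketch, assuming a recent `parquet` crate with the `async` feature plus `arrow` and `tokio`; the schema and file name are placeholders:

```rust
use std::sync::Arc;

use arrow::array::Int64Array;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::AsyncArrowWriter;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int64, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int64Array::from(vec![1, 2, 3]))],
    )?;

    let file = tokio::fs::File::create("events.parquet").await?;
    // Passing `None` for properties means default compression, default row
    // group sizing, and default statistics -- the pattern this skill flags.
    let mut writer = AsyncArrowWriter::try_new(file, schema, None)?;
    writer.write(&batch).await?;
    writer.close().await?;
    Ok(())
}
```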

## Optimization Checklist

When you see Parquet operations, check for these optimizations:

### Writing Parquet Files

**1. Compression Settings**
- ✅ GOOD: `Compression::ZSTD(ZstdLevel::try_new(3)?)`
- ❌ BAD: No compression specified (uses default)
- 🔍 LOOK FOR: Missing `.set_compression()` in WriterProperties

**Suggestion template**:
```
I notice you're writing Parquet files without explicit compression settings.
For production data lakes, I recommend:

WriterProperties::builder()
    .set_compression(Compression::ZSTD(ZstdLevel::try_new(3)?))
    .build()

This provides 3-4x compression with minimal CPU overhead.
```
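
As a concrete reference, here is what the full write path looks like with the recommended compression applied. A minimal sketch using the blocking `ArrowWriter`, assuming the `parquet` crate is built with its `zstd` feature; the schema and file name are placeholders:

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::Int64Array;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;
use parquet::basic::{Compression, ZstdLevel};
use parquet::file::properties::WriterProperties;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int64, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int64Array::from(vec![1, 2, 3]))],
    )?;

    // ZSTD level 3: good compression ratio with modest CPU cost.
    let props = WriterProperties::builder()
        .set_compression(Compression::ZSTD(ZstdLevel::try_new(3)?))
        .build();

    let mut writer = ArrowWriter::try_new(File::create("events.parquet")?, schema, Some(props))?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```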

**2. Row Group Sizing**
- ✅ GOOD: 100MB - 1GB uncompressed (100_000_000 rows)
- ❌ BAD: Default or very small row groups
- 🔍 LOOK FOR: Missing `.set_max_row_group_size()`

**Suggestion template**:
```
Your row groups might be too small for optimal S3 scanning.
Target 100MB-1GB uncompressed:

WriterProperties::builder()
    .set_max_row_group_size(100_000_000)
    .build()

Note that set_max_row_group_size caps the number of rows per row group rather
than bytes, so choose a value whose rows land in the 100MB-1GB range for your schema.
This enables better predicate pushdown and reduces metadata overhead.
```
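
To check what a file actually ended up with, read the row group sizes back from the footer metadata. A sketch, assuming the `parquet` crate; the file name is a placeholder:

```rust
use std::fs::File;

use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = SerializedFileReader::new(File::open("events.parquet")?)?;

    for (i, rg) in reader.metadata().row_groups().iter().enumerate() {
        // total_byte_size() is the uncompressed size: compare it against the
        // 100MB-1GB target; compressed_size() is what actually lands on disk/S3.
        println!(
            "row group {i}: {} rows, {} bytes uncompressed, {} bytes compressed",
            rg.num_rows(),
            rg.total_byte_size(),
            rg.compressed_size()
        );
    }
    Ok(())
}
```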

**3. Statistics Enablement**
- ✅ GOOD: `.set_statistics_enabled(EnabledStatistics::Page)`
- ❌ BAD: Statistics disabled
- 🔍 LOOK FOR: Missing statistics configuration

**Suggestion template**:
```
Enable statistics for better query performance with predicate pushdown:

WriterProperties::builder()
    .set_statistics_enabled(EnabledStatistics::Page)
    .build()

Page-level statistics allow finer-grained pruning than chunk-level statistics alone.
```
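
A quick way to confirm statistics made it into a file is to inspect each column chunk in the footer metadata. A sketch, assuming the `parquet` crate; the file name is a placeholder:

```rust
use std::fs::File;

use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let reader = SerializedFileReader::new(File::open("events.parquet")?)?;

    for (i, rg) in reader.metadata().row_groups().iter().enumerate() {
        for col in rg.columns() {
            // statistics() is None when statistics were disabled at write time.
            println!(
                "row group {i}, column {}: statistics written = {}",
                col.column_path(),
                col.statistics().is_some()
            );
        }
    }
    Ok(())
}
```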
