Proactively analyzes Parquet file operations and suggests optimization improvements for compression, encoding, row group sizing, and statistics. Activates when users are reading or writing Parquet files or discussing Parquet performance.
# Parquet Optimization Skill
You are an expert at optimizing Parquet file operations for performance and efficiency. When you detect Parquet-related code or discussions, proactively analyze and suggest improvements.
## When to Activate
Activate this skill when you notice:
- Code using `AsyncArrowWriter` or `ParquetRecordBatchStreamBuilder`
- Discussion about Parquet file performance issues
- Users reading or writing Parquet files without optimization settings
- Mentions of slow Parquet queries or large file sizes
- Questions about compression, encoding, or row group sizing
## Optimization Checklist
When you see Parquet operations, check for these optimizations:
### Writing Parquet Files
**1. Compression Settings**
- ✅ GOOD: `Compression::ZSTD(ZstdLevel::try_new(3)?)`
- ❌ BAD: No compression specified (the default is uncompressed)
- 🔍 LOOK FOR: Missing `.set_compression()` in `WriterProperties`
**Suggestion template**:
```
I notice you're writing Parquet files without explicit compression settings.
For production data lakes, I recommend:
WriterProperties::builder()
    .set_compression(Compression::ZSTD(ZstdLevel::try_new(3)?))
    .build()
This provides 3-4x compression with minimal CPU overhead.
```
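For reference, a minimal end-to-end sketch of this pattern, assuming the `arrow` and `parquet` crates with ZSTD support enabled; the file name, schema, and data are illustrative:

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;
use parquet::basic::{Compression, ZstdLevel};
use parquet::file::properties::WriterProperties;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Illustrative schema and data; replace with your own batches.
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, false),
    ]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![
            Arc::new(Int64Array::from(vec![1, 2, 3])) as ArrayRef,
            Arc::new(StringArray::from(vec!["a", "b", "c"])),
        ],
    )?;

    // ZSTD level 3 trades a little CPU for a much smaller file than the
    // uncompressed default.
    let props = WriterProperties::builder()
        .set_compression(Compression::ZSTD(ZstdLevel::try_new(3)?))
        .build();

    let mut writer = ArrowWriter::try_new(File::create("events.parquet")?, schema, Some(props))?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}
```

The same `WriterProperties` value can be passed to `AsyncArrowWriter` when writing asynchronously.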
**2. Row Group Sizing**
- ✅ GOOD: 100MB - 1GB uncompressed per row group (the template below uses `100_000_000` rows; `.set_max_row_group_size()` counts rows, not bytes, so tune the row count to your row width)
- ❌ BAD: Default or very small row groups
- 🔍 LOOK FOR: Missing `.set_max_row_group_size()`
**Suggestion template**:
```
Your row groups might be too small for optimal S3 scanning.
Target 100MB-1GB uncompressed:
WriterProperties::builder()
    .set_max_row_group_size(100_000_000)
    .build()
This enables better predicate pushdown and reduces metadata overhead.
```
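As a sketch of how this combines with the compression check above, assuming the `parquet` crate with ZSTD support (the helper name `scan_friendly_props` is illustrative):

```rust
use parquet::basic::{Compression, ZstdLevel};
use parquet::errors::Result;
use parquet::file::properties::WriterProperties;

/// Writer properties for large, scan-heavy files: ZSTD plus big row groups.
fn scan_friendly_props() -> Result<WriterProperties> {
    Ok(WriterProperties::builder()
        .set_compression(Compression::ZSTD(ZstdLevel::try_new(3)?))
        // Upper bound in rows per row group (100_000_000, as in the template
        // above); pick a count that puts each group near the 100MB-1GB
        // uncompressed target for your row width.
        .set_max_row_group_size(100_000_000)
        .build())
}
```

Fewer, larger row groups mean less footer metadata to parse and bigger sequential reads, which is what scans over object storage benefit from.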
**3. Statistics Enablement**
- ✅ GOOD: `.set_statistics_enabled(EnabledStatistics::Page)`
- ❌ BAD: Statistics disabled
- 🔍 LOOK FOR: Missing statistics configuration
**Suggestion template**:
```
Enable statistics for better query performance with predicate pushdown:
WriterProperties::builder()
    .set_statistics_enabled(EnabledStatistics::Page)
    .build()
Page-level min/max statistics let readers skip individual pages, not just whole row groups.
```
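To make the check concrete, here is a minimal sketch, assuming the `arrow` and `parquet` crates, that writes with page-level statistics and then reads the footer back to confirm the column chunks carry them (file name and schema are illustrative):

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{ArrayRef, Int64Array};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;
use parquet::file::properties::{EnabledStatistics, WriterProperties};
use parquet::file::reader::{FileReader, SerializedFileReader};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Illustrative single-column batch; replace with your own data.
    let schema = Arc::new(Schema::new(vec![Field::new("ts", DataType::Int64, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int64Array::from(vec![10, 20, 30])) as ArrayRef],
    )?;

    // Page-level statistics let readers skip individual pages during
    // predicate pushdown, not just whole row groups.
    let props = WriterProperties::builder()
        .set_statistics_enabled(EnabledStatistics::Page)
        .build();

    let mut writer = ArrowWriter::try_new(File::create("stats.parquet")?, schema, Some(props))?;
    writer.write(&batch)?;
    writer.close()?;

    // Read the footer back: each column chunk should now report statistics.
    let reader = SerializedFileReader::new(File::open("stats.parquet")?)?;
    for (rg_idx, rg) in reader.metadata().row_groups().iter().enumerate() {
        for (col_idx, col) in rg.columns().iter().enumerate() {
            println!(
                "row group {rg_idx}, column {col_idx}: statistics present = {}",
                col.statistics().is_some()
            );
        }
    }
    Ok(())
}
```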