
pandas-coder

verified

DataFrame manipulation with chunked processing, memory optimization, and vectorized operations.

Marketplace: majestic-marketplace (majesticlabs-dev/majestic-marketplace)

Plugin: majestic-data

Repository: majesticlabs-dev/majestic-marketplace (19 stars)

Path: plugins/majestic-data/skills/pandas-coder/SKILL.md

Last Verified: January 24, 2026

Install Skill

npx add-skill https://github.com/majesticlabs-dev/majestic-marketplace/blob/main/plugins/majestic-data/skills/pandas-coder/SKILL.md -a claude-code --skill pandas-coder

Installation paths:

Claude: .claude/skills/pandas-coder/

Instructions

# Pandas-Coder

Expert in pandas DataFrame manipulation, with a focus on production-ready patterns for large datasets.

## Memory-Efficient Reading

```python
# Chunked CSV reading - default for files > 100MB
chunks = pd.read_csv('large.csv', chunksize=50_000)
for chunk in chunks:
    process(chunk)

# Read only needed columns
df = pd.read_csv('data.csv', usecols=['id', 'name', 'value'])

# Optimize dtypes on load
df = pd.read_csv('data.csv', dtype={
    'id': 'int32',           # not int64
    'category': 'category',  # not object
    'flag': 'bool'
})
```
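
To verify what those dtype choices actually save, check real memory usage after loading and downcast numeric columns to the smallest type that fits. A minimal sketch, assuming a generic `data.csv`; the file and columns are illustrative:

```python
import pandas as pd

df = pd.read_csv('data.csv')  # illustrative file

# Per-column memory, including the contents of object/string columns
print(df.memory_usage(deep=True))

# Downcast each numeric column to the smallest dtype that holds its values
for col in df.select_dtypes(include='number').columns:
    kind = 'integer' if pd.api.types.is_integer_dtype(df[col]) else 'float'
    df[col] = pd.to_numeric(df[col], downcast=kind)
```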

## Category Type for Repeated Strings

```python
# BEFORE: 800MB with object dtype
df['status'] = df['status'].astype('category')  # AFTER: 50MB

# Set categories explicitly for consistency across files
df['status'] = pd.Categorical(
    df['status'],
    categories=['pending', 'active', 'completed', 'cancelled']
)
```
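
Combining chunked reads with the category dtype works best when a single `CategoricalDtype` is reused for every chunk, so the category codes line up and `pd.concat` keeps the column categorical instead of falling back to object. A sketch reusing `large.csv` and the `status` categories from above:

```python
import pandas as pd

status_dtype = pd.CategoricalDtype(['pending', 'active', 'completed', 'cancelled'])

chunks = []
for chunk in pd.read_csv('large.csv', chunksize=50_000):
    # Same dtype object on every chunk -> identical category codes
    chunk['status'] = chunk['status'].astype(status_dtype)
    chunks.append(chunk)

df = pd.concat(chunks, ignore_index=True)
assert df['status'].dtype == status_dtype  # still categorical after concat
```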

## Vectorized Operations Over Loops

```python
# BAD - iterating rows
for idx, row in df.iterrows():
    df.loc[idx, 'total'] = row['price'] * row['qty']

# GOOD - vectorized
df['total'] = df['price'] * df['qty']

# BAD - apply with Python function
df['clean'] = df['name'].apply(lambda x: x.strip().lower())

# GOOD - vectorized string methods
df['clean'] = df['name'].str.strip().str.lower()
```
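
For plain value lookups there is a similar trade-off: `Series.map` with a dict avoids calling a Python lambda once per row. The column names and mapping here are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'country': ['US', 'CA', 'DE', 'FR']})  # hypothetical data
region_map = {'US': 'Americas', 'CA': 'Americas', 'DE': 'EMEA'}

# BAD - apply runs a Python lambda for every row
df['region'] = df['country'].apply(lambda c: region_map.get(c, 'other'))

# GOOD - map the dict directly, then fill the misses
df['region'] = df['country'].map(region_map).fillna('other')
```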

## Conditional Assignment

```python
# Use np.where for simple conditions
df['tier'] = np.where(df['revenue'] > 1000, 'premium', 'standard')

# Use np.select for multiple conditions
conditions = [
    df['score'] >= 90,
    df['score'] >= 70,
    df['score'] >= 50
]
choices = ['A', 'B', 'C']
df['grade'] = np.select(conditions, choices, default='F')
```
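
For ordered numeric thresholds like the grade example, `pd.cut` expresses the same binning declaratively. The bin edges below mirror the `np.select` conditions above; the sample data is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'score': [95, 72, 51, 40]})  # hypothetical data

# [-inf, 50) -> F, [50, 70) -> C, [70, 90) -> B, [90, inf) -> A
df['grade'] = pd.cut(
    df['score'],
    bins=[-float('inf'), 50, 70, 90, float('inf')],
    labels=['F', 'C', 'B', 'A'],
    right=False,  # left-inclusive bins to match the >= comparisons
)
```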

## GroupBy Optimizations

```python
# Named aggregations (pandas 0.25+)
result = df.groupby('category').agg(
    total_sales=('sales', 'sum'),
    avg_price=('price', 'mean'),
    count=('id', 'count')
)

# Transform for broadcasting back to original shape
df['pct_of_group'] = df.groupby('category')['value'].transform(
    lambda x: x / x.sum()
)
```
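
If the grouping key uses the category dtype from earlier, `observed=True` keeps unobserved categories from appearing as empty groups, and `sort=False` skips sorting the keys when group order does not matter. A small self-contained sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    'category': pd.Categorical(['a', 'a', 'b'], categories=['a', 'b', 'c']),  # 'c' never occurs
    'sales': [10, 20, 5],
    'price': [1.0, 2.0, 3.0],
})

# observed=True drops the empty 'c' group; sort=False keeps first-seen key order
result = df.groupby('category', observed=True, sort=False).agg(
    total_sales=('sales', 'sum'),
    avg_price=('price', 'mean'),
)
```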

## Index
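
A minimal sketch of sorted-index lookups, assuming that is the topic intended for this section; the data and column names are hypothetical, not taken from the original:

```python
import pandas as pd

df = pd.DataFrame({'user_id': [3, 1, 2], 'value': [30, 10, 20]})  # hypothetical data

# A sorted unique index turns repeated .loc lookups into fast binary searches
df = df.set_index('user_id').sort_index()
print(df.loc[2, 'value'])  # 20

# reset_index when the key needs to be an ordinary column again
df = df.reset_index()
```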
