
data-analysis

verified

Patterns for data loading, exploration, and statistical analysis


Repository

Yeachan-Heo/My-Jogyo
126 stars

skills/data-analysis/SKILL.md

Last Verified

January 21, 2026

Install Skill

npx add-skill https://github.com/Yeachan-Heo/My-Jogyo/blob/main/skills/data-analysis/SKILL.md -a claude-code --skill data-analysis

Installation paths:

Claude
.claude/skills/data-analysis/

Instructions

# Data Analysis Patterns

## When to Use
Load this skill when working with datasets that require exploration, cleaning, and statistical analysis.

## Data Loading
```python
import pandas as pd

print("[DATA] Loading dataset")
df = pd.read_csv("data.csv")
print(f"[SHAPE] {df.shape[0]} rows, {df.shape[1]} columns")
print(f"[DTYPE] {dict(df.dtypes)}")
print(f"[MISSING] {df.isnull().sum().to_dict()}")
```

## Exploratory Data Analysis (EDA)

### Descriptive Statistics
```python
print("[STAT] Descriptive statistics:")
print(df.describe())

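# Report the observed range of every numeric column
for col in df.select_dtypes(include="number").columns:
    print(f"[RANGE] {col}: {df[col].min()} to {df[col].max()}")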
```

### Distribution Analysis
```python
from scipy import stats

print("[ANALYSIS] Checking distribution normality")
# col is the name of a numeric column; drop missing values before testing
stat, p_value = stats.shapiro(df[col].dropna())
print(f"[STAT] Shapiro-Wilk p-value: {p_value:.4f}")
```
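
The Shapiro-Wilk p-value is usually read against a significance threshold. The following is a minimal sketch, assuming a conventional threshold of 0.05 and synthetic samples; the helper name `check_normality` and its `alpha` parameter are illustrative and not part of the original skill.

```python
import numpy as np
from scipy import stats

def check_normality(values, alpha=0.05):
    """Return (p_value, looks_normal) from a Shapiro-Wilk test.

    Hypothetical helper: the name and the alpha default are assumptions.
    """
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # drop missing values before testing
    stat, p_value = stats.shapiro(values)
    # A small p-value is evidence against normality
    return p_value, bool(p_value >= alpha)

# Synthetic data for illustration only
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=0.0, scale=1.0, size=200)
skewed_sample = rng.exponential(scale=1.0, size=200)

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    p_value, looks_normal = check_normality(sample)
    print(f"[STAT] Shapiro-Wilk p-value ({name}): {p_value:.4f}")
    print(f"[ANALYSIS] normality plausible ({name}): {looks_normal}")
```

A p-value above the threshold does not prove normality; it only means the test found no strong evidence against it.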

### Correlation Analysis
```python
print("[CORR] Correlation matrix:")
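# numeric_only=True skips non-numeric columns, which otherwise raise in pandas >= 2.0
print(df.corr(numeric_only=True))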
```

## Statistical Tests

### T-Test
```python
from scipy.stats import ttest_ind
stat, p = ttest_ind(group1, group2)
print(f"[STAT] T-test: t={stat:.3f}, p={p:.4f}")
```
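
By default `ttest_ind` assumes both groups have equal variance. The sketch below, on synthetic groups made up for illustration, runs the default test next to Welch's variant (`equal_var=False`), which is often the safer choice when group variances or sizes differ.

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic groups for illustration: different means, spreads, and sizes
rng = np.random.default_rng(42)
group1 = rng.normal(loc=10.0, scale=1.0, size=40)
group2 = rng.normal(loc=13.0, scale=3.0, size=60)

# Student's t-test (pooled variance)
stat, p = ttest_ind(group1, group2)
print(f"[STAT] T-test: t={stat:.3f}, p={p:.4f}")

# Welch's t-test (no equal-variance assumption)
welch_stat, welch_p = ttest_ind(group1, group2, equal_var=False)
print(f"[STAT] Welch T-test: t={welch_stat:.3f}, p={welch_p:.4f}")
```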

### ANOVA
```python
from scipy.stats import f_oneway
stat, p = f_oneway(group1, group2, group3)
print(f"[STAT] ANOVA: F={stat:.3f}, p={p:.4f}")
```
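
The snippet above leaves `group1`, `group2`, and `group3` undefined. One way to build them is sketched here, assuming a long-format DataFrame with hypothetical `group` and `value` columns; the data is invented for the example.

```python
import pandas as pd
from scipy.stats import f_oneway

# Illustrative long-format data: one row per observation
df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5 + ["c"] * 5,
    "value": [5.1, 4.9, 5.0, 5.2, 4.8,
              6.1, 5.9, 6.0, 6.2, 5.8,
              7.1, 6.9, 7.0, 7.2, 6.8],
})

# Split the value column into one array per group label
groups = [g["value"].to_numpy() for _, g in df.groupby("group")]

stat, p = f_oneway(*groups)
print(f"[STAT] ANOVA: F={stat:.3f}, p={p:.4f}")
```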

## Confidence Interval Patterns

### Parametric CI for Means
```python
import numpy as np
from scipy import stats

def mean_ci(data, confidence=0.95):
    """Calculate parametric confidence interval for mean."""
    n = len(data)
    mean = np.mean(data)
    se = stats.sem(data)  # Standard error of mean
    h = se * stats.t.ppf((1 + confidence) / 2, n - 1)
    return mean, mean - h, mean + h

# Drop missing values first; a single NaN would make the whole interval NaN
mean, ci_low, ci_high = mean_ci(df[col].dropna())
print(f"[STAT:estimate] mean = {mean:.3f}")
print(f"[STAT:ci] 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```
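
As a sanity check, the interval from `mean_ci` can be compared with SciPy's built-in `stats.t.interval`. This is a sketch on a synthetic sample (made up for illustration); `mean_ci` is repeated so the block runs on its own.

```python
import numpy as np
from scipy import stats

def mean_ci(data, confidence=0.95):
    """Calculate parametric confidence interval for mean."""
    n = len(data)
    mean = np.mean(data)
    se = stats.sem(data)  # Standard error of mean
    h = se * stats.t.ppf((1 + confidence) / 2, n - 1)
    return mean, mean - h, mean + h

# Synthetic sample for illustration only
rng = np.random.default_rng(1)
sample = rng.normal(loc=50.0, scale=5.0, size=30)

mean, ci_low, ci_high = mean_ci(sample)

# Reference interval from the t distribution with n - 1 degrees of freedom
ref_low, ref_high = stats.t.interval(
    0.95, df=len(sample) - 1, loc=mean, scale=stats.sem(sample)
)

print(f"[STAT:estimate] mean = {mean:.3f}")
print(f"[STAT:ci] 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
print(f"[STAT:ci] reference [{ref_low:.3f}, {ref_high:.3f}]")
```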

### Bootstrap CI for Medians/Complex Statistics
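```python
import numpy as np

def bootstrap_ci(data, stat_func=np.median, n_bootstrap=10000, confidence=0.95):
    """Calculate bootstrap confidence interval for any statistic."""
    boot_stats = []
    n = len(data)
    # The source is cut off after `n`; the rest is an assumed percentile bootstrap.
    for _ in range(n_bootstrap):
        boot_stats.append(stat_func(np.random.choice(data, size=n, replace=True)))
    tail = (1 - confidence) / 2 * 100
    low, high = np.percentile(boot_stats, [tail, 100 - tail])
    return stat_func(data), low, high
```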
