Patterns for data loading, exploration, and statistical analysis
View on GitHub: Yeachan-Heo/My-Jogyo
gyoshu
January 21, 2026
Install:
npx add-skill https://github.com/Yeachan-Heo/My-Jogyo/blob/main/skills/data-analysis/SKILL.md -a claude-code --skill data-analysis
Installation paths:
.claude/skills/data-analysis/

# Data Analysis Patterns
## When to Use
Load this skill when working with datasets that require exploration, cleaning, and statistical analysis.
## Data Loading
```python
import pandas as pd

print("[DATA] Loading dataset")
df = pd.read_csv("data.csv")
print(f"[SHAPE] {df.shape[0]} rows, {df.shape[1]} columns")
print(f"[DTYPE] {dict(df.dtypes)}")
print(f"[MISSING] {df.isnull().sum().to_dict()}")
```
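To try the loading pattern without a file on disk, the sketch below reads a small inline CSV through `io.StringIO`. The sample data, the `load_report` helper, and the `demo_df` name are illustrative assumptions, not part of the skill itself.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data.csv (one missing score).
SAMPLE_CSV = """id,score,group
1,3.5,a
2,,b
3,4.1,a
"""


def load_report(source):
    """Load a CSV and print the [SHAPE], [DTYPE], and [MISSING] markers."""
    df = pd.read_csv(source)
    print(f"[SHAPE] {df.shape[0]} rows, {df.shape[1]} columns")
    print(f"[DTYPE] {df.dtypes.astype(str).to_dict()}")
    print(f"[MISSING] {df.isna().sum().to_dict()}")
    return df


demo_df = load_report(io.StringIO(SAMPLE_CSV))
```

Wrapping the three checks in one function keeps the markers consistent across datasets; `pd.read_csv` accepts a path or any file-like object, so the same helper works for both.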
## Exploratory Data Analysis (EDA)
### Descriptive Statistics
```python
print("[STAT] Descriptive statistics:")
print(df.describe())
# Report the range of every numeric column.
for col in df.select_dtypes(include="number").columns:
    print(f"[RANGE] {col}: {df[col].min()} to {df[col].max()}")
```
### Distribution Analysis
```python
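from scipy import stats

print("[ANALYSIS] Checking distribution normality")
# `col` names a numeric column; drop missing values first because
# Shapiro-Wilk does not handle NaN.
stat, p_value = stats.shapiro(df[col].dropna())
print(f"[STAT] Shapiro-Wilk p-value: {p_value:.4f}")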
```
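The p-value still has to be turned into a decision. As a minimal sketch, assuming a significance level of 0.05 and synthetic samples (the `looks_normal` helper and both samples are hypothetical), the test can be wrapped like this:

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # assumed significance level

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
normal_sample = rng.normal(loc=50, scale=5, size=200)
skewed_sample = rng.exponential(scale=5, size=200)


def looks_normal(values, alpha=ALPHA):
    """Return (p_value, verdict); verdict is True when normality is not rejected."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]  # remove NaN before testing
    _, p_value = stats.shapiro(values)
    return p_value, bool(p_value >= alpha)


for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    p_value, ok = looks_normal(sample)
    verdict = "not rejected" if ok else "rejected"
    print(f"[STAT] {name}: Shapiro-Wilk p={p_value:.4f}, normality {verdict}")
```

A p-value above the threshold means only that the test found no evidence against normality, not that the data are normal.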
### Correlation Analysis
```python
print("[CORR] Correlation matrix:")
# numeric_only=True skips text columns, which raise an error in pandas 2.0+.
print(df.corr(numeric_only=True))
```
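A full matrix is hard to scan once there are more than a few columns. One possible way to list each pair once, ranked by absolute correlation, is sketched below; the `demo` frame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
x = rng.normal(size=100)
demo = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.1, size=100),  # strongly tied to x
    "z": rng.normal(size=100),                     # independent noise
    "label": ["a", "b"] * 50,                      # non-numeric, excluded below
})

corr = demo.select_dtypes(include="number").corr()

# Keep the strict upper triangle so each pair appears exactly once.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = (
    corr.where(mask)
    .stack()
    .dropna()
    .sort_values(key=np.abs, ascending=False)  # rank by |r|
)

for (a, b), r in pairs.items():
    print(f"[CORR] {a} vs {b}: r={r:.3f}")
```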
## Statistical Tests
### T-Test
```python
from scipy.stats import ttest_ind
stat, p = ttest_ind(group1, group2)
print(f"[STAT] T-test: t={stat:.3f}, p={p:.4f}")
```
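By default `ttest_ind` assumes equal variances in both groups. When that assumption is doubtful, Welch's variant (`equal_var=False`) is a common alternative, and an effect size helps put the p-value in context. The sketch below uses synthetic groups, and the `cohens_d` helper is an illustrative assumption rather than a SciPy function.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
group1 = rng.normal(loc=10.0, scale=1.0, size=40)
group2 = rng.normal(loc=12.0, scale=3.0, size=60)  # wider spread than group1

# equal_var=False selects Welch's t-test.
stat, p = ttest_ind(group1, group2, equal_var=False)


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


d = cohens_d(group1, group2)
print(f"[STAT] Welch t-test: t={stat:.3f}, p={p:.4f}")
print(f"[STAT:effect_size] Cohen's d = {d:.3f}")
```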
### ANOVA
```python
from scipy.stats import f_oneway
stat, p = f_oneway(group1, group2, group3)
print(f"[STAT] ANOVA: F={stat:.3f}, p={p:.4f}")
```
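The F statistic says whether the group means differ, not by how much. A minimal sketch of eta squared (between-group sum of squares over total sum of squares) follows; the `eta_squared` helper and the three small groups are hypothetical examples.

```python
import numpy as np
from scipy.stats import f_oneway


def eta_squared(*groups):
    """Between-group sum of squares divided by total sum of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return ss_between / ss_total


# Made-up measurements for three groups.
group1 = [4.1, 5.0, 4.6, 5.2]
group2 = [6.3, 5.9, 6.8, 6.1]
group3 = [8.0, 7.4, 7.9, 8.3]

stat, p = f_oneway(group1, group2, group3)
eta2 = eta_squared(group1, group2, group3)
print(f"[STAT] ANOVA: F={stat:.3f}, p={p:.4f}")
print(f"[STAT:effect_size] eta^2 = {eta2:.3f}")
```

A significant result only indicates that at least one group mean differs; identifying which pairs differ takes a separate post-hoc comparison.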
## Confidence Interval Patterns
### Parametric CI for Means
```python
import numpy as np
from scipy import stats


def mean_ci(data, confidence=0.95):
    """Calculate parametric confidence interval for mean."""
    n = len(data)
    mean = np.mean(data)
    se = stats.sem(data)  # Standard error of mean
    h = se * stats.t.ppf((1 + confidence) / 2, n - 1)
    return mean, mean - h, mean + h


mean, ci_low, ci_high = mean_ci(df[col])
print(f"[STAT:estimate] mean = {mean:.3f}")
print(f"[STAT:ci] 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```
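As a sanity check, the half-width formula (standard error times the t critical value with n - 1 degrees of freedom) can be compared against SciPy's built-in `stats.t.interval`. The sample values below are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of seven measurements.
data = np.array([4.8, 5.1, 5.5, 4.9, 5.3, 5.0, 5.2])
n = len(data)
mean = data.mean()
se = stats.sem(data)  # sample standard deviation / sqrt(n)

# Manual interval: mean +/- t critical value * standard error.
h = se * stats.t.ppf((1 + 0.95) / 2, n - 1)
manual = (mean - h, mean + h)

# Same interval from SciPy: confidence level, degrees of freedom, loc, scale.
builtin = stats.t.interval(0.95, n - 1, loc=mean, scale=se)

print(f"[STAT:ci] manual  95% CI [{manual[0]:.3f}, {manual[1]:.3f}]")
print(f"[STAT:ci] builtin 95% CI [{builtin[0]:.3f}, {builtin[1]:.3f}]")
```

Both approaches rely on the sample mean being approximately normally distributed; for small or heavily skewed samples, a bootstrap interval may be the safer choice.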
### Bootstrap CI for Medians/Complex Statistics
```python
import numpy as np
def bootstrap_ci(data, stat_func=np.median, n_bootstrap=10000, confidence=0.95):
    """Calculate bootstrap confidence interval for any statistic."""
    boot_stats = []
n