Statistical modeling and data analysis expert. A/B testing, causal inference, customer analytics (CLV, churn), anomaly detection, experiment tracking (MLflow/W&B), and data visualization. Use for business analytics, experiment design, or exploratory data analysis.
February 4, 2026
npx add-skill https://github.com/anton-abyzov/specweave/blob/main/plugins/specweave-ml/skills/data-scientist/SKILL.md -a claude-code --skill data-scientist

Installation paths:
.claude/skills/data-scientist/

# Data Scientist
Expert in statistical analysis, experimentation, and business insights.
## ⚠️ Chunking Rule
Large analyses (EDA + modeling + visualization) can run to 800+ lines.
Generate ONE phase per response: EDA → Feature Engineering → Modeling → Evaluation → Recommendations
## Core Capabilities
### Statistical Modeling
- Hypothesis testing (t-test, chi-square, ANOVA)
- Regression analysis (linear, logistic, GLMs)
- Bayesian inference
- Causal inference (propensity score matching, DiD)
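
As a rough illustration of the difference-in-differences (DiD) idea listed above, the sketch below computes the basic 2x2 estimate: the change in the treated group minus the change in the control group. The function name `diff_in_diff` and the sample numbers are hypothetical. The estimate is only meaningful under the parallel-trends assumption, which this code does not check.

```python
from statistics import mean

def diff_in_diff(control_pre, control_post, treated_pre, treated_post):
    """Return the 2x2 difference-in-differences estimate of the treatment effect."""
    # Change over time in the group that did not receive the treatment
    control_change = mean(control_post) - mean(control_pre)
    # Change over time in the group that did receive the treatment
    treated_change = mean(treated_post) - mean(treated_pre)
    # The control change stands in for what would have happened without treatment
    return treated_change - control_change

# Hypothetical metric values per unit, before and after a rollout
effect = diff_in_diff(
    control_pre=[100, 102, 98],
    control_post=[105, 107, 103],
    treated_pre=[90, 92, 88],
    treated_post=[101, 103, 99],
)
print(f"Estimated treatment effect: {effect}")
```

In practice a regression with a group-by-period interaction term gives the same point estimate along with standard errors.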
### Experimentation
- A/B test design and analysis
- Sample size calculation
- Statistical power analysis
- Multi-armed bandits
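
A minimal sketch of sample size calculation for an A/B test on a conversion rate, assuming a two-sided z-test on two proportions with equal group sizes and the common unpooled-variance approximation. The function name `sample_size_two_proportions` and its parameters are illustrative and not part of the skill itself.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p_baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate per-group n to detect an absolute lift of mde_abs."""
    p1 = p_baseline
    p2 = p_baseline + mde_abs
    # Critical values for the two-sided significance level and the target power
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Sum of the Bernoulli variances of the two groups
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)

# Hypothetical scenario: 10% baseline conversion, detect a 2-point absolute lift
n_per_group = sample_size_two_proportions(p_baseline=0.10, mde_abs=0.02)
print(f"Users needed per group: {n_per_group}")
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the effect size should be agreed on before the experiment starts.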
### Customer Analytics
- Customer Lifetime Value (CLV) prediction
- Churn prediction and prevention
- Cohort analysis
- RFM segmentation
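
To make the RFM bullet concrete, here is a small sketch that derives the raw recency, frequency, and monetary values per customer from a list of orders. The input layout (tuples of customer ID, order date, amount), the function name `rfm_table`, and the sample orders are all assumptions for illustration. Scoring the values into quantile buckets would be a separate step.

```python
from datetime import date

def rfm_table(orders, as_of):
    """Build {customer_id: {recency_days, frequency, monetary}} from order tuples."""
    table = {}
    for customer_id, order_date, amount in orders:
        row = table.setdefault(
            customer_id,
            {"recency_days": None, "frequency": 0, "monetary": 0.0},
        )
        # Recency is the number of days since the most recent order
        days_ago = (as_of - order_date).days
        if row["recency_days"] is None or days_ago < row["recency_days"]:
            row["recency_days"] = days_ago
        # Frequency counts orders; monetary sums their amounts
        row["frequency"] += 1
        row["monetary"] += amount
    return table

# Hypothetical order history
orders = [
    ("a", date(2026, 1, 10), 30.0),
    ("a", date(2026, 1, 26), 40.0),
    ("b", date(2025, 12, 1), 15.0),
]
rfm = rfm_table(orders, as_of=date(2026, 1, 31))
for customer_id, row in rfm.items():
    print(customer_id, row)
```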
### Anomaly Detection
- Isolation Forest for outliers
- DBSCAN clustering
- Statistical process control
- Time series anomaly detection
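
Statistical process control can be sketched with simple control limits: estimate a center line and spread from a baseline period, then flag later points that fall outside the band. This version uses the sample standard deviation and three-sigma limits as a simplification (classic individuals charts estimate sigma from moving ranges instead). The function names and data are hypothetical.

```python
from statistics import mean, stdev

def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) control limits estimated from baseline data."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - sigmas * spread, center + sigmas * spread

def flag_out_of_control(values, lower, upper):
    """Return the indices of values that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical baseline measurements from a stable period
baseline = [50, 52, 49, 51, 50, 48, 51, 49]
lower, upper = control_limits(baseline)

# New observations to monitor against the baseline limits
flagged = flag_out_of_control([50, 51, 62, 49, 37], lower, upper)
print(f"Limits: ({lower:.2f}, {upper:.2f}); flagged indices: {flagged}")
```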
### Experiment Tracking
- MLflow integration for experiment logging
- Weights & Biases (W&B) support
- Experiment comparison and visualization
- Model versioning and registry
### Data Visualization
- Exploratory data analysis (EDA)
- Distribution plots and correlations
- Time series visualization
- Interactive dashboards (Plotly, Streamlit)
## Best Practices
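```python
# A/B Test Analysis
import numpy as np
from scipy import stats

def analyze_ab_test(control, treatment, metric='conversion'):
    """Compare one metric between two groups (assumed to be pandas DataFrames)."""
    # Sample sizes per group, reported so underpowered tests are easy to spot
    n_control, n_treatment = len(control), len(treatment)

    # Statistical test (Student's t-test, which assumes equal variances)
    t_stat, p_value = stats.ttest_ind(control[metric], treatment[metric])

    # Effect size (Cohen's d, using the average of the two group variances)
    pooled_std = np.sqrt((control[metric].var() + treatment[metric].var()) / 2)
    effect_size = (treatment[metric].mean() - control[metric].mean()) / pooled_std

    return {
        'n_control': n_control,
        'n_treatment': n_treatment,
        'p_value': p_value,
        'significant': p_value < 0.05,
        'effect_size': effect_size,
        # Relative lift of treatment over control, in percent
        'lift': (treatment[metric].mean() / control[metric].mean() - 1) * 100
    }
```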
```python
# Experiment Tracking with MLflo