Enforces baseline comparisons, cross-validation, interpretation, and leakage prevention for ML pipelines
View on GitHub: Yeachan-Heo/My-Jogyo
gyoshu
January 21, 2026
Select agents to install to:

```shell
npx add-skill https://github.com/Yeachan-Heo/My-Jogyo/blob/main/skills/ml-rigor/SKILL.md -a claude-code --skill ml-rigor
```

Installation path:

```
.claude/skills/ml-rigor/
```

# Machine Learning Rigor Patterns
## When to Use
Load this skill when building machine learning models. Every ML pipeline must demonstrate:
- **Baseline comparison**: Beat a dummy model before claiming success
- **Cross-validation**: Report variance, not just a single score
- **Interpretation**: Explain what the model learned
- **Leakage prevention**: Ensure no future information leaks into training
**Quality Gate**: ML findings without baseline comparison or cross-validation are marked as "Exploratory" in reports.
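The leakage-prevention requirement above is easiest to satisfy by keeping every fitted preprocessing step inside a `Pipeline`, so cross-validation refits it on each training fold. A minimal sketch, using synthetic data for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# WRONG: scaling the full dataset first leaks held-out-fold statistics
# into training:  X_scaled = StandardScaler().fit_transform(X)

# RIGHT: the scaler is refit on each training fold only
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000, random_state=42)),
])
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"[METRIC:cv_accuracy] {scores.mean():.3f} ± {scores.std():.3f}")
```

The same pattern applies to imputers, encoders, and feature selectors: anything that learns from data belongs inside the pipeline, never fit before the split.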
---
## 1. Baseline Requirements
**Every model must be compared to baselines.** A model that can't beat a dummy classifier isn't learning anything useful.
### Always Compare To:
1. **DummyClassifier/DummyRegressor** - The absolute minimum bar
2. **Simple linear model** - LogisticRegression or LinearRegression
3. **Domain heuristic** (if available) - Rule-based approach
### Baseline Code Template
```python
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.model_selection import cross_val_score
import numpy as np
print("[DECISION] Establishing baselines before training complex models")
# Classification baselines
dummy_clf = DummyClassifier(strategy='most_frequent')
dummy_scores = cross_val_score(dummy_clf, X_train, y_train, cv=5, scoring='accuracy')
print(f"[METRIC:baseline_accuracy] {dummy_scores.mean():.3f} (majority class)")
print(f"[METRIC:baseline_accuracy_std] {dummy_scores.std():.3f}")
# Simple linear baseline
lr = LogisticRegression(max_iter=1000, random_state=42)
lr_scores = cross_val_score(lr, X_train, y_train, cv=5, scoring='accuracy')
print(f"[METRIC:linear_baseline_accuracy] {lr_scores.mean():.3f}")
print(f"[METRIC:linear_baseline_accuracy_std] {lr_scores.std():.3f}")
# For regression tasks
dummy_reg = DummyRegressor(strategy='mean')
# neg_root_mean_squared_error returns negated RMSE; flip the sign to report RMSE
dummy_rmse = -cross_val_score(dummy_reg, X_train, y_train, cv=5,
                              scoring='neg_root_mean_squared_error')
print(f"[METRIC:baseline_rmse] {dummy_rmse.mean():.3f} (mean predictor)")
```
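Once baselines are logged, a candidate model should only be reported as a success if it clearly beats them. A sketch of that comparison gate, using synthetic data and an illustrative one-standard-deviation margin (both are assumptions, not part of the skill's spec):

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the project's real training data
X_train, y_train = make_classification(n_samples=300, n_features=12,
                                       random_state=42)

dummy = cross_val_score(DummyClassifier(strategy="most_frequent"),
                        X_train, y_train, cv=5, scoring="accuracy")
model = cross_val_score(RandomForestClassifier(random_state=42),
                        X_train, y_train, cv=5, scoring="accuracy")

print(f"[METRIC:model_accuracy] {model.mean():.3f} ± {model.std():.3f}")

# Gate: require the model to clear the baseline by a margin, not just tie it
if model.mean() > dummy.mean() + dummy.std():
    print("[DECISION] Model beats the dummy baseline; proceed to interpretation")
else:
    print("[DECISION] Model does not clearly beat the baseline; mark as Exploratory")
```

Findings that fail this gate fall under the Quality Gate above and are labeled "Exploratory" rather than reported as results.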