Machine learning pipeline builder for end-to-end ML systems. Covers feature engineering, model training, evaluation, hyperparameter tuning, AutoML, and explainability (SHAP/LIME). Use for ML pipelines, model training, or building production ML systems.
View on GitHubFebruary 4, 2026
Select agents to install to:
npx add-skill https://github.com/anton-abyzov/specweave/blob/main/plugins/specweave-ml/skills/ml-engineer/SKILL.md -a claude-code --skill ml-engineerInstallation paths:
.claude/skills/ml-engineer/# ML Engineer
Expert ML system builder covering the complete ML lifecycle.
## ⚠️ Chunking Rule
Large ML pipelines = 1000+ lines. Generate ONE stage per response:
1. Data/EDA → 2. Features → 3. Training → 4. Evaluation → 5. Deployment
## Core Capabilities
### Feature Engineering
- Feature extraction, selection, and transformation
- Feature importance analysis (permutation, SHAP)
- Feature store integration patterns
- Automated feature generation
### Model Training
- Baseline comparison (always start with baseline!)
- Cross-validation (k-fold, stratified, time-based)
- Hyperparameter tuning (Grid, Random, Bayesian)
- AutoML integration (TPOT, Auto-sklearn, H2O)
### Model Evaluation
- Classification: accuracy, precision, recall, F1, AUC-ROC
- Regression: RMSE, MAE, R², MAPE
- Ranking: NDCG, MAP, MRR
- Custom business metrics
### Explainability
- SHAP values for feature importance
- LIME for local explanations
- Partial dependence plots
- Model-agnostic interpretability
## Best Practices
```python
# 1. Always establish baseline first
baseline = train_baseline(strategies=["random", "popularity", "rule-based"])
# New model must beat baseline by significant margin
# 2. Use proper cross-validation
cv_scores = cross_val_score(model, X, y, cv=5, scoring='f1_macro')
print(f"CV Score: {cv_scores.mean():.3f} ± {cv_scores.std():.3f}")
# 3. Track everything
mlflow.log_params(model.get_params())
mlflow.log_metrics({"accuracy": acc, "f1": f1})
mlflow.log_artifact("model.pkl")
# 4. Add explainability
import shap
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
```
## Framework Support
- **scikit-learn**: RandomForest, XGBoost, LightGBM
- **PyTorch**: Neural networks, custom architectures
- **TensorFlow/Keras**: Deep learning models
- **AutoML**: TPOT, Auto-sklearn, H2O AutoML
## When to Use
- Building ML features end-to-end
- Feature engineering and selection
- Model training and evaluation
- Hyperparameter optimization
- Model explai