Comprehensive feature engineering for ML pipelines: data quality assessment, feature creation, selection, transformation, and encoding. Activates for "feature engineering", "create features", "feature selection", "data preprocessing", "handle missing values", "encode categorical", "scale features", "feature importance". Ensures features are production-ready with automated validation, documentation, and integration with SpecWeave increments.
GitHub: anton-abyzov/specweave
sw-ml
January 25, 2026
npx add-skill https://github.com/anton-abyzov/specweave/blob/main/plugins/specweave-ml/skills/feature-engineer/SKILL.md -a claude-code --skill feature-engineer
Installation paths:
.claude/skills/feature-engineer/
# Feature Engineer
## Overview
Feature engineering often makes the difference between mediocre and excellent ML models. This skill turns raw data into model-ready features through systematic data quality assessment, feature creation, selection, and transformation, all integrated with SpecWeave's increment workflow.
## The Feature Engineering Pipeline
### Phase 1: Data Quality Assessment
**Before creating features, understand your data**:
```python
from specweave import DataQualityReport
# Automated data quality check
report = DataQualityReport(df, increment="0042")
# Generates:
# - Missing value analysis
# - Outlier detection
# - Data type validation
# - Distribution analysis
# - Correlation matrix
# - Duplicate detection
```
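The source does not show how `DataQualityReport` works internally, so the following is only a rough sketch of how a few of the listed checks (missing values, 3-standard-deviation outliers, duplicates, data types) might be computed with plain pandas. The function name `summarize_quality`, its `z_threshold` parameter, and the sample data are assumptions made for illustration, not part of the SpecWeave API.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame, z_threshold: float = 3.0) -> dict:
    """Hypothetical helper: collect basic data quality statistics."""
    # Missing values per column, as counts and as percentages of all rows.
    missing = df.isna().sum()
    missing_pct = (missing / len(df) * 100).round(1)

    # Outliers: numeric values more than z_threshold standard deviations
    # away from the column mean.
    outliers = {}
    for col in df.select_dtypes(include="number").columns:
        std = df[col].std()
        if pd.isna(std) or std == 0:
            outliers[col] = 0  # constant or near-empty column: nothing to flag
            continue
        z_scores = (df[col] - df[col].mean()) / std
        outliers[col] = int((z_scores.abs() > z_threshold).sum())

    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing": missing.to_dict(),
        "missing_pct": missing_pct.to_dict(),
        "outliers": outliers,
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Made-up sample data, only to show the call.
sample = pd.DataFrame({
    "user_id": [1.0, 2.0, 3.0, 3.0],
    "email": ["a@example.com", None, "c@example.com", "c@example.com"],
    "transaction_amount": [10.0, 12.0, 11.0, 11.0],
})
report = summarize_quality(sample)
print(report["missing_pct"], report["duplicate_rows"])
```

A z-score rule is only one possible outlier definition. On very small samples it rarely flags anything, and skewed columns may be better served by an IQR-based rule.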
**Quality Report Output**:
```markdown
# Data Quality Report
## Dataset Overview
- Rows: 100,000
- Columns: 45
- Memory: 34.2 MB
## Missing Values
| Column | Missing | Percentage |
|-----------------|---------|------------|
| email | 15,234 | 15.2% |
| phone | 8,901 | 8.9% |
| purchase_date | 0 | 0.0% |
## Outliers Detected
- transaction_amount: 234 outliers (>3 std dev)
- user_age: 12 outliers (<18 or >100)
## Data Type Issues
- user_id: Stored as float, should be int
- date_joined: Stored as string, should be datetime
## Recommendations
1. Impute email/phone or create "missing" indicator features
2. Cap/remove outliers in transaction_amount
3. Convert data types for efficiency
```
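The three recommendations in the sample report could be acted on in plain pandas roughly as follows. This is a sketch under assumptions: the helper name `apply_quality_fixes` is hypothetical, the column names are taken from the sample report above, and capping at the mean plus or minus 3 standard deviations is just one of several reasonable policies.

```python
import pandas as pd


def apply_quality_fixes(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the sample report's recommendations."""
    out = df.copy()

    # 1. Add "missing" indicator features rather than guessing contact values.
    for col in ("email", "phone"):
        out[f"{col}_missing"] = out[col].isna().astype(int)

    # 2. Cap transaction_amount at mean +/- 3 standard deviations.
    mean = out["transaction_amount"].mean()
    std = out["transaction_amount"].std()
    out["transaction_amount"] = out["transaction_amount"].clip(
        lower=mean - 3 * std, upper=mean + 3 * std
    )

    # 3. Convert types: nullable integers for IDs, real datetimes for dates.
    #    Unparseable dates become NaT instead of raising.
    out["user_id"] = out["user_id"].astype("Int64")
    out["date_joined"] = pd.to_datetime(out["date_joined"], errors="coerce")
    return out


# Made-up sample data, only to show the call.
raw = pd.DataFrame({
    "user_id": [1.0, 2.0, None],
    "email": [None, "b@example.com", "c@example.com"],
    "phone": ["555-0100", None, None],
    "transaction_amount": [10.0, 20.0, 30.0],
    "date_joined": ["2024-01-05", "2024-02-10", "not a date"],
})
clean = apply_quality_fixes(raw)
print(clean.dtypes)
```

The nullable `Int64` dtype is used here because a plain `int64` column cannot hold missing IDs. If `user_id` is guaranteed to be complete, `astype("int64")` would work as well.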
### Phase 2: Feature Creation
**Create features from domain knowledge**:
```python
from specweave import FeatureCreator
creator = FeatureCreator(df, increment="0042")
# Temporal features (from datetime)
creator.add_temporal_features(
    date_column="purchase_date",
    features=["hour", "day_of_week", "month", "is_weekend", "is_holiday"]
)
# Aggregation features (user behavior)
creator.add_aggregation_features(
    group_by="user_id",
    target="purchase_amount",
    aggs=["mean", "st