
feature-engineer

verified

Comprehensive feature engineering for ML pipelines: data quality assessment, feature creation, selection, transformation, and encoding. Activates for "feature engineering", "create features", "feature selection", "data preprocessing", "handle missing values", "encode categorical", "scale features", "feature importance". Ensures features are production-ready with automated validation, documentation, and integration with SpecWeave increments.


Marketplace

specweave

anton-abyzov/specweave

Plugin

sw-ml

development

Repository

anton-abyzov/specweave
27 stars

plugins/specweave-ml/skills/feature-engineer/SKILL.md

Last Verified

January 25, 2026

Install Skill

npx add-skill https://github.com/anton-abyzov/specweave/blob/main/plugins/specweave-ml/skills/feature-engineer/SKILL.md -a claude-code --skill feature-engineer

Installation paths:

Claude
.claude/skills/feature-engineer/

Instructions

# Feature Engineer

## Overview

Feature engineering often makes the difference between mediocre and excellent ML models. This skill transforms raw data into model-ready features through systematic data quality assessment, feature creation, selection, and transformation—all integrated with SpecWeave's increment workflow.

## The Feature Engineering Pipeline

### Phase 1: Data Quality Assessment

**Before creating features, understand your data**:

```python
from specweave import DataQualityReport

# Automated data quality check
report = DataQualityReport(df, increment="0042")

# Generates:
# - Missing value analysis
# - Outlier detection
# - Data type validation
# - Distribution analysis
# - Correlation matrix
# - Duplicate detection
```
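To make the list of checks above concrete, here is a minimal sketch of how similar statistics could be gathered with plain pandas. It is an illustration only, not how `DataQualityReport` works internally; the helper name `quality_summary` and its `z_thresh` parameter are hypothetical and do not come from SpecWeave.

```python
import pandas as pd


def quality_summary(df: pd.DataFrame, z_thresh: float = 3.0) -> dict:
    """Hypothetical helper: collect basic data quality statistics with pandas."""
    numeric = df.select_dtypes(include="number")

    # Missing value analysis: count and percentage per column
    missing = pd.DataFrame({
        "missing": df.isna().sum(),
        "percentage": (df.isna().mean() * 100).round(1),
    })

    # Outlier detection: values more than z_thresh standard deviations from the mean
    z_scores = (numeric - numeric.mean()) / numeric.std(ddof=0)
    outliers = (z_scores.abs() > z_thresh).sum()

    return {
        "rows": len(df),
        "columns": df.shape[1],
        "memory_mb": df.memory_usage(deep=True).sum() / 1024**2,
        "missing": missing,
        "outliers": outliers,
        "dtypes": df.dtypes,                 # data type validation starts here
        "distribution": numeric.describe(),  # distribution analysis
        "correlation": numeric.corr(),       # correlation matrix
        "duplicate_rows": int(df.duplicated().sum()),
    }
```

Calling `quality_summary(df)["missing"]` would give a per-column table comparable to the "Missing Values" section of a quality report, assuming `df` is a pandas DataFrame.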

**Quality Report Output**:
```markdown
# Data Quality Report

## Dataset Overview
- Rows: 100,000
- Columns: 45
- Memory: 34.2 MB

## Missing Values
| Column          | Missing | Percentage |
|-----------------|---------|------------|
| email           | 15,234  | 15.2%      |
| phone           | 8,901   | 8.9%       |
| purchase_date   | 0       | 0.0%       |

## Outliers Detected
- transaction_amount: 234 outliers (>3 std dev)
- user_age: 12 outliers (<18 or >100)

## Data Type Issues
- user_id: Stored as float, should be int
- date_joined: Stored as string, should be datetime

## Recommendations
1. Impute email/phone or create "missing" indicator features
2. Cap/remove outliers in transaction_amount
3. Convert data types for efficiency
```
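The three recommendations in the report can be sketched in plain pandas as follows. This is a hedged illustration, not SpecWeave's implementation: the helper name `apply_recommendations` is hypothetical, the column names are taken from the sample report, and capping at three standard deviations is one assumed reading of "cap outliers".

```python
import pandas as pd


def apply_recommendations(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the report's recommendations to a copy of df."""
    out = df.copy()

    # 1. Keep the missingness signal as binary indicator features
    for col in ("email", "phone"):
        out[f"{col}_missing"] = out[col].isna().astype("int8")

    # 2. Cap transaction_amount at mean +/- 3 standard deviations (assumed rule)
    mean = out["transaction_amount"].mean()
    std = out["transaction_amount"].std()
    out["transaction_amount"] = out["transaction_amount"].clip(
        lower=mean - 3 * std, upper=mean + 3 * std
    )

    # 3. Convert data types. The nullable Int64 dtype tolerates missing IDs but
    #    requires whole-number floats; unparseable dates become NaT.
    out["user_id"] = out["user_id"].astype("Int64")
    out["date_joined"] = pd.to_datetime(out["date_joined"], errors="coerce")
    return out
```

Working on a copy keeps the raw frame available for comparison, which helps when checking how many values the cap actually changed.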

### Phase 2: Feature Creation

**Create features from domain knowledge**:

```python
from specweave import FeatureCreator

creator = FeatureCreator(df, increment="0042")

# Temporal features (from datetime)
creator.add_temporal_features(
    date_column="purchase_date",
    features=["hour", "day_of_week", "month", "is_weekend", "is_holiday"]
)

# Aggregation features (user behavior)
creator.add_aggregation_features(
    group_by="user_id",
    target="purchase_amount",
    aggs=["mean", "st

Validation Details

Front Matter
Required Fields
Valid Name Format
Valid Description
Has Sections
Allowed Tools
Instruction Length: 13971 chars