# Feature Engineer

## Overview

Feature engineering often makes the difference between mediocre and excellent ML models. This skill transforms raw data into model-ready features through systematic data quality assessment, feature creation, selection, and transformation, all integrated with SpecWeave's increment workflow.

## The Feature Engineering Pipeline

### Phase 1: Data Quality Assessment

**Before creating features, understand your data**:

```python
from specweave import DataQualityReport

# Automated data quality check
report = DataQualityReport(df, increment="0042")

# Generates:
# - Missing value analysis
# - Outlier detection
# - Data type validation
# - Distribution analysis
# - Correlation matrix
# - Duplicate detection
```
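The internals of `DataQualityReport` are not shown here. As a rough illustration of what a few of the listed checks involve, here is a minimal pandas-only sketch. The helper name `summarize_quality`, its `z_thresh` parameter, and the sample data are hypothetical and are not part of SpecWeave.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame, z_thresh: float = 3.0) -> dict:
    """Hypothetical helper: collect basic quality metrics for a DataFrame."""
    missing = df.isna().sum()
    numeric = df.select_dtypes(include="number")
    # Count values more than z_thresh standard deviations from the column mean.
    z_scores = (numeric - numeric.mean()) / numeric.std(ddof=0)
    outliers = (z_scores.abs() > z_thresh).sum()
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_count": missing.to_dict(),
        "missing_pct": (missing / len(df) * 100).round(1).to_dict(),
        "outlier_count": outliers.to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Made-up sample data, for illustration only.
sample = pd.DataFrame({
    "transaction_amount": [10.0] * 19 + [1000.0],
    "email": ["a@example.com"] * 15 + [None] * 5,
})
report = summarize_quality(sample)
print(report["missing_pct"])
print(report["outlier_count"])
```

A z-score rule like this only flags extreme values when the column has enough rows; on very small samples a single outlier inflates the standard deviation and can hide itself.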

**Quality Report Output**:
```markdown
# Data Quality Report

## Dataset Overview
- Rows: 100,000
- Columns: 45
- Memory: 34.2 MB

## Missing Values
| Column          | Missing | Percentage |
|-----------------|---------|------------|
| email           | 15,234  | 15.2%      |
| phone           | 8,901   | 8.9%       |
| purchase_date   | 0       | 0.0%       |

## Outliers Detected
- transaction_amount: 234 outliers (>3 std dev)
- user_age: 12 outliers (<18 or >100)

## Data Type Issues
- user_id: Stored as float, should be int
- date_joined: Stored as string, should be datetime

## Recommendations
1. Impute email/phone or create "missing" indicator features
2. Cap/remove outliers in transaction_amount
3. Convert data types for efficiency
```
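The three recommendations in the sample report could be applied in plain pandas roughly as follows. This is a sketch under assumptions, not SpecWeave's own implementation: the function name `apply_recommendations` is hypothetical, the column names are taken from the sample report, and capping at mean ± 3 standard deviations is only one of several reasonable outlier policies.

```python
import pandas as pd


def apply_recommendations(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: act on the sample report's three recommendations."""
    out = df.copy()

    # 1. Add "missing" indicator features instead of imputing contact fields.
    for col in ("email", "phone"):
        out[f"{col}_missing"] = out[col].isna().astype("int8")

    # 2. Cap transaction_amount at mean +/- 3 standard deviations.
    amount = out["transaction_amount"]
    lower = amount.mean() - 3 * amount.std()
    upper = amount.mean() + 3 * amount.std()
    out["transaction_amount"] = amount.clip(lower=lower, upper=upper)

    # 3. Convert data types. The nullable "Int64" dtype tolerates missing IDs;
    #    unparseable dates become NaT rather than raising.
    out["user_id"] = out["user_id"].astype("Int64")
    out["date_joined"] = pd.to_datetime(out["date_joined"], errors="coerce")
    return out


# Made-up sample data, for illustration only.
raw = pd.DataFrame({
    "user_id": [float(i) for i in range(1, 21)],
    "date_joined": ["2024-01-15"] * 20,
    "email": ["a@example.com"] * 15 + [None] * 5,
    "phone": [None] * 2 + ["555-0100"] * 18,
    "transaction_amount": [10.0] * 19 + [1000.0],
})
cleaned = apply_recommendations(raw)
print(cleaned.dtypes)
```

The function works on a copy, so the raw frame stays available for comparison before and after cleaning.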

### Phase 2: Feature Creation

**Create features from domain knowledge**:

```python
from specweave import FeatureCreator

creator = FeatureCreator(df, increment="0042")

# Temporal features (from datetime)
creator.add_temporal_features(
    date_column="purchase_date",
    features=["hour", "day_of_week", "month", "is_weekend", "is_holiday"]
)

# Aggregation features (user behavior)
creator.add_aggregation_features(
    group_by="user_id",
    target="purchase_amount",
    aggs=["mean", "st
```