Back to Skills

nlp-pipeline-builder

verified

Natural language processing ML pipelines for text classification, NER, sentiment analysis, text generation, and embeddings. Activates for "nlp", "text classification", "sentiment analysis", "named entity recognition", "BERT", "transformers", "text preprocessing", "tokenization", "word embeddings". Builds NLP pipelines with transformers, integrated with SpecWeave increments.

View on GitHub

Marketplace

specweave

anton-abyzov/specweave

Plugin

sw-ml

development

Repository

anton-abyzov/specweave
27stars

plugins/specweave-ml/skills/nlp-pipeline-builder/SKILL.md

Last Verified

January 25, 2026

Install Skill

Select agents to install to:

Scope:
npx add-skill https://github.com/anton-abyzov/specweave/blob/main/plugins/specweave-ml/skills/nlp-pipeline-builder/SKILL.md -a claude-code --skill nlp-pipeline-builder

Installation paths:

Claude
.claude/skills/nlp-pipeline-builder/
Powered by add-skill CLI

Instructions

# NLP Pipeline Builder

## Overview

Specialized ML pipelines for natural language processing. Handles text preprocessing, tokenization, transformer models (BERT, RoBERTa, GPT), fine-tuning, and deployment for production NLP systems.

## NLP Tasks Supported

### 1. Text Classification

```python
from specweave import NLPPipeline

# Binary or multi-class text classification
pipeline = NLPPipeline(
    task="classification",
    classes=["positive", "negative", "neutral"],
    increment="0042"
)

# Automatically configures:
# - Text preprocessing (lowercase, clean)
# - Tokenization (BERT tokenizer)
# - Model (BERT, RoBERTa, DistilBERT)
# - Fine-tuning on your data
# - Inference pipeline

pipeline.fit(train_texts, train_labels)
```

### 2. Named Entity Recognition (NER)

```python
# Extract entities from text
pipeline = NLPPipeline(
    task="ner",
    entities=["PERSON", "ORG", "LOC", "DATE"],
    increment="0042"
)

# Returns: [(entity_text, entity_type, start_pos, end_pos), ...]
```

### 3. Sentiment Analysis

```python
# Sentiment classification (specialized)
pipeline = NLPPipeline(
    task="sentiment",
    increment="0042"
)

# Fine-tuned for sentiment (positive/negative/neutral)
```

### 4. Text Generation

```python
# Generate text continuations
pipeline = NLPPipeline(
    task="generation",
    model="gpt2",
    increment="0042"
)

# Fine-tune on your domain-specific text
```

## Best Practices for NLP

### Text Preprocessing

```python
from specweave import TextPreprocessor

preprocessor = TextPreprocessor(increment="0042")

# Standard preprocessing
preprocessor.add_steps([
    "lowercase",
    "remove_html",
    "remove_urls",
    "remove_emails",
    "remove_special_chars",
    "remove_extra_whitespace"
])

# Advanced preprocessing
preprocessor.add_advanced([
    "spell_correction",
    "lemmatization",
    "stopword_removal"
])
```

### Model Selection

**Text Classification**:
- Small datasets (<10K): DistilBERT (6x faster than BERT)
- Medium datasets (10

Validation Details

Front Matter
Required Fields
Valid Name Format
Valid Description
Has Sections
Allowed Tools
Instruction Length:
3455 chars