book-sft-pipeline

# Book SFT Pipeline

A complete system for converting books into SFT datasets and training style-transfer models. This skill teaches the pipeline from raw ePub to a model that writes in any author's voice.

## When to Activate

Activate this skill when:
- Building fine-tuning datasets from literary works
- Creating author-voice or style-transfer models
- Preparing training data for Tinker or similar SFT platforms
- Designing text segmentation pipelines for long-form content
- Training small models (8B or less) on limited data

## Core Concepts

### The Three Pillars of Book SFT

**1. Intelligent Segmentation**
Text chunks must be semantically coherent. Breaking mid-sentence teaches the model to produce fragmented output. Target: 150-400 words per chunk, always at natural boundaries.

**2. Diverse Instruction Generation**
Use multiple prompt templates and system prompts to prevent overfitting. A single prompt style leads to memorization. Use 15+ prompt templates with 5+ system prompts.

**3. Style Over Content**
The goal is learning the author's rhythm and vocabulary patterns, not memorizing plots. Synthetic instructions describe what happens without quoting the text.

## Pipeline Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                    ORCHESTRATOR AGENT                           │
│  Coordinates pipeline phases, manages state, handles failures   │
└──────────────────────┬──────────────────────────────────────────┘
                       │
       ┌───────────────┼───────────────┬───────────────┐
       ▼               ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│  EXTRACTION  │ │ SEGMENTATION │ │  INSTRUCTION │ │   DATASET    │
│    AGENT     │ │    AGENT     │ │    AGENT     │ │   BUILDER    │
│ ePub → Text  │ │ Text → Chunks│ │ Chunks →     │ │ Pairs →      │
│              │ │ 150-400 words│ │ Prompts      │ │ JSONL        │
└──────────────┘ └──────────────
Marketplace

Plugin

Repository

Last Verified

Install Skill

Instructions

Validation Details