End-to-end system for creating supervised fine-tuning datasets from books and training style-transfer models. Covers text extraction, intelligent segmentation, synthetic instruction generation, Tinker-compatible output, LoRA training, and validation.
View on GitHubEricGrill/agents-skills-plugins
book-training
January 20, 2026
Select agents to install to:
npx add-skill https://github.com/EricGrill/agents-skills-plugins/blob/main/plugins/book-training/skills/book-sft-pipeline/SKILL.md -a claude-code --skill book-sft-pipelineInstallation paths:
.claude/skills/book-sft-pipeline/# Book SFT Pipeline
A complete system for converting books into SFT datasets and training style-transfer models. This skill teaches the pipeline from raw ePub to a model that writes in any author's voice.
## When to Activate
Activate this skill when:
- Building fine-tuning datasets from literary works
- Creating author-voice or style-transfer models
- Preparing training data for Tinker or similar SFT platforms
- Designing text segmentation pipelines for long-form content
- Training small models (8B or less) on limited data
## Core Concepts
### The Three Pillars of Book SFT
**1. Intelligent Segmentation**
Text chunks must be semantically coherent. Breaking mid-sentence teaches the model to produce fragmented output. Target: 150-400 words per chunk, always at natural boundaries.
**2. Diverse Instruction Generation**
Use multiple prompt templates and system prompts to prevent overfitting. A single prompt style leads to memorization. Use 15+ prompt templates with 5+ system prompts.
**3. Style Over Content**
The goal is learning the author's rhythm and vocabulary patterns, not memorizing plots. Synthetic instructions describe what happens without quoting the text.
## Pipeline Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR AGENT │
│ Coordinates pipeline phases, manages state, handles failures │
└──────────────────────┬──────────────────────────────────────────┘
│
┌───────────────┼───────────────┬───────────────┐
▼ ▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ EXTRACTION │ │ SEGMENTATION │ │ INSTRUCTION │ │ DATASET │
│ AGENT │ │ AGENT │ │ AGENT │ │ BUILDER │
│ ePub → Text │ │ Text → Chunks│ │ Chunks → │ │ Pairs → │
│ │ │ 150-400 words│ │ Prompts │ │ JSONL │
└──────────────┘ └──────────────