Document chunking implementations and benchmarking tools for RAG pipelines, including fixed-size, semantic, recursive, and sentence-based strategies. Use when implementing document processing, optimizing chunk sizes, comparing chunking approaches, benchmarking retrieval performance, or when the user mentions chunking, text splitting, document segmentation, RAG optimization, or chunk evaluation.
February 1, 2026
npx add-skill https://github.com/vanman2024/ai-dev-marketplace/blob/main/plugins/rag-pipeline/skills/chunking-strategies/SKILL.md -a claude-code --skill chunking-strategies
Installation paths:
.claude/skills/chunking-strategies/

# Chunking Strategies

**Purpose:** Provide production-ready document chunking implementations, benchmarking tools, and strategy selection guidance for RAG pipelines.

**Activation Triggers:**
- Implementing document chunking for RAG
- Optimizing chunk size and overlap
- Comparing different chunking strategies
- Benchmarking chunking performance
- Processing different document types (markdown, code, PDFs)
- Evaluating retrieval quality with different chunk strategies

**Key Resources:**
- `scripts/chunk-fixed-size.py` - Fixed-size chunking implementation
- `scripts/chunk-semantic.py` - Semantic chunking with paragraph preservation
- `scripts/chunk-recursive.py` - Recursive chunking for hierarchical documents
- `scripts/benchmark-chunking.py` - Benchmark and compare chunking strategies
- `templates/chunking-config.yaml` - Chunking configuration template
- `templates/custom-splitter.py` - Template for custom chunking logic
- `examples/chunk-markdown.py` - Markdown-specific chunking
- `examples/chunk-code.py` - Source code chunking
- `examples/chunk-pdf.py` - PDF document chunking

## Chunking Strategy Overview

### Strategy Selection Guide

**Fixed-Size Chunking:**
- Best for: Uniform documents, simple content, consistent structure
- Pros: Fast, predictable, simple implementation
- Cons: May split semantic units, no context awareness
- Use when: Speed matters more than semantic coherence

**Semantic Chunking:**
- Best for: Natural language documents, articles, books
- Pros: Preserves semantic boundaries, better context
- Cons: Slower, variable chunk sizes
- Use when: Content has clear paragraph/section structure

**Recursive Chunking:**
- Best for: Hierarchical documents, technical docs, code
- Pros: Preserves structure, handles nested content
- Cons: Most complex, requires structure detection
- Use when: Documents have clear hierarchical organization

**Sentence-Based Chunking:**
- Best for: Q&A pairs, chatbots, precise retrieval
- Pros: Natural boundaries, good for