Optimizing vector embeddings for RAG systems through model selection, chunking strategies, caching, and performance tuning. Use when building semantic search, RAG pipelines, or document retrieval systems that require cost-effective, high-quality embeddings.
View on GitHub: ancoleman/ai-design-components
backend-ai-skills
February 1, 2026
npx add-skill https://github.com/ancoleman/ai-design-components/blob/main/skills/embedding-optimization/SKILL.md -a claude-code --skill embedding-optimization

Installation paths:
.claude/skills/embedding-optimization/

# Embedding Optimization

Optimize embedding generation for cost, performance, and quality in RAG and semantic search systems.

## When to Use This Skill

Trigger this skill when:

- Building RAG (Retrieval Augmented Generation) systems
- Implementing semantic search or similarity detection
- Optimizing embedding API costs (reducing by 70-90%)
- Improving document retrieval quality through better chunking
- Processing large document corpora (thousands to millions of documents)
- Selecting between API-based vs. local embedding models

## Model Selection Framework

Choose the optimal embedding model based on requirements:

**Quick Recommendations:**

- **Startup/MVP:** `all-MiniLM-L6-v2` (local, 384 dims, zero API costs)
- **Production:** `text-embedding-3-small` (API, 1,536 dims, balanced quality/cost)
- **High Quality:** `text-embedding-3-large` (API, 3,072 dims, premium)
- **Multilingual:** `multilingual-e5-base` (local, 768 dims) or Cohere `embed-multilingual-v3.0`

For detailed decision frameworks including cost comparisons, quality benchmarks, and data privacy considerations, see `references/model-selection-guide.md`.
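
The quick recommendations above can be captured in a small lookup, which also makes the storage impact of the dimension count visible. This is a minimal sketch, not part of the skill itself: the use-case keys, `EmbeddingModel`, `recommend_model`, and `raw_index_bytes` are hypothetical names, and the size estimate assumes vectors are stored as uncompressed float32 values (4 bytes each) with no index overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingModel:
    name: str
    hosting: str  # "local" or "api"
    dimensions: int


# Quick recommendations as listed in this skill. The dictionary keys are
# hypothetical labels chosen for this sketch. For "multilingual" the skill
# also names Cohere embed-multilingual-v3.0; the local option is used here.
RECOMMENDATIONS = {
    "mvp": EmbeddingModel("all-MiniLM-L6-v2", "local", 384),
    "production": EmbeddingModel("text-embedding-3-small", "api", 1536),
    "high_quality": EmbeddingModel("text-embedding-3-large", "api", 3072),
    "multilingual": EmbeddingModel("multilingual-e5-base", "local", 768),
}


def recommend_model(use_case: str) -> EmbeddingModel:
    """Return the recommended model for a use-case label, or raise ValueError."""
    try:
        return RECOMMENDATIONS[use_case]
    except KeyError:
        valid = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {valid}")


def raw_index_bytes(model: EmbeddingModel, num_vectors: int, bytes_per_value: int = 4) -> int:
    """Estimate raw vector storage, assuming float32 values and no index overhead."""
    return num_vectors * model.dimensions * bytes_per_value


if __name__ == "__main__":
    for label in RECOMMENDATIONS:
        model = recommend_model(label)
        gigabytes = raw_index_bytes(model, 1_000_000) / 1e9
        print(f"{label}: {model.name} ({model.hosting}), ~{gigabytes:.2f} GB per 1M vectors")
```

Under the float32 assumption, one million `text-embedding-3-small` vectors take about 6.1 GB of raw storage versus about 1.5 GB for `all-MiniLM-L6-v2`, so dimension count affects storage and search cost as well as API spend.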

**Model Comparison Summary:**

| Model | Type | Dimensions | Cost per 1M tokens | Best For |
|-------|------|-----------|-------------------|----------|
| all-MiniLM-L6-v2 | Local | 384 | $0 (compute only) | High volume, tight budgets |
| BGE-base-en-v1.5 | Local | 768 | $0 (compute only) | Quality + cost balance |
| text-embedding-3-small | API | 1,536 | $0.02 | General purpose production |
| text-embedding-3-large | API | 3,072 | $0.13 | Premium quality requirements |
| embed-multilingual-v3.0 | API | 1,024 | $0.10 | 100+ language support |

## Chunking Strategies

Select chunking strategy based on content type and use case:

**Content Type → Strategy Mapping:**

- **Documentation:** Recursive (heading-aware), 800 chars, 100 overlap
- **Code:** Recursive (function-level), 1,000 chars, 100 overlap
- **Q&A/FAQ:** Fixed-size, 500 chars, 50 overlap (pr