embedding-strategies

verified

Select and optimize embedding models for semantic search and RAG applications. Use when choosing embedding models, implementing chunking strategies, or optimizing embedding quality for specific domains.

Marketplace

claude-code-ccf-marketplace

ccf/claude-code-ccf-marketplace

Plugin

llm-application-dev

ai-ml

Repository

ccf/claude-code-ccf-marketplace

plugins/llm-application-dev/skills/embedding-strategies/SKILL.md

Last Verified

January 20, 2026

Install Skill

npx add-skill https://github.com/ccf/claude-code-ccf-marketplace/blob/main/plugins/llm-application-dev/skills/embedding-strategies/SKILL.md -a claude-code --skill embedding-strategies

Installation paths:

Claude
.claude/skills/embedding-strategies/

Instructions

# Embedding Strategies

Guide to selecting and optimizing embedding models for vector search applications.

## When to Use This Skill

- Choosing embedding models for RAG
- Optimizing chunking strategies
- Fine-tuning embeddings for domains
- Comparing embedding model performance
- Reducing embedding dimensions
- Handling multilingual content

## Core Concepts

### 1. Embedding Model Comparison

| Model                      | Dimensions | Max Tokens | Best For          |
| -------------------------- | ---------- | ---------- | ----------------- |
| **text-embedding-3-large** | 3072       | 8191       | High accuracy     |
| **text-embedding-3-small** | 1536       | 8191       | Cost-effective    |
| **voyage-2**               | 1024       | 4000       | General-purpose retrieval (code and legal have dedicated variants: voyage-code-2, voyage-law-2) |
| **bge-large-en-v1.5**      | 1024       | 512        | Open source       |
| **all-MiniLM-L6-v2**       | 384        | 256        | Fast, lightweight |
| **multilingual-e5-large**  | 1024       | 512        | Multi-language    |
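
The Dimensions column also drives storage cost. As a rough sketch, assuming vectors are stored as 4-byte float32 values and ignoring any index overhead, the raw footprint can be estimated as follows. The helper name `index_size_bytes` and the one-million-vector corpus are illustrative assumptions, not part of any library.

```python
# Dimensions copied from the comparison table above.
MODEL_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "voyage-2": 1024,
    "bge-large-en-v1.5": 1024,
    "all-MiniLM-L6-v2": 384,
    "multilingual-e5-large": 1024,
}


def index_size_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of a flat vector store (hypothetical helper).

    Assumes float32 storage (4 bytes per value) and no index overhead.
    """
    return num_vectors * dimensions * bytes_per_value


if __name__ == "__main__":
    num_vectors = 1_000_000  # assumed corpus size, for illustration only
    for name, dims in MODEL_DIMENSIONS.items():
        size_gib = index_size_bytes(num_vectors, dims) / 2**30
        print(f"{name}: {size_gib:.2f} GiB per million vectors")
```

Because the footprint scales linearly with dimensions, halving the vector length halves raw storage, which is one reason to consider reduced dimensions when accuracy allows.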

### 2. Embedding Pipeline

```
Document → Chunking → Preprocessing → Embedding Model → Vector
              ↓             ↓                ↓
    [Overlap, Size] [Clean, Normalize]  [API/Local]
```
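
The chunking stage, with its size and overlap parameters, could be sketched as below. This is a minimal illustration that splits on whitespace and counts words, not model tokens, so chunk sizes should leave headroom under a model's Max Tokens limit. The function name `chunk_words` and its default values are assumptions made for this example.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> List[str]:
    """Split text into word-based chunks that share `overlap` words.

    Hypothetical helper: sizes are in whitespace-separated words,
    which only approximates token counts.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []

    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text,
        # so no trailing chunk made only of overlap is emitted.
        if start + chunk_size >= len(words):
            break

    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1):
        print(chunk)
```

Each chunk repeats the last `overlap` words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.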

## Templates

### Template 1: OpenAI Embeddings

```python
from openai import OpenAI
from typing import List, Optional
import numpy as np

client = OpenAI()  # Reads the API key from the OPENAI_API_KEY environment variable

def get_embeddings(
    texts: List[str],
    model: str = "text-embedding-3-small",
    dimensions: Optional[int] = None
) -> List[List[float]]:
    """Get embeddings from OpenAI.

    `dimensions` shortens the returned vectors; the API accepts it only
    for the text-embedding-3 models.
    """
    # Send the texts in batches to keep each request small
    batch_size = 100
    all_embeddings = []

    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]

        # Only pass `dimensions` when the caller asked for a specific size
        kwargs = {"input": batch, "model": model}
        if dimensions is not None:
            kwargs["dimensions"] = dimensions

        response = client.embeddings.create(**kwargs)
        embeddings = [item.embedding for item in response.data]
        all_embeddings.extend(embeddings)

    return all_embeddings


# def get  (the rest of this template is cut off in the source)
```
