# PGVector Hybrid Search

**Production-grade semantic + keyword search using PostgreSQL**

## Overview

**Architecture:**
```
Query
  |
[Generate embedding] --> Vector Search (PGVector) --> Top 30 results
  |
[Generate ts_query]  --> Keyword Search (BM25)    --> Top 30 results
  |
[Reciprocal Rank Fusion (RRF)] --> Merge & re-rank --> Top 10 final results
```
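
The two retrieval branches in the diagram can be sketched as parameterized SQL held in Python strings. This is a minimal sketch, not the skill's actual implementation. It assumes the `chunks` table from the Database Schema section, a psycopg-style `%(name)s` parameter style, and PostgreSQL's built-in `ts_rank_cd` as the keyword score (which is a cover-density ranking, not a true BM25 implementation). The names `CANDIDATES_PER_BRANCH`, `VECTOR_SQL`, `KEYWORD_SQL`, `to_vector_literal`, and `build_params` are hypothetical.

```python
from typing import Dict, Sequence, Union

# Hypothetical constant: candidates fetched per branch ("Top 30" in the diagram).
CANDIDATES_PER_BRANCH = 30

# Vector branch: `<=>` is pgvector's cosine-distance operator (smaller = closer).
VECTOR_SQL = """
SELECT id, content, embedding <=> %(embedding)s::vector AS distance
FROM chunks
ORDER BY embedding <=> %(embedding)s::vector
LIMIT %(limit)s
"""

# Keyword branch: match against the generated tsvector column and rank with
# ts_rank_cd (assumption: used here as a stand-in for the BM25 score).
KEYWORD_SQL = """
SELECT id, content, ts_rank_cd(content_tsvector, q) AS score
FROM chunks, plainto_tsquery('english', %(query)s) AS q
WHERE content_tsvector @@ q
ORDER BY score DESC
LIMIT %(limit)s
"""


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_params(
    query: str,
    embedding: Sequence[float],
    limit: int = CANDIDATES_PER_BRANCH,
) -> Dict[str, Union[str, int]]:
    """Build one parameter dict that can be passed to both queries."""
    return {
        "query": query,
        "embedding": to_vector_literal(embedding),
        "limit": limit,
    }
```

Both queries return at most `limit` rows in ranked order, so the position of each row in its result list is the rank that the fusion step consumes.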

**When to use this skill:**
- Building semantic search (RAG, knowledge bases, recommendations)
- Implementing hybrid retrieval (vector + keyword)
- Optimizing PGVector performance
- Working with large document collections (1M+ chunks)

---

## Quick Reference

### Search Type Comparison

| Aspect | Semantic (Vector) | Keyword (BM25) |
|--------|-------------------|----------------|
| **Matches on** | Embedding similarity | Word matches (stemmed lexemes) |
| **Strengths** | Synonyms, concepts | Exact phrases, rare terms |
| **Weaknesses** | Exact matches, technical terms | No semantic understanding |
| **Index** | HNSW (pgvector) | GIN (tsvector) |

### Index Comparison

| Metric | IVFFlat | HNSW |
|--------|---------|------|
| **Query speed** | 50 ms | 3 ms (~17x faster) |
| **Index build time** | 2 min | 20 min |
| **Best for** | < 100k vectors | 100k+ vectors |
| **Recall@10** | 0.85-0.95 | 0.95-0.99 |

**Recommendation:** Use HNSW for production. It scales to millions of vectors, at the cost of a slower index build.
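
The Recall@10 row compares what an approximate index returns against the true nearest neighbours. A minimal sketch of that metric, assuming you already have two ranked ID lists for the same query: one from the indexed search and one from an exact (non-indexed) search. The function name `recall_at_k` is hypothetical.

```python
from typing import Hashable, Sequence


def recall_at_k(
    approx_ids: Sequence[Hashable],
    exact_ids: Sequence[Hashable],
    k: int = 10,
) -> float:
    """Fraction of the exact top-k results that the approximate top-k also contains."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    hits = len(truth & set(approx_ids[:k]))
    return hits / len(truth)
```

Averaging this value over a sample of representative queries gives a figure comparable to the ranges in the table.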

### RRF Formula

```python
k = 60  # standard smoothing constant
rrf_score = 1 / (k + vector_rank) + 1 / (k + keyword_rank)  # ranks are 1-based positions in each result list
```
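
Applied to whole result lists, the formula can be sketched as follows. This is a minimal illustration, assuming each input is a list of chunk IDs ordered best-first with no duplicates inside a list. A chunk that appears in only one list contributes only that list's term. The function name `reciprocal_rank_fusion` and the `top_n` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Iterable[Sequence[Hashable]],
    k: int = 60,
    top_n: int = 10,
) -> List[Tuple[Hashable, float]]:
    """Merge ranked ID lists by summing 1 / (k + rank) over every list."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best result in a list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    fused = sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))
    return fused[:top_n]


vector_hits = ["a", "b", "c"]   # IDs from the vector branch, best first
keyword_hits = ["c", "a", "d"]  # IDs from the keyword branch, best first
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because only ranks enter the score, the cosine distances and keyword scores never need to be normalized against each other.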

---

## Database Schema

```sql
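CREATE EXTENSION IF NOT EXISTS vector;  -- provides the vector type and HNSW index

CREATE TABLE chunks (
    id UUID PRIMARY KEY,
    document_id UUID REFERENCES documents(id),  -- assumes a documents table already exists
    content TEXT NOT NULL,
    embedding vector(1024),  -- pgvector column; dimension must match the embedding model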
    content_tsvector tsvector GENERATED ALWAYS AS (
        to_tsvector('english', content)
    ) STORED,
    section_title TEXT,
    content_type TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Indexes: HNSW with cosine distance (m = 16 and ef_construction = 64 are pgvector's defaults)
CREATE INDEX idx_chunks_embedding ON chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

CREATE INDEX idx_chunks_content_tsvecto
```