Production hybrid search combining PGVector HNSW with BM25 using Reciprocal Rank Fusion. Use when implementing hybrid search, semantic + keyword retrieval, vector search optimization, metadata filtering, or choosing between HNSW and IVFFlat indexes.
View on GitHub: yonatangross/orchestkit
orchestkit-complete
January 24, 2026
# PGVector Hybrid Search
**Production-grade semantic + keyword search using PostgreSQL**
## Overview
**Architecture:**
```
Query
|
[Generate embedding] --> Vector Search (PGVector) --> Top 30 results
|
[Generate ts_query] --> Keyword Search (BM25) --> Top 30 results
|
[Reciprocal Rank Fusion (RRF)] --> Merge & re-rank --> Top 10 final results
```
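The two retrieval branches in the diagram can be sketched as a pair of parameterized queries. This is a minimal sketch, assuming a `chunks` table with an `embedding` vector column and a generated `content_tsvector` column. The helper names (`to_vector_literal`, `vector_query`, `keyword_query`) are hypothetical, and the `%s` placeholders follow the style psycopg accepts. PostgreSQL's built-in `ts_rank_cd` is a full-text ranking function rather than a true BM25 implementation, so it only stands in for the keyword branch here.

```python
from typing import Sequence, Tuple

CANDIDATES_PER_BRANCH = 30  # each branch returns its top 30, as in the diagram


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding in pgvector's text form, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def vector_query(
    embedding: Sequence[float], limit: int = CANDIDATES_PER_BRANCH
) -> Tuple[str, tuple]:
    """Build the semantic branch: order by cosine distance (<=>), nearest first."""
    sql = (
        "SELECT id, embedding <=> %s::vector AS distance "
        "FROM chunks "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    literal = to_vector_literal(embedding)
    return sql, (literal, literal, limit)


def keyword_query(text: str, limit: int = CANDIDATES_PER_BRANCH) -> Tuple[str, tuple]:
    """Build the keyword branch: full-text match on the tsvector column."""
    sql = (
        "SELECT id, ts_rank_cd(content_tsvector, q) AS score "
        "FROM chunks, plainto_tsquery('english', %s) AS q "
        "WHERE content_tsvector @@ q "
        "ORDER BY score DESC "
        "LIMIT %s"
    )
    return sql, (text, limit)
```

Each function returns a `(sql, params)` pair that can be passed to a database cursor's `execute` call. The row order of each result set is the rank that the fusion step consumes.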
**When to use this skill:**
- Building semantic search (RAG, knowledge bases, recommendations)
- Implementing hybrid retrieval (vector + keyword)
- Optimizing PGVector performance
- Working with large document collections (1M+ chunks)
---
## Quick Reference
### Search Type Comparison
| Aspect | Semantic (Vector) | Keyword (BM25) |
|--------|-------------------|----------------|
| **Query** | Embedding similarity | Exact word matches |
| **Strengths** | Synonyms, concepts | Exact phrases, rare terms |
| **Weaknesses** | Exact matches, technical terms | No semantic understanding |
| **Index** | HNSW (pgvector) | GIN (tsvector) |
### Index Comparison
| Metric | IVFFlat | HNSW |
|--------|---------|------|
| **Query speed** | 50ms | 3ms (17x faster) |
| **Index time** | 2 min | 20 min |
| **Best for** | < 100k vectors | 100k+ vectors |
| **Recall@10** | 0.85-0.95 | 0.95-0.99 |
**Recommendation:** Use HNSW for production: it takes longer to build but gives faster queries and higher recall, and it scales to millions of vectors.
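To make the choice concrete, the sketch below builds the `CREATE INDEX` statement for either index type. It assumes a `chunks` table with an `embedding` column; `index_ddl`, `ivfflat_lists`, and the index names are hypothetical. The IVFFlat `lists` heuristic (rows / 1000 up to 1M rows, the square root of the row count beyond that) follows the starting-point guidance in pgvector's README and is not a tuned value.

```python
import math


def ivfflat_lists(row_count: int) -> int:
    """Starting-point heuristic for the IVFFlat `lists` parameter."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return math.isqrt(row_count)


def index_ddl(kind: str, row_count: int = 0) -> str:
    """Return a CREATE INDEX statement for cosine-distance search."""
    if kind == "hnsw":
        # m and ef_construction shown here are pgvector's defaults.
        return (
            "CREATE INDEX idx_chunks_embedding ON chunks "
            "USING hnsw (embedding vector_cosine_ops) "
            "WITH (m = 16, ef_construction = 64);"
        )
    if kind == "ivfflat":
        # IVFFlat should be built after the table has data,
        # because the lists are derived from existing rows.
        return (
            "CREATE INDEX idx_chunks_embedding_ivf ON chunks "
            "USING ivfflat (embedding vector_cosine_ops) "
            f"WITH (lists = {ivfflat_lists(row_count)});"
        )
    raise ValueError(f"unknown index kind: {kind!r}")
```

At query time, recall can be traded against speed with `SET hnsw.ef_search = ...` for HNSW or `SET ivfflat.probes = ...` for IVFFlat. Higher values improve recall at the cost of latency.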
### RRF Formula
```python
# k = 60 is the commonly used default; ranks are 1-based positions in each result list
rrf_score = 1 / (k + vector_rank) + 1 / (k + keyword_rank)
```
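The formula above can be applied to whole result lists as follows. This is a minimal sketch, assuming each input is a list of document IDs ordered best first with no duplicates; `rrf_fuse` is a hypothetical helper name. A document that appears in only one list contributes a single term to its score.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def rrf_fuse(
    rankings: Sequence[Sequence[Hashable]], k: int = 60, top_n: int = 10
) -> List[Tuple[Hashable, float]]:
    """Merge ranked ID lists with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the best hit in each list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for stable output.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))
    return ordered[:top_n]


vector_hits = ["a", "b", "c"]   # IDs from the vector search, best first
keyword_hits = ["b", "d", "a"]  # IDs from the keyword search, best first
fused = rrf_fuse([vector_hits, keyword_hits])
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'd', 'c']
```

Because only ranks enter the formula, the cosine distances and text-rank scores never need to be normalized against each other.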
---
## Database Schema
```sql
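-- The vector type requires the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE chunks (
    id UUID PRIMARY KEY,
    document_id UUID REFERENCES documents(id),
    content TEXT NOT NULL,
    embedding vector(1024),  -- PGVector; dimension must match the embedding model
    -- Kept in sync with content automatically by PostgreSQL
    content_tsvector tsvector GENERATED ALWAYS AS (
        to_tsvector('english', content)
    ) STORED,
    section_title TEXT,
    content_type TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Indexes
-- HNSW index for cosine distance (the <=> operator)
CREATE INDEX idx_chunks_embedding ON chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);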
CREATE INDEX idx_chunks_content_tsvecto