View on GitHub: ialameh/sift-coder
siftcoder
skills/semantic-codebase-search/SKILL.md
January 24, 2026
npx add-skill https://github.com/ialameh/sift-coder/blob/main/skills/semantic-codebase-search/SKILL.md -a claude-code --skill semantic-codebase-search
Installation paths:
.claude/skills/semantic-codebase-search/
# Semantic Codebase Search Skill
**Vector-based code discovery using LanceDB and Ollama embeddings.**
## Purpose
This skill provides:
- Vector-based semantic code search
- Natural language query understanding
- Context-aware result presentation
- Index management and updates
## Core Functions
### 1. Index Codebase
```bash
index_codebase() {
  local path="${1:-.}"
  local index_dir=".claude/siftcoder-state/vector-index"
  local batch_size=50
  local -a files=() batch=()
  local file total

  echo "Indexing codebase at: $path"

  # Find all code files, skipping dependency and build directories
  while IFS= read -r file; do
    files+=("$file")
  done < <(find "$path" -type f \
    \( -name "*.ts" -o -name "*.tsx" -o -name "*.js" -o -name "*.jsx" \
       -o -name "*.py" -o -name "*.go" -o -name "*.rs" -o -name "*.java" \) \
    ! -path "*/node_modules/*" ! -path "*/.next/*" ! -path "*/dist/*")

  total=${#files[@]}
  echo "Found $total files to index"

  # Create index directory
  mkdir -p "$index_dir"

  # Process files in batches. The loop runs in the current shell (not behind
  # a pipe), so the batch array is still populated when the loop ends.
  for file in "${files[@]}"; do
    batch+=("$file")
    if [ "${#batch[@]}" -eq "$batch_size" ]; then
      index_batch "${batch[@]}"
      batch=()
    fi
  done

  # Process remaining files
  if [ "${#batch[@]}" -gt 0 ]; then
    index_batch "${batch[@]}"
  fi

  # Save metadata
  cat > "$index_dir/metadata.json" <<EOF
{
  "created_at": "$(date -u +"%Y-%m-%dT%H:%M:%SZ")",
  "files_indexed": $total,
  "path": "$path",
  "embedding_model": "nomic-embed-text"
}
EOF

  echo "✅ Index complete"
}
```
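The batching step in `index_codebase` is sensitive to where the `while read` loop runs: when a loop is fed through a pipe, bash runs it in a subshell and any array changes are lost once the loop exits, so a trailing partial batch can be dropped silently. The following is a minimal sketch of a batching pattern that keeps the loop and the final flush in the same shell. `batch_lines` and `print_batch` are hypothetical names used only for illustration (`print_batch` stands in for `index_batch`); neither is part of the skill.

```bash
# Hypothetical stand-in for index_batch: prints the batch size and its items.
print_batch() {
  echo "batch($#): $*"
}

# Hypothetical helper: reads lines from stdin and passes them to a callback
# in groups of at most <size> items.
batch_lines() {
  local size="$1" callback="$2"
  local -a batch=()
  local line

  # The loop and the flush below live in the same function, so they always
  # share one shell and see the same batch array.
  while IFS= read -r line; do
    batch+=("$line")
    if [ "${#batch[@]}" -eq "$size" ]; then
      "$callback" "${batch[@]}"
      batch=()
    fi
  done

  # Flush the final, partially filled batch.
  if [ "${#batch[@]}" -gt 0 ]; then
    "$callback" "${batch[@]}"
  fi
}

# Usage: five file names in batches of two.
printf '%s\n' a.ts b.ts c.py d.go e.rs | batch_lines 2 print_batch
```

Because the whole function sits on the receiving end of the pipe, the subshell boundary wraps both the loop and the flush, which is one way to avoid the lost-batch problem without process substitution.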
### 2. Search Vector Index
```bash
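search_vectors() {
  local query="$1"
  local limit="${2:-10}"
  local query_emb results

  # Generate query embedding via the Ollama REST API
  # (assumes Ollama is serving on its default local port, 11434)
  query_emb=$(jq -cn --arg q "$query" '{model: "nomic-embed-text", prompt: $q}' \
    | curl -s http://localhost:11434/api/embeddings -d @- \
    | jq -c '.embedding')

  # Search LanceDB
  results=$(python3 <<EOF
import lancedb
import json

db = lancedb.connect(".claude/siftcoder-state/vector-index")
table = db.open_table("codebase")
results = table.search($query_emb).limit($limit).to_pandas()

for _, row in results.iterrows():
    print(f"{row['file']}:{row['line']}")
    # Vector search returns a _distance column (lower means closer)
    print(f" Distance: {row['_distance']:.2f}")
    print(f" Code: {row['code'][:1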