golden-dataset-management

# Golden Dataset Management

**Protect and maintain high-quality test datasets for AI/ML systems**

## Overview

A **golden dataset** is a curated collection of high-quality examples used for:
- **Regression testing:** Ensure new code doesn't break existing functionality
- **Retrieval evaluation:** Measure search quality (precision, recall, MRR)
- **Model benchmarking:** Compare different models/approaches
- **Reproducibility:** Consistent results across environments
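
The retrieval metrics named above can be computed from a ranked result list and a set of expected IDs. The following is a minimal sketch, not OrchestKit's actual evaluation code; the function names and the use of string chunk IDs are assumptions made for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, with results `["a", "b", "c", "d"]` and expected IDs `{"b", "d"}`, precision and recall at `k=2` are both 0.5, and the reciprocal rank is 0.5 because the first relevant hit is at rank 2.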

**When to use this skill:**
- Building test datasets for RAG systems
- Implementing backup/restore for critical data
- Validating data integrity (URL contracts, embeddings)
- Migrating data between environments

---

## OrchestKit's Golden Dataset

**Stats (Production):**
- **98 analyses** (completed content analyses)
- **415 chunks** (embedded text segments)
- **203 test queries** (with expected results)
- **91.6% pass rate** (retrieval quality metric)

**Purpose:**
- Test hybrid search (vector + BM25 + RRF)
- Validate metadata boosting strategies
- Detect regressions in retrieval quality
- Benchmark new embedding models
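
Reciprocal Rank Fusion (RRF), the step that merges the vector and BM25 rankings, can be sketched as below. This illustrates the general technique and is not taken from OrchestKit's code: the function name is hypothetical, and `k=60` is the commonly used default constant, assumed here because the source does not state a value.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists into one list ordered by RRF score.

    Each list contributes 1 / (k + rank) for every ID it contains,
    with rank starting at 1. IDs ranked highly by several lists rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["c1", "c2", "c3"]  # hypothetical vector-search ranking
bm25_hits = ["c2", "c4", "c1"]    # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
# fused == ["c2", "c1", "c4", "c3"]
```

Chunks that appear in both rankings (`c2`, `c1`) outrank chunks found by only one retriever, which is the behavior the golden queries are meant to protect against regressions.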

---

## Core Concepts

### Data Integrity Contracts

**The URL Contract:**
Golden dataset analyses MUST store **real canonical URLs**, not placeholders.

```python
# WRONG - Placeholder URL (breaks restore)
analysis.url = "https://orchestkit.dev/placeholder/123"

# CORRECT - Real canonical URL (enables re-fetch if needed)
analysis.url = "https://docs.python.org/3/library/asyncio.html"
```

**Why this matters:**
- Enables re-fetching content if embeddings need regeneration
- Allows validation that source content hasn't changed
- Provides audit trail for data provenance
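
A check for the URL contract might look like the following sketch. The placeholder marker is inferred from the example above; the function name and the exact rules are assumptions, so adapt them to whatever placeholder patterns the dataset has actually used.

```python
from urllib.parse import urlparse

# Assumption: placeholder URLs contain this path segment, as in the example above.
PLACEHOLDER_MARKERS = ("/placeholder/",)


def violates_url_contract(url: str) -> bool:
    """Return True if the URL is missing, malformed, or looks like a placeholder."""
    parsed = urlparse(url)
    # A canonical URL needs an http(s) scheme and a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return True
    return any(marker in parsed.path for marker in PLACEHOLDER_MARKERS)
```

Running this over every analysis before a backup catches placeholder URLs while the real source is still known, rather than at restore time.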

---

## Backup Strategy Comparison

| Strategy | Version Control | Restore Speed | Portability | Inspection |
|----------|-----------------|---------------|-------------|------------|
| **JSON** (recommended) | Yes | Slower (embeddings regenerated on restore) | High | Easy |
| **SQL Dump** | No (binary) | Fast | DB-version dependent |
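
To illustrate why the JSON strategy is version-control friendly but slower to restore, here is a minimal sketch of a backup that omits embedding vectors. The record layout, the `embedding` field name, and both function names are hypothetical and are not OrchestKit's actual schema.

```python
import json
from typing import Any, Dict, List


def dump_golden_dataset(analyses: List[Dict[str, Any]], path: str) -> None:
    """Write analyses to a diff-friendly JSON file, leaving out embedding vectors."""
    payload = {
        "version": 1,
        # Embeddings are dropped; they must be regenerated when restoring.
        "analyses": [
            {key: value for key, value in record.items() if key != "embedding"}
            for record in analyses
        ],
    }
    with open(path, "w", encoding="utf-8") as fh:
        # Sorted keys and fixed indentation keep diffs small between backups.
        json.dump(payload, fh, indent=2, sort_keys=True, ensure_ascii=False)
        fh.write("\n")


def load_golden_dataset(path: str) -> List[Dict[str, Any]]:
    """Read analyses back; the caller is responsible for re-embedding them."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["analyses"]
```

Because the file holds only text and metadata, it can be reviewed and diffed like source code. The cost is the re-embedding step on restore.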