golden-dataset-management

verified

Use when backing up, restoring, or validating golden datasets. Prevents data loss and ensures test data integrity for AI/ML evaluation systems.

Marketplace

orchestkit

yonatangross/skillforge-claude-plugin

Plugin

ork

development

Repository

yonatangross/skillforge-claude-plugin
33 stars

plugins/ork/skills/golden-dataset-management/SKILL.md

Last Verified

January 25, 2026

Install Skill

npx add-skill https://github.com/yonatangross/skillforge-claude-plugin/blob/main/plugins/ork/skills/golden-dataset-management/SKILL.md -a claude-code --skill golden-dataset-management

Installation paths:

Claude
.claude/skills/golden-dataset-management/

Instructions

# Golden Dataset Management

**Protect and maintain high-quality test datasets for AI/ML systems**

## Overview

A **golden dataset** is a curated collection of high-quality examples used for:
- **Regression testing:** Ensure new code doesn't break existing functionality
- **Retrieval evaluation:** Measure search quality (precision, recall, MRR)
- **Model benchmarking:** Compare different models/approaches
- **Reproducibility:** Consistent results across environments
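
The retrieval metrics named above (precision, recall, MRR) can be sketched as follows. This is a minimal illustration, not OrchestKit's evaluation code: the function names and the convention of passing ranked document IDs plus a set of relevant IDs are assumptions made for the example.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved IDs that are relevant.

    Divides by the number of results actually returned (at most k);
    some definitions divide by k even when fewer results come back.
    """
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc_id in top_k if doc_id in relevant) / len(top_k)


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is found."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```

Each test query in a golden dataset would contribute one `(retrieved, relevant)` pair, so a drop in any of these numbers between runs points to a retrieval regression.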

**When to use this skill:**
- Building test datasets for RAG systems
- Implementing backup/restore for critical data
- Validating data integrity (URL contracts, embeddings)
- Migrating data between environments

---

## OrchestKit's Golden Dataset

**Stats (Production):**
- **98 analyses** (completed content analyses)
- **415 chunks** (embedded text segments)
- **203 test queries** (with expected results)
- **91.6% pass rate** (retrieval quality metric)

**Purpose:**
- Test hybrid search (vector + BM25 + RRF)
- Validate metadata boosting strategies
- Detect regressions in retrieval quality
- Benchmark new embedding models
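
For readers unfamiliar with the RRF step in hybrid search, the sketch below shows one common formulation of Reciprocal Rank Fusion, which merges a vector ranking and a BM25 ranking by summing `1 / (k + rank)` per document. The function name and the constant `k = 60` (a widely used default) are assumptions; the source does not state which value or implementation OrchestKit uses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is an assumed default, not a value
    taken from the source.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from the two retrievers.
vector_hits = ["a", "b", "c"]
bm25_hits = ["b", "c", "d"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

Documents that appear in both lists accumulate two contributions, so they tend to outrank documents that only one retriever found.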

---

## Core Concepts

### Data Integrity Contracts

**The URL Contract:**
Golden dataset analyses MUST store **real canonical URLs**, not placeholders.

```python
# WRONG - Placeholder URL (breaks restore)
analysis.url = "https://orchestkit.dev/placeholder/123"

# CORRECT - Real canonical URL (enables re-fetch if needed)
analysis.url = "https://docs.python.org/3/library/asyncio.html"
```

**Why this matters:**
- Enables re-fetching content if embeddings need regeneration
- Allows validation that source content hasn't changed
- Provides audit trail for data provenance
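
A check for the URL contract might look like the sketch below. The placeholder markers and host list are assumptions chosen to match the example above; adjust them to whatever placeholder patterns exist in your own data.

```python
from typing import Iterable, List
from urllib.parse import urlparse

# Assumption: placeholders are recognizable by these path markers and hosts.
PLACEHOLDER_PATH_MARKERS = ("/placeholder/",)
PLACEHOLDER_HOSTS = {"example.com", "localhost"}


def is_canonical_url(url: str) -> bool:
    """Return True if the URL looks like a real, re-fetchable source URL."""
    parsed = urlparse(url)
    # Must be an absolute http(s) URL with a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    if parsed.hostname in PLACEHOLDER_HOSTS:
        return False
    return not any(marker in parsed.path for marker in PLACEHOLDER_PATH_MARKERS)


def find_url_contract_violations(urls: Iterable[str]) -> List[str]:
    """Return every URL that fails the contract, preserving input order."""
    return [url for url in urls if not is_canonical_url(url)]
```

Running `find_url_contract_violations` over all analysis URLs before a backup would surface placeholders while the real URLs can still be recovered.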

---

## Backup Strategy Comparison

| Strategy | Version Control | Restore Speed | Portability | Inspection |
|----------|-----------------|---------------|-------------|------------|
| **JSON** (recommended) | Yes | Slower (embeddings must be regenerated) | High | Easy |
| **SQL Dump** | No (binary) | Fast | DB-version dependent | 
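
To illustrate why the JSON strategy is version-control friendly but slower to restore, here is a minimal sketch of an export that omits embedding vectors and writes stable, sorted output. The field names (`embedding`, `analyses`, `version`) and both function names are hypothetical and do not come from OrchestKit's actual backup format.

```python
import json
from typing import Any, Dict, List


def export_backup(analyses: List[Dict[str, Any]]) -> str:
    """Serialize analyses to diff-friendly JSON without embedding vectors."""
    stripped = [
        {key: value for key, value in analysis.items() if key != "embedding"}
        for analysis in analyses
    ]
    # sort_keys and indent keep the output stable across runs, so diffs stay small.
    return json.dumps(
        {"version": 1, "analyses": stripped},
        indent=2,
        sort_keys=True,
        ensure_ascii=False,
    )


def load_backup(payload: str) -> List[Dict[str, Any]]:
    """Parse a backup; embeddings must be regenerated after loading."""
    return json.loads(payload)["analyses"]


# Hypothetical record for illustration.
records = [{
    "id": 1,
    "url": "https://docs.python.org/3/library/asyncio.html",
    "embedding": [0.1, 0.2, 0.3],
}]
restored = load_backup(export_backup(records))
```

Leaving the vectors out keeps the file small and readable in a diff, at the cost of recomputing embeddings on restore.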
