Use when backing up, restoring, or validating golden datasets. Prevents data loss and ensures test data integrity for AI/ML evaluation systems.
View on GitHub: yonatangross/orchestkit
January 25, 2026
npx add-skill https://github.com/yonatangross/orchestkit/blob/main/skills/golden-dataset-management/SKILL.md -a claude-code --skill golden-dataset-management

Installation paths:
.claude/skills/golden-dataset-management/

# Golden Dataset Management

**Protect and maintain high-quality test datasets for AI/ML systems**

## Overview

A **golden dataset** is a curated collection of high-quality examples used for:

- **Regression testing:** Ensure new code doesn't break existing functionality
- **Retrieval evaluation:** Measure search quality (precision, recall, MRR)
- **Model benchmarking:** Compare different models/approaches
- **Reproducibility:** Consistent results across environments

**When to use this skill:**

- Building test datasets for RAG systems
- Implementing backup/restore for critical data
- Validating data integrity (URL contracts, embeddings)
- Migrating data between environments

---

## OrchestKit's Golden Dataset

**Stats (Production):**

- **98 analyses** (completed content analyses)
- **415 chunks** (embedded text segments)
- **203 test queries** (with expected results)
- **91.6% pass rate** (retrieval quality metric)

**Purpose:**

- Test hybrid search (vector + BM25 + RRF)
- Validate metadata boosting strategies
- Detect regressions in retrieval quality
- Benchmark new embedding models

---

## Core Concepts

### Data Integrity Contracts

**The URL Contract:**
Golden dataset analyses MUST store **real canonical URLs**, not placeholders.
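
The URL contract can also be checked mechanically before a backup is written. The following is a minimal sketch, not OrchestKit's actual validator: the function names, the dict-shaped analysis records, and the `/placeholder/` path marker are all assumptions made for illustration (the marker is inferred from the placeholder URL shown in this section).

```python
from urllib.parse import urlparse

# Assumption: placeholder URLs carry a "/placeholder/" path segment,
# as in "https://orchestkit.dev/placeholder/123". Adjust to your own scheme.
PLACEHOLDER_MARKERS = ("/placeholder/",)


def violates_url_contract(url: str) -> bool:
    """Return True when a URL cannot serve as a real canonical source URL."""
    parsed = urlparse(url)
    # A canonical URL must be an absolute http(s) URL with a host.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return True
    # Append "/" so a trailing "/placeholder" segment is also caught.
    path = parsed.path + "/"
    return any(marker in path for marker in PLACEHOLDER_MARKERS)


def find_contract_violations(analyses: list[dict]) -> list[dict]:
    """Collect analysis records (hypothetical dicts) whose 'url' breaks the contract."""
    return [a for a in analyses if violates_url_contract(a.get("url") or "")]
```

Running `find_contract_violations` over the dataset before each backup and refusing to proceed when the result is non-empty is one way to keep placeholders out of the golden set. The contrast between a placeholder and a canonical URL is shown next.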
```python
# WRONG - Placeholder URL (breaks restore)
analysis.url = "https://orchestkit.dev/placeholder/123"

# CORRECT - Real canonical URL (enables re-fetch if needed)
analysis.url = "https://docs.python.org/3/library/asyncio.html"
```

**Why this matters:**

- Enables re-fetching content if embeddings need regeneration
- Allows validation that source content hasn't changed
- Provides audit trail for data provenance

---

## Backup Strategy Comparison

| Strategy | Version Control | Restore Speed | Portability | Inspection |
|----------|-----------------|---------------|-------------|------------|
| **JSON** (recommended) | Yes | Slower (regen embeddings) | High | Easy |
| **SQL Dump** | No (binary) | Fast | DB-version dependent |