Back to Skills

golden-dataset-curation

verified

Use when creating or improving golden datasets for AI evaluation. Defines quality criteria, curation workflows, and multi-agent analysis patterns for test data.

View on GitHub

Marketplace

orchestkit

yonatangross/skillforge-claude-plugin

Plugin

ork

development

Repository

yonatangross/skillforge-claude-plugin
33stars

skills/golden-dataset-curation/SKILL.md

Last Verified

January 25, 2026

Install Skill

Select agents to install to:

Scope:
npx add-skill https://github.com/yonatangross/skillforge-claude-plugin/blob/main/skills/golden-dataset-curation/SKILL.md -a claude-code --skill golden-dataset-curation

Installation paths:

Claude
.claude/skills/golden-dataset-curation/
Powered by add-skill CLI

Instructions

# Golden Dataset Curation

**Curate high-quality documents for the golden dataset with multi-agent validation**

## Overview

This skill provides patterns and workflows for **adding new documents** to the golden dataset with thorough quality analysis. It complements `golden-dataset-management` which handles backup/restore.

**When to use this skill:**
- Adding new documents to the golden dataset
- Classifying content types and difficulty levels
- Generating test queries for new documents
- Running multi-agent quality analysis

---

## Content Types

| Type | Description | Quality Focus |
|------|-------------|---------------|
| `article` | Technical articles, blog posts | Depth, accuracy, actionability |
| `tutorial` | Step-by-step guides | Completeness, clarity, code quality |
| `research_paper` | Academic papers, whitepapers | Rigor, citations, methodology |
| `documentation` | API docs, reference materials | Accuracy, completeness, examples |
| `video_transcript` | Transcribed video content | Structure, coherence, key points |
| `code_repository` | README, code analysis | Code quality, documentation |

---

## Difficulty Levels

| Level | Semantic Complexity | Expected Score | Characteristics |
|-------|---------------------|----------------|-----------------|
| **trivial** | Direct keyword match | >0.85 | Technical terms, exact phrases |
| **easy** | Common synonyms | >0.70 | Well-known concepts, slight variations |
| **medium** | Paraphrased intent | >0.55 | Conceptual queries, multi-topic |
| **hard** | Multi-hop reasoning | >0.40 | Cross-domain, comparative analysis |
| **adversarial** | Edge cases | Graceful degradation | Robustness tests, off-domain |

---

## Quality Dimensions

| Dimension | Weight | Perfect | Acceptable | Failing |
|-----------|--------|---------|------------|---------|
| **Accuracy** | 0.25 | 0.95-1.0 | 0.70-0.94 | <0.70 |
| **Coherence** | 0.20 | 0.90-1.0 | 0.60-0.89 | <0.60 |
| **Depth** | 0.25 | 0.90-1.0 | 0.55-0.89 | <0.55 |
| **Rel

Validation Details

Front Matter
Required Fields
Valid Name Format
Valid Description
Has Sections
Allowed Tools
Instruction Length:
6389 chars