Use when validating golden dataset quality. Runs schema checks, duplicate detection, and coverage analysis to ensure dataset integrity for AI evaluation.
View on GitHubFebruary 4, 2026
Select agents to install to:
npx add-skill https://github.com/yonatangross/orchestkit/blob/main/plugins/ork-evaluation/skills/golden-dataset-validation/SKILL.md -a claude-code --skill golden-dataset-validationInstallation paths:
.claude/skills/golden-dataset-validation/# Golden Dataset Validation
**Ensure data integrity, prevent duplicates, and maintain quality standards**
## Overview
This skill provides comprehensive validation patterns for the golden dataset, ensuring every entry meets quality standards before inclusion.
**When to use this skill:**
- Validating new documents before adding
- Running integrity checks on existing dataset
- Detecting duplicate or similar content
- Analyzing coverage gaps
- Pre-commit validation hooks
---
## Schema Validation
### Document Schema (v2.0)
```json
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"required": ["id", "title", "source_url", "content_type", "sections"],
"properties": {
"id": {
"type": "string",
"pattern": "^[a-z0-9-]+$",
"description": "Unique kebab-case identifier"
},
"title": {
"type": "string",
"minLength": 10,
"maxLength": 200
},
"source_url": {
"type": "string",
"format": "uri",
"description": "Canonical source URL (NOT placeholder)"
},
"content_type": {
"type": "string",
"enum": ["article", "tutorial", "research_paper", "documentation", "video_transcript", "code_repository"]
},
"bucket": {
"type": "string",
"enum": ["short", "long"]
},
"tags": {
"type": "array",
"items": {"type": "string"},
"minItems": 2,
"maxItems": 10
},
"sections": {
"type": "array",
"minItems": 1,
"items": {
"type": "object",
"required": ["id", "title", "content"],
"properties": {
"id": {"type": "string", "pattern": "^[a-z0-9-/]+$"},
"title": {"type": "string"},
"content": {"type": "string", "minLength": 50},
"granularity": {"enum": ["coarse", "fine", "summary"]}
}
}
}
}
}
```
### Query Schema
```json
{
"type": "object",
"required": ["id", "query", "difficulty", "expected_chunks", "min_score"],
"prop