Use when validating golden dataset quality. Runs schema checks, duplicate detection, and coverage analysis to ensure dataset integrity for AI evaluation.
View on GitHub: yonatangross/skillforge-claude-plugin
orchestkit-complete
January 23, 2026
npx add-skill https://github.com/yonatangross/skillforge-claude-plugin/blob/main/./skills/golden-dataset-validation/SKILL.md -a claude-code --skill golden-dataset-validation
Installation paths:
.claude/skills/golden-dataset-validation/
# Golden Dataset Validation
**Ensure data integrity, prevent duplicates, and maintain quality standards**
## Overview
This skill provides validation patterns for the golden dataset (schema checks, duplicate detection, and coverage analysis) so that every entry meets quality standards before inclusion.
**When to use this skill:**
- Validating new documents before adding them to the dataset
- Running integrity checks on the existing dataset
- Detecting duplicate or similar content
- Analyzing coverage gaps
- Running validation in pre-commit hooks
---
## Schema Validation
### Document Schema (v2.0)
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["id", "title", "source_url", "content_type", "sections"],
  "properties": {
    "id": {
      "type": "string",
      "pattern": "^[a-z0-9-]+$",
      "description": "Unique kebab-case identifier"
    },
    "title": {
      "type": "string",
      "minLength": 10,
      "maxLength": 200
    },
    "source_url": {
      "type": "string",
      "format": "uri",
      "description": "Canonical source URL (NOT placeholder)"
    },
    "content_type": {
      "type": "string",
      "enum": ["article", "tutorial", "research_paper", "documentation", "video_transcript", "code_repository"]
    },
    "bucket": {
      "type": "string",
      "enum": ["short", "long"]
    },
    "tags": {
      "type": "array",
      "items": {"type": "string"},
      "minItems": 2,
      "maxItems": 10
    },
    "sections": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["id", "title", "content"],
        "properties": {
          "id": {"type": "string", "pattern": "^[a-z0-9-/]+$"},
          "title": {"type": "string"},
          "content": {"type": "string", "minLength": 50},
          "granularity": {"enum": ["coarse", "fine", "summary"]}
        }
      }
    }
  }
}
```
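To make the schema's constraints concrete, here is a minimal standard-library sketch that checks a document by hand. It is not the skill's own validator: the function names `validate_document` and `validate_section` and the error wording are hypothetical, and the `source_url` check only approximates `"format": "uri"` by requiring a scheme and a host. A full JSON Schema validator would be the more faithful choice where one is available.

```python
import re
from urllib.parse import urlparse

# Values copied from the Document Schema (v2.0) above.
REQUIRED = ("id", "title", "source_url", "content_type", "sections")
CONTENT_TYPES = ("article", "tutorial", "research_paper", "documentation",
                 "video_transcript", "code_repository")
BUCKETS = ("short", "long")
GRANULARITIES = ("coarse", "fine", "summary")


def _is_str(value, pattern=None, min_len=0, max_len=None):
    """True if value is a string that satisfies the optional constraints."""
    if not isinstance(value, str) or len(value) < min_len:
        return False
    if max_len is not None and len(value) > max_len:
        return False
    return pattern is None or re.fullmatch(pattern, value) is not None


def validate_section(section, index):
    """Check one entry of `sections`; returns a list of error strings."""
    where = f"sections[{index}]"
    if not isinstance(section, dict):
        return [f"{where}: expected an object"]
    errors = [f"{where}: missing required field: {key}"
              for key in ("id", "title", "content") if key not in section]
    if "id" in section and not _is_str(section["id"], pattern=r"[a-z0-9-/]+"):
        errors.append(f"{where}.id: expected lowercase letters, digits, '-' or '/'")
    if "title" in section and not _is_str(section["title"]):
        errors.append(f"{where}.title: expected a string")
    if "content" in section and not _is_str(section["content"], min_len=50):
        errors.append(f"{where}.content: expected a string of at least 50 characters")
    if "granularity" in section and section["granularity"] not in GRANULARITIES:
        errors.append(f"{where}.granularity: expected one of {GRANULARITIES}")
    return errors


def validate_document(doc):
    """Return a list of error strings; an empty list means the document passed."""
    if not isinstance(doc, dict):
        return ["document: expected an object"]
    errors = [f"missing required field: {key}" for key in REQUIRED if key not in doc]
    if "id" in doc and not _is_str(doc["id"], pattern=r"[a-z0-9-]+"):
        errors.append("id: expected a kebab-case string")
    if "title" in doc and not _is_str(doc["title"], min_len=10, max_len=200):
        errors.append("title: expected a string of 10 to 200 characters")
    if "source_url" in doc:
        url = doc["source_url"]
        try:
            parsed = urlparse(url) if isinstance(url, str) else None
        except ValueError:  # urlparse rejects some malformed hosts
            parsed = None
        # Assumption: approximate "format": "uri" by requiring a scheme and a host.
        if parsed is None or not parsed.scheme or not parsed.netloc:
            errors.append("source_url: expected an absolute URL")
    if "content_type" in doc and doc["content_type"] not in CONTENT_TYPES:
        errors.append(f"content_type: expected one of {CONTENT_TYPES}")
    if "bucket" in doc and doc["bucket"] not in BUCKETS:
        errors.append(f"bucket: expected one of {BUCKETS}")
    if "tags" in doc:
        tags = doc["tags"]
        if not (isinstance(tags, list) and 2 <= len(tags) <= 10
                and all(isinstance(tag, str) for tag in tags)):
            errors.append("tags: expected a list of 2 to 10 strings")
    if "sections" in doc:
        sections = doc["sections"]
        if not isinstance(sections, list) or len(sections) < 1:
            errors.append("sections: expected a non-empty list")
        else:
            for index, section in enumerate(sections):
                errors.extend(validate_section(section, index))
    return errors
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem in an entry at once. A caller could, for instance, reject an entry whenever `validate_document(entry)` returns a non-empty list.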
### Query Schema
```json
{
  "type": "object",
  "required": ["id", "query", "difficulty", "expected_chunks", "min_score"],
  "prop