llm-judge

# LLM Judge Skill

Compare code implementations across 2+ repositories using structured evaluation.

## Overview

This skill implements a two-phase LLM-as-judge evaluation:

1. **Phase 1: Fact Gathering** - Parallel agents explore each repo and extract structured facts
2. **Phase 2: Judging** - Parallel judges score each dimension using consistent rubrics

## Reference Files

| File | Purpose |
|------|---------|
| [references/fact-schema.md](references/fact-schema.md) | JSON schema for Phase 1 facts |
| [references/scoring-rubrics.md](references/scoring-rubrics.md) | Detailed rubrics for each dimension |
| [references/repo-agent.md](references/repo-agent.md) | Instructions for Phase 1 agents |
| [references/judge-agents.md](references/judge-agents.md) | Instructions for Phase 2 judges |

## Scoring Dimensions

| Dimension | Default Weight | Evaluates |
|-----------|----------------|-----------|
| Functionality | 30% | Spec compliance, test pass rate |
| Security | 25% | Vulnerabilities, security patterns |
| Test Quality | 20% | Coverage, DRY, mock boundaries |
| Overengineering | 15% | Unnecessary complexity |
| Dead Code | 10% | Unused code, TODOs |

## Scoring Scale

| Score | Meaning |
|-------|---------|
| 5 | Excellent - Exceeds expectations |
| 4 | Good - Meets requirements, minor issues |
| 3 | Average - Functional but notable gaps |
| 2 | Below Average - Significant issues |
| 1 | Poor - Fails basic requirements |

## Phase 1: Spawning Repo Agents

For each repository, spawn a Task agent with:

```
You are a Phase 1 Repo Agent for the LLM Judge evaluation.

**Your Repo:** $REPO_LABEL at $REPO_PATH
**Spec Document:**
$SPEC_CONTENT

**Instructions:** Read @beagle:llm-judge references/repo-agent.md

Gather facts and return a JSON object following the schema in references/fact-schema.md.

Load @beagle:llm-artifacts-detection for dead code and overengineering analysis.

Return ONLY valid JSON, no markdown or explanations.
```

## Phase 2: Spawning Judge Agents
Marketplace

Plugin

Repository

Last Verified

Install Skill

Instructions

Validation Details