evaluation-frameworks

# Evaluation Frameworks Skill

Frameworks for evaluating software and AI systems.

## LLM Evaluation

### Response Quality

```markdown
## LLM Response Evaluation

### Accuracy
Does the response contain correct information?

Rubric (1-5):
5 - Completely accurate
4 - Mostly accurate, minor errors
3 - Partially accurate
2 - Significant errors
1 - Incorrect

### Relevance
Does the response address the question?

Rubric (1-5):
5 - Directly addresses all aspects
4 - Addresses main points
3 - Partially relevant
2 - Mostly off-topic
1 - Completely irrelevant

### Helpfulness
Does the response help the user?

Rubric (1-5):
5 - Extremely helpful, actionable
4 - Helpful with good guidance
3 - Somewhat helpful
2 - Minimally helpful
1 - Not helpful
```
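The three rubrics above can be combined into one number when responses need to be ranked. The sketch below is one possible way to do that, not a prescribed method: the `Dimension` and `Scores` types, the `overallScore` helper, and the weights are all assumptions made for illustration and should be tuned per use case.

```typescript
// The three rubric dimensions defined above.
type Dimension = "accuracy" | "relevance" | "helpfulness"
type Scores = Record<Dimension, number>

// Hypothetical weights (an assumption, not part of the rubric itself).
const DEFAULT_WEIGHTS: Scores = { accuracy: 0.5, relevance: 0.25, helpfulness: 0.25 }

// Each rubric only defines the integer levels 1 through 5.
function assertValidScore(dimension: Dimension, value: number): void {
  if (!Number.isInteger(value) || value < 1 || value > 5) {
    throw new RangeError(`${dimension} must be an integer from 1 to 5, got ${value}`)
  }
}

// Weighted mean of the per-dimension scores, normalized by the weight total.
function overallScore(scores: Scores, weights: Scores = DEFAULT_WEIGHTS): number {
  let weightedSum = 0
  let weightTotal = 0
  for (const dimension of Object.keys(weights) as Dimension[]) {
    assertValidScore(dimension, scores[dimension])
    weightedSum += scores[dimension] * weights[dimension]
    weightTotal += weights[dimension]
  }
  return weightedSum / weightTotal
}

console.log(overallScore({ accuracy: 5, relevance: 4, helpfulness: 3 })) // 4.25
```

Keeping the per-dimension scores alongside the aggregate is usually worthwhile, since a single number hides which dimension caused a low rating.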

### LLM-as-Judge

```typescript
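// LLM-based evaluation
interface JudgePrompt {
  criteria: string
  rubric: string
  task: string
  response: string
}

// Assumed result shape; adjust to whatever your parser returns.
interface EvaluationResult {
  score: number
  reasoning: string
}

// Assumed to be provided elsewhere: an LLM client and a parser for the judge output.
declare const llm: { complete(prompt: string): Promise<string> }
declare function parseJudgeResponse(raw: string): EvaluationResult

// Build the prompt inside a function so every placeholder is interpolated
// from the arguments. A template literal resolves ${...} as soon as it is
// evaluated, so calling .replace('${response}', ...) on it afterwards has no effect.
function buildJudgePrompt({ criteria, rubric, task, response }: JudgePrompt): string {
  return `
You are evaluating an AI response. Score it 1-5 based on the criteria.

## Criteria
${criteria}

## Rubric
${rubric}

## Task
${task}

## Response to Evaluate
${response}

## Instructions
1. Consider each aspect of the rubric
2. Identify strengths and weaknesses
3. Provide a score from 1-5
4. Explain your reasoning

Output format:
Score: [1-5]
Reasoning: [explanation]
`
}

async function evaluateWithJudge(
  response: string,
  task: string,
  criteria: string,
  rubric: string // previously never passed, which left the Rubric section unfilled
): Promise<EvaluationResult> {
  const judgeResponse = await llm.complete(
    buildJudgePrompt({ criteria, rubric, task, response })
  )

  return parseJudgeResponse(judgeResponse)
}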
```
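The snippet above calls `parseJudgeResponse` without defining it. The following is a minimal sketch of what such a parser might look like, assuming the judge follows the `Score:` / `Reasoning:` output format from the prompt. The `EvaluationResult` shape here is an assumption, and rejecting malformed output (instead of defaulting to a score) is a design choice, not a requirement.

```typescript
// Assumed result shape (the original snippet does not define it).
interface EvaluationResult {
  score: number
  reasoning: string
}

// Parse "Score: N" and "Reasoning: ..." from the judge's raw text output.
function parseJudgeResponse(raw: string): EvaluationResult {
  // Accept "Score: 4" or "Score: [4]" on its own line; only 1 through 5 is valid.
  const scoreMatch = raw.match(/^\s*Score:\s*\[?([1-5])\]?\s*$/m)
  if (!scoreMatch) {
    throw new Error("Judge output does not contain a valid 'Score: 1-5' line")
  }

  // Everything after "Reasoning:" is treated as the explanation.
  const reasoningMatch = raw.match(/Reasoning:\s*([\s\S]*)/)

  return {
    score: Number(scoreMatch[1]),
    reasoning: reasoningMatch ? reasoningMatch[1].trim() : "",
  }
}

const parsed = parseJudgeResponse("Score: 4\nReasoning: Mostly correct, one minor error.")
console.log(parsed.score, parsed.reasoning)
```

Throwing on malformed output makes judge failures visible, so the caller can retry or discard the sample without a silent default skewing the aggregate.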

## Code Quality Evaluation

### Code Review Rubric

```markdown
## Code Review Evaluation

### Correctness
- Logic is sound
- Handles edge cases
- No obvious bugs

### Design
- Follows SOLID principles
- Appropriate abstractions
- Clean architecture

### Security
- No vulnerabilities
- Input validation
- Proper authentication

### Performance
```