deepeval-testing

# DeepEval Testing

Skill for pytest-style LLM evaluation with DeepEval.

## Overview

DeepEval provides:

- pytest-compatible LLM tests
- Built-in metrics (faithfulness, relevance, toxicity)
- Custom metric creation
- Async test execution

## Use When

This skill is automatically invoked when:

- Creating pytest-style LLM tests
- Evaluating RAG quality
- Measuring faithfulness/relevance
- Building custom metrics

## Available Scripts

| Script                     | Description                 |
| -------------------------- | --------------------------- |
| `scripts/init-deepeval.sh` | Initialize DeepEval project |
| `scripts/run-tests.sh`     | Run DeepEval tests          |

## Available Templates

| Template                  | Description          |
| ------------------------- | -------------------- |
| `templates/conftest.py`   | pytest configuration |
| `templates/test_basic.py` | Basic test structure |
| `templates/test_rag.py`   | RAG evaluation tests |

## Built-in Metrics

- `AnswerRelevancyMetric`
- `FaithfulnessMetric`
- `ContextualRelevancyMetric`
- `HallucinationMetric`
- `ToxicityMetric`
- `BiasMetric`

Marketplace

Plugin

Repository

Last Verified

Install Skill

Instructions

Validation Details