Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
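The model-index metadata this skill targets lives in a model card's YAML front matter. A rough sketch of one entry (the model name, scores, and source are illustrative placeholders, not real results):

```yaml
# Illustrative model-index entry written into a model card's YAML front matter.
# All names and values below are placeholders.
model-index:
- name: username/model-name
  results:
  - task:
      type: text-generation
    dataset:
      name: MMLU
      type: mmlu
    metrics:
    - type: accuracy
      value: 71.2
      name: MMLU (5-shot)
    source:
      name: Artificial Analysis
      url: https://artificialanalysis.ai
```

Each `results` item pairs one task/dataset with its metrics, which is what lets leaderboards aggregate scores across model cards.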
View on GitHub
February 2, 2026
Select agents to install to:
npx add-skill https://github.com/patchy631/ai-engineering-hub/blob/03e1404d7fa87896b6b3361e04939a4a9a984ba5/hugging-face-skills/skills/hugging-face-evaluation/SKILL.md -a claude-code --skill hugging-face-evaluation

Installation paths:
.claude/skills/hugging-face-evaluation/

# Overview

This skill provides tools to add structured evaluation results to Hugging Face model cards. It supports multiple methods for adding evaluation data:

- Extracting existing evaluation tables from README content
- Importing benchmark scores from Artificial Analysis
- Running custom model evaluations with vLLM or accelerate backends (lighteval/inspect-ai)

## Integration with HF Ecosystem

- **Model Cards**: Updates model-index metadata for leaderboard integration
- **Artificial Analysis**: Direct API integration for benchmark imports
- **Papers with Code**: Compatible with their model-index specification
- **Jobs**: Run evaluations directly on Hugging Face Jobs with `uv` integration
- **vLLM**: Efficient GPU inference for custom model evaluation
- **lighteval**: Hugging Face's evaluation library with vLLM/accelerate backends
- **inspect-ai**: UK AI Safety Institute's evaluation framework

# Version

1.3.0

# Dependencies

## Core Dependencies

- huggingface_hub>=0.26.0
- markdown-it-py>=3.0.0
- python-dotenv>=1.2.1
- pyyaml>=6.0.3
- requests>=2.32.5
- re (built-in)

## Inference Provider Evaluation

- inspect-ai>=0.3.0
- inspect-evals
- openai

## vLLM Custom Model Evaluation (GPU required)

- lighteval[accelerate,vllm]>=0.6.0
- vllm>=0.4.0
- torch>=2.0.0
- transformers>=4.40.0
- accelerate>=0.30.0

Note: vLLM dependencies are installed automatically via PEP 723 script headers when using `uv run`.

# IMPORTANT: Using This Skill

## ⚠️ CRITICAL: Check for Existing PRs Before Creating New Ones

**Before creating ANY pull request with `--create-pr`, you MUST check for existing open PRs:**

```bash
uv run scripts/evaluation_manager.py get-prs --repo-id "username/model-name"
```

**If open PRs exist:**

1. **DO NOT create a new PR** - this creates duplicate work for maintainers
2. **Warn the user** that open PRs already exist
3. **Show the user** the existing PR URLs so they can review them
4. Only proceed if the u
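The PR check above can be sketched in plain Python as a filter over a repo's discussion listing. This is a minimal sketch, not the skill's actual implementation: it assumes discussion objects shaped like those returned by `huggingface_hub`'s `HfApi.get_repo_discussions()`, which expose `num`, `title`, `status`, and `is_pull_request` attributes.

```python
# Minimal sketch of the "check for existing PRs before creating one" guard.
# Assumes discussion objects with .num, .title, .status, and .is_pull_request,
# as returned by huggingface_hub's HfApi.get_repo_discussions().
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PRSummary:
    num: int
    title: str


def open_prs(discussions: Iterable) -> List[PRSummary]:
    """Filter a repo's discussion listing down to open pull requests only."""
    return [
        PRSummary(d.num, d.title)
        for d in discussions
        if getattr(d, "is_pull_request", False) and d.status == "open"
    ]
```

In practice you would feed it `HfApi().get_repo_discussions(repo_id="username/model-name")`; if the result is non-empty, surface the existing PR URLs to the user instead of running `--create-pr`.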