Create and manage datasets on Hugging Face Hub. Supports initializing repos, defining configs/system prompts, streaming row updates, and SQL-based dataset querying/transformation. Designed to work alongside HF MCP server for comprehensive dataset workflows.
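
The SQL-based querying mentioned above relies on DuckDB reading Hub datasets through the `hf://` protocol. The following is a minimal sketch of how such a query might be assembled, not the skill's actual `scripts/sql_manager.py`. The helper names (`hf_parquet_url`, `sample_query`, `run_query`) and the repo id `user/demo` are hypothetical, and the `@~parquet/<config>/<split>/*.parquet` path assumes the dataset has auto-converted Parquet files on the Hub.

```python
def hf_parquet_url(repo_id: str, config: str = "default", split: str = "train") -> str:
    """Build an hf:// URL for a dataset's auto-converted Parquet files.

    Assumption: the Hub exposes the files under the `~parquet` revision.
    """
    return f"hf://datasets/{repo_id}@~parquet/{config}/{split}/*.parquet"


def sample_query(repo_id: str, limit: int = 5) -> str:
    """Return a SQL string that selects the first `limit` rows of a dataset."""
    url = hf_parquet_url(repo_id)
    return f"SELECT * FROM '{url}' LIMIT {int(limit)}"


def run_query(sql: str):
    """Execute the SQL with DuckDB (third-party; needs network access)."""
    import duckdb  # imported lazily so the helpers above work without it

    return duckdb.sql(sql).fetchall()


# Hypothetical usage (not executed here because it needs network access):
#   rows = run_query(sample_query("user/demo", limit=3))
```

Keeping the string-building helpers separate from `run_query` means the generated SQL can be inspected before anything is sent over the network.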
February 2, 2026
npx add-skill https://github.com/patchy631/ai-engineering-hub/blob/03e1404d7fa87896b6b3361e04939a4a9a984ba5/hugging-face-skills/skills/hugging-face-datasets/SKILL.md -a claude-code --skill hugging-face-datasets

Installation paths:
.claude/skills/hugging-face-datasets/

# Overview

This skill provides tools to manage datasets on the Hugging Face Hub with a focus on creation, configuration, content management, and SQL-based data manipulation. It is designed to complement the existing Hugging Face MCP server by providing dataset editing and querying capabilities.

## Integration with HF MCP Server

- **Use HF MCP Server for**: Dataset discovery, search, and metadata retrieval
- **Use This Skill for**: Dataset creation, content editing, SQL queries, data transformation, and structured data formatting

# Version

2.1.0

# Dependencies

- huggingface_hub
- duckdb (for SQL queries)
- datasets (for pushing query results to Hub)
- json (built-in)
- time (built-in)

# Core Capabilities

## 1. Dataset Lifecycle Management

- **Initialize**: Create new dataset repositories with proper structure
- **Configure**: Store detailed configuration including system prompts and metadata
- **Stream Updates**: Add rows efficiently without downloading entire datasets

## 2. SQL-Based Dataset Querying (NEW)

Query any Hugging Face dataset using DuckDB SQL via `scripts/sql_manager.py`:

- **Direct Queries**: Run SQL on datasets using the `hf://` protocol
- **Schema Discovery**: Describe dataset structure and column types
- **Data Sampling**: Get random samples for exploration
- **Aggregations**: Count, histogram, unique values analysis
- **Transformations**: Filter, join, reshape data with SQL
- **Export & Push**: Save results locally or push to new Hub repos

## 3. Multi-Format Dataset Support

Supports diverse dataset types through template system:

- **Chat/Conversational**: Chat templating, multi-turn dialogues, tool usage examples
- **Text Classification**: Sentiment analysis, intent detection, topic classification
- **Question-Answering**: Reading comprehension, factual QA, knowledge bases
- **Text Completion**: Language modeling, code completion, creative writing
- **Tabular Data**: Structured data for regression/classif