
document-to-mxcp

verified

Intelligent single-file document ingestion into MXCP servers. Supports Excel (.xlsx, .xls, .csv) and Word (.docx) files. Analyzes content to determine whether it calls for queries (DuckDB) or semantic search (RAG txt), including converting tabular text to narrative format. Use when: (1) ingesting a document into an MXCP project, (2) adding a new data file to an existing MXCP server, (3) processing documents with mixed tables and text, (4) preparing data for RAG or database systems.


Marketplace

raw-labs-claude-marketplace

raw-labs/raw-labs-claude-marketplace

Plugin

mxcp-plugin

Repository

raw-labs/raw-labs-claude-marketplace
2 stars

skills/document-to-mxcp/SKILL.md

Last Verified

January 21, 2026

Install Skill

npx add-skill https://github.com/raw-labs/raw-labs-claude-marketplace/blob/main/skills/document-to-mxcp/SKILL.md -a claude-code --skill document-to-mxcp

Installation paths:

Claude
.claude/skills/document-to-mxcp/

Instructions

# Document to MXCP Ingestion

## Supported Formats

- **Excel:** .xlsx, .xls, .csv
- **Word:** .docx (for .doc files, convert to .docx first using LibreOffice or similar)

## Required Skills

**Invoke these skills as needed during execution:**

| Skill | When to Use |
|-------|-------------|
| **xlsx** | Reading/analyzing Excel files (.xlsx, .xls, .csv) |
| **docx** | Reading/analyzing Word documents (.docx) |
| **mxcp-expert** | Creating MXCP project, tools, dbt models, validation |

**Always use the appropriate skill** - don't try to implement Excel/Word parsing or MXCP operations from scratch.

## Environment

- **Package manager:** `uv` with virtual environment (never global installs)
- **Required packages:** `mxcp pandas openpyxl python-docx duckdb`
- **Database:** Always use the default `data/db-default.duckdb` (auto-created by MXCP)

## Core Principles

1. **Analyze first, ingest second.** Fully understand the file before writing anything.
2. **Generate reproducible pipelines.** Output is working scripts, not just ingested data. Scripts must be re-runnable from scratch to produce identical results.
3. **Code-first extraction.** Always generate dbt models and scripts. Manual extraction is not reproducible.
4. **The project IS the state.** Discover existing state from `models/`, `tools/`, `rag_content/`.
5. **Value-driven tools.** Understand what information is valuable before creating tools.
6. **Test-first validation.** Compute expected results from the ORIGINAL SOURCE FILE (not the database) before implementing tools.
7. **Ask when uncertain.** If classification or linking is ambiguous, ask the user.
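
Principle 6 can be illustrated with a minimal sketch that derives expected values straight from the source rows, before any database or tool exists. It uses only the standard library; the column names (`region`, `amount`), the sample data, and the helper names `expected_totals` and `find_mismatches` are hypothetical and stand in for whatever the real file contains.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def expected_totals(source, key, value):
    """Sum `value` per `key` directly from the source CSV rows."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in csv.DictReader(source):
        totals[row[key].strip()] += Decimal(row[value])
    return dict(totals)


def find_mismatches(expected, actual):
    """Return {key: (expected, actual)} for every key that disagrees."""
    keys = set(expected) | set(actual)
    return {
        k: (expected.get(k), actual.get(k))
        for k in keys
        if expected.get(k) != actual.get(k)
    }


# Hypothetical sample standing in for the original file; in practice,
# pass an open handle to the real source file instead.
SAMPLE = io.StringIO(
    "region,amount\n"
    "north,10.50\n"
    "south,4.25\n"
    "north,2.00\n"
)

EXPECTED = expected_totals(SAMPLE, key="region", value="amount")

# Later, compare against whatever the implemented tool returns.
# This dict is a made-up tool result with one deliberate error.
TOOL_RESULT = {"north": Decimal("12.50"), "south": Decimal("4.00")}
MISMATCHES = find_mismatches(EXPECTED, TOOL_RESULT)
```

Because the expected values never pass through the database, a bug in the ingestion step shows up as a mismatch instead of being silently reproduced in the test.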

## Execution Pipeline

### Phase 0: Project Context

#### New project:
```bash
mkdir my-project && cd my-project
uv venv && source .venv/bin/activate
uv pip install mxcp pandas openpyxl python-docx duckdb
mxcp init --bootstrap
```

Create `dbt_project.yml` with `model-paths: ["models"]`.
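
As a rough sketch of what that file might contain, the script below writes a minimal `dbt_project.yml`. Only the `model-paths` line comes from the instruction above; `name`, `version`, `config-version`, and `profile` are standard dbt keys, but the values `my_project` and `my_profile` are placeholders, and the helper `write_dbt_project` is hypothetical. The profile name MXCP actually expects may differ, so treat this as an assumption to check with the mxcp-expert skill.

```python
from pathlib import Path
import tempfile

# Minimal dbt project config; every value except model-paths is a placeholder.
TEMPLATE = """\
name: {name}
version: "1.0.0"
config-version: 2
profile: {profile}
model-paths: ["models"]
"""


def write_dbt_project(root, name, profile):
    """Create dbt_project.yml under `root` unless one already exists."""
    path = Path(root) / "dbt_project.yml"
    if path.exists():
        return path  # never overwrite an existing project's config
    path.write_text(TEMPLATE.format(name=name, profile=profile), encoding="utf-8")
    return path


# Demo in a throwaway directory so nothing in a real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    created = write_dbt_project(tmp, name="my_project", profile="my_profile")
    CONTENT = created.read_text(encoding="utf-8")
    # A second call leaves the existing file as it is.
    again = write_dbt_project(tmp, name="other", profile="other")
    UNCHANGED = again.read_text(encoding="utf-8") == CONTENT
```

Skipping the write when the file exists keeps the script safe to re-run, in line with the reproducible-pipeline principle.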

#### Existing project:
```bash
# Ensure venv is active
source .venv/bin/activate

# V
