Intelligent single-file document ingestion into MXCP servers. Supports Excel (.xlsx, .xls, .csv) and Word (.docx) files. Analyzes content to decide whether it calls for structured queries (DuckDB) or semantic search (RAG txt), including conversion of tabular text to narrative format. Use when: (1) ingesting a document into an MXCP project, (2) adding a new data file to an existing MXCP server, (3) processing documents with mixed tables and text, (4) preparing data for RAG or database systems.
View on GitHub: raw-labs/raw-labs-claude-marketplace
mxcp-plugin
January 21, 2026
npx add-skill https://github.com/raw-labs/raw-labs-claude-marketplace/blob/main/skills/document-to-mxcp/SKILL.md -a claude-code --skill document-to-mxcp

Installation paths: .claude/skills/document-to-mxcp/

# Document to MXCP Ingestion

## Supported Formats

- **Excel:** .xlsx, .xls, .csv
- **Word:** .docx (for .doc files, convert to .docx first using LibreOffice or similar)

## Required Skills

**Invoke these skills as needed during execution:**

| Skill | When to Use |
|-------|-------------|
| **xlsx** | Reading/analyzing Excel files (.xlsx, .xls, .csv) |
| **docx** | Reading/analyzing Word documents (.docx) |
| **mxcp-expert** | Creating MXCP project, tools, dbt models, validation |

**Always use the appropriate skill** - don't try to implement Excel/Word parsing or MXCP operations from scratch.

## Environment

- **Package manager:** `uv` with virtual environment (never global installs)
- **Required packages:** `mxcp pandas openpyxl python-docx duckdb`
- **Database:** Always use default `data/db-default.duckdb` (auto-created by MXCP)

## Core Principles

1. **Analyze first, ingest second.** Fully understand the file before writing anything.
2. **Generate reproducible pipelines.** Output is working scripts, not just ingested data. Scripts must be re-runnable from scratch to produce identical results.
3. **Code-first extraction.** Always generate dbt models and scripts. Manual extraction is not reproducible.
4. **The project IS the state.** Discover existing state from `models/`, `tools/`, `rag_content/`.
5. **Value-driven tools.** Understand what information is valuable before creating tools.
6. **Test-first validation.** Compute expected results from ORIGINAL SOURCE FILE (not database) before implementing tools.
7. **Ask when uncertain.** If classification or linking is ambiguous, ask the user.

## Execution Pipeline

### Phase 0: Project Context

#### New project:

```bash
mkdir my-project && cd my-project
uv venv && source .venv/bin/activate
uv pip install mxcp pandas openpyxl python-docx duckdb
mxcp init --bootstrap
```

Create `dbt_project.yml` with `model-paths: ["models"]`.
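
The `dbt_project.yml` step for a new project could look roughly like the sketch below. Only the `model-paths: ["models"]` entry comes from this skill; the `name`, `version`, `config-version`, and `profile` values are placeholder assumptions added so that the file is a complete dbt project definition. The guard is also an assumption: it leaves an existing file untouched.

```bash
# Minimal sketch: create dbt_project.yml only if it does not exist yet.
# "my_project" (name and profile) and the version are hypothetical placeholders.
if [ ! -f dbt_project.yml ]; then
  cat > dbt_project.yml <<'EOF'
name: my_project
version: "1.0.0"
config-version: 2
profile: my_project
model-paths: ["models"]
EOF
fi

# Make sure the directory named in model-paths exists.
mkdir -p models
```

If the project already has a dbt profile, use that profile's name in place of the placeholder so that dbt can resolve the connection.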
#### Existing project:

```bash
# Ensure venv is active
source .venv/bin/activate
# V