document-to-mxcp

# Document to MXCP Ingestion

## Supported Formats

- **Excel:** .xlsx, .xls, .csv
- **Word:** .docx (for .doc files, convert to .docx first using LibreOffice or similar)

## Required Skills

**Invoke these skills as needed during execution:**

| Skill | When to Use |
|-------|-------------|
| **xlsx** | Reading/analyzing Excel files (.xlsx, .xls, .csv) |
| **docx** | Reading/analyzing Word documents (.docx) |
| **mxcp-expert** | Creating MXCP project, tools, dbt models, validation |

**Always use the appropriate skill.** Do not implement Excel/Word parsing or MXCP operations from scratch.
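
The routing in the table above can be sketched as a small lookup. This is only an illustration, not part of MXCP or of the skills themselves: the helper name `skill_for` and the error messages are hypothetical, and the mapping is simply copied from the table.

```python
from pathlib import Path

# Extension-to-skill routing, taken from the Required Skills table.
SKILL_BY_EXTENSION = {
    ".xlsx": "xlsx",
    ".xls": "xlsx",
    ".csv": "xlsx",
    ".docx": "docx",
}


def skill_for(path: str) -> str:
    """Return the name of the skill that should read `path` (hypothetical helper)."""
    ext = Path(path).suffix.lower()
    if ext == ".doc":
        # Legacy Word files are not supported directly; convert them first.
        raise ValueError(f"{path}: convert .doc to .docx before ingesting")
    try:
        return SKILL_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"{path}: unsupported format {ext!r}") from None
```

For example, `skill_for("reports/Q1.XLSX")` returns `"xlsx"`, while a `.doc` path raises an error asking for conversion first.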

## Environment

- **Package manager:** `uv` with virtual environment (never global installs)
- **Required packages:** `mxcp pandas openpyxl python-docx duckdb`
- **Database:** Always use the default `data/db-default.duckdb` (auto-created by MXCP)
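
Before starting, it can help to confirm that the environment matches the list above. The following is a minimal sketch using only the standard library. The helper names are hypothetical, `python-docx` is imported as `docx`, and the import name `mxcp` is an assumption based on the package name.

```python
import importlib.util
import sys

# Import names for the required packages. python-docx is imported as `docx`;
# the import name `mxcp` is an assumption.
REQUIRED_MODULES = ["mxcp", "pandas", "openpyxl", "docx", "duckdb"]


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def missing_modules(names):
    """Return the module names that cannot be found on the current path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    if not in_virtualenv():
        print("warning: no virtual environment is active")
    missing = missing_modules(REQUIRED_MODULES)
    print("missing modules:", ", ".join(missing) or "none")
```

If anything is reported missing, install it into the active virtual environment with `uv pip install` rather than globally.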

## Core Principles

1. **Analyze first, ingest second.** Fully understand the file before writing anything.
2. **Generate reproducible pipelines.** Output is working scripts, not just ingested data. Scripts must be re-runnable from scratch to produce identical results.
3. **Code-first extraction.** Always generate dbt models and scripts. Manual extraction is not reproducible.
4. **The project IS the state.** Discover existing state from `models/`, `tools/`, `rag_content/`.
5. **Value-driven tools.** Understand what information is valuable before creating tools.
6. **Test-first validation.** Compute expected results from the ORIGINAL SOURCE FILE (not the database) before implementing tools.
7. **Ask when uncertain.** If classification or linking is ambiguous, ask the user.
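
Principle 6 can be illustrated with a short sketch that derives expected values directly from the source data, before any model or tool exists. The column names, the sample rows, and the helper `expected_totals` are all hypothetical. The example uses only the standard library so that the expected values do not depend on the ingestion pipeline being tested.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def expected_totals(source, key_column, value_column):
    """Sum `value_column` per `key_column`, reading straight from the source CSV."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(source):
        totals[row[key_column]] += Decimal(row[value_column])
    return dict(totals)


# Hypothetical source data standing in for the original file.
SAMPLE = io.StringIO(
    "region,amount\n"
    "north,10.50\n"
    "south,4.25\n"
    "north,2.00\n"
)

expected = expected_totals(SAMPLE, "region", "amount")
# A tool implemented later is correct only if its output equals `expected`.
print(expected)
```

With a real file, pass `open(path, newline="")` instead of the in-memory sample, and record the result so the finished tool can be compared against it.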

## Execution Pipeline

### Phase 0: Project Context

#### New project:
```bash
mkdir my-project && cd my-project
uv venv && source .venv/bin/activate
uv pip install mxcp pandas openpyxl python-docx duckdb
mxcp init --bootstrap
```

Create `dbt_project.yml` with `model-paths: ["models"]`.

#### Existing project:
```bash
# Ensure venv is active
source .venv/bin/activate

# V