Docling document parser for PDF, DOCX, PPTX, HTML, images, and 15+ formats. Use when parsing documents, extracting text, converting to Markdown/HTML/JSON, chunking for RAG pipelines, or batch processing files. Triggers on DocumentConverter, convert, convert_all, export_to_markdown, HierarchicalChunker, HybridChunker, ConversionResult.
February 1, 2026
Install with:
`npx add-skill https://github.com/existential-birds/beagle/blob/main/skills/docling/SKILL.md -a claude-code --skill docling`
Installation paths:
`.claude/skills/docling/`
# Docling Document Parser
Docling is a document parsing library that converts PDFs, Word documents, PowerPoint, images, and other formats into structured data with advanced layout understanding.
## Quick Start
Basic document conversion:
```python
from docling.document_converter import DocumentConverter
source = "https://arxiv.org/pdf/2408.09869"  # URL or local path; wrap in-memory bytes (BytesIO) in a DocumentStream
converter = DocumentConverter()
result = converter.convert(source)
print(result.document.export_to_markdown())
```
## Core Concepts
### DocumentConverter
The main entry point for document conversion. It accepts an optional list of allowed input formats and per-format options such as PDF pipeline settings.
```python
from docling.document_converter import DocumentConverter
from docling.datamodel.base_models import InputFormat
from docling.document_converter import PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions
# Basic converter (all formats enabled)
converter = DocumentConverter()
# Restricted formats
converter = DocumentConverter(
    allowed_formats=[InputFormat.PDF, InputFormat.DOCX]
)
# Custom pipeline options
pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.do_table_structure = True
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)
```
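A configured converter can be reused across many files. The sketch below is one possible way to batch-convert a folder with `convert_all`. The helper name `convert_folder` and its `folder` and `pattern` parameters are hypothetical, and the code assumes that `convert_all` accepts `raises_on_error=False` and that each result exposes `input.file`, `status`, and `document` as described in the next section.

```python
from pathlib import Path


def convert_folder(converter, folder, pattern="*.pdf"):
    """Convert every file in `folder` matching `pattern`.

    `converter` is expected to be a docling DocumentConverter (or any object
    with a compatible `convert_all`). Returns {file name: Markdown text}.
    This helper is illustrative and not part of Docling itself.
    """
    paths = sorted(Path(folder).glob(pattern))
    markdown_by_name = {}
    # raises_on_error=False keeps the batch going when one file fails;
    # the failed file then comes back with a FAILURE status instead.
    for result in converter.convert_all(paths, raises_on_error=False):
        if result.status.name == "FAILURE":
            continue  # skip files that produced no usable document
        name = result.input.file.name
        markdown_by_name[name] = result.document.export_to_markdown()
    return markdown_by_name
```

With Docling installed, a call such as `convert_folder(DocumentConverter(), "papers/")` would return one Markdown string per PDF that converted; the `papers/` directory is a placeholder.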
### ConversionResult
All conversion operations return a `ConversionResult` containing:
- `document`: The parsed `DoclingDocument`
- `status`: `ConversionStatus.SUCCESS`, `PARTIAL_SUCCESS`, or `FAILURE`
- `errors`: List of errors encountered during conversion
- `input`: Information about the source document
```python
from docling.datamodel.base_models import ConversionStatus

result = converter.convert("document.pdf")
if result.status == ConversionStatus.SUCCESS:
    # Export the parsed document in several formats
    markdown = result.document.export_to_markdown()
    html = result.document.export_to_html()
    data = result.document.export_to_dict()
```
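The check above only covers `SUCCESS`. A minimal sketch of handling all three listed statuses might look like the following. The helper `markdown_or_none` and its `log` parameter are hypothetical, and the choice to keep partially converted documents is an assumption for illustration rather than a Docling recommendation. It compares on `status.name`, so it needs only the `status`, `errors`, and `document` fields listed above.

```python
def markdown_or_none(result, log=print):
    """Return Markdown for a usable conversion, or None if it failed.

    `result` is expected to be a ConversionResult. This helper is
    illustrative and not part of Docling itself.
    """
    status = result.status.name  # e.g. "SUCCESS", "PARTIAL_SUCCESS", "FAILURE"
    if status not in ("SUCCESS", "PARTIAL_SUCCESS"):
        # Nothing usable was produced: report every recorded error.
        for err in result.errors:
            log(f"conversion failed: {err}")
        return None
    if status == "PARTIAL_SUCCESS":
        # Part of the document was recovered: report the errors, keep the output.
        for err in result.errors:
            log(f"partial conversion: {err}")
    return result.document.export_to_markdown()
```

Passing a custom `log` callable (for example `logging.getLogger(__name__).warning`) routes the messages to a logger instead of standard output.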
## Supported Formats
### Input Formats
- **Documents**: PDF, DOCX, PPTX, XLSX
- **Markup**: HTML