ollama-local

Local LLM inference with Ollama. Use when setting up local models for development, CI pipelines, or cost reduction. Covers model selection, LangChain integration, and performance tuning.

Repository: yonatangross/orchestkit (33 stars)
Plugin: ork-llm-advanced (ai)
Path: plugins/ork-llm-advanced/skills/ollama-local/SKILL.md
Last verified: January 25, 2026

Install with the add-skill CLI:

npx add-skill https://github.com/yonatangross/orchestkit/blob/main/plugins/ork-llm-advanced/skills/ollama-local/SKILL.md -a claude-code --skill ollama-local

Installation path (Claude Code): .claude/skills/ollama-local/

Instructions

# Ollama Local Inference

Run LLMs locally for cost savings, privacy, and offline development.

## Quick Start

```bash
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull models
ollama pull deepseek-r1:70b      # Reasoning (GPT-4 level)
ollama pull qwen2.5-coder:32b    # Coding
ollama pull nomic-embed-text     # Embeddings

# Start server
ollama serve
```
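
Once the server is up, it is worth sanity-checking the REST API before layering any framework on top. A minimal sketch using only the Python standard library; the model name is whichever one you pulled above, and 11434 is Ollama's default port:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

# List locally available models (GET /api/tags)
with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags") as resp:
    models = [m["name"] for m in json.load(resp)["models"]]
print("Available:", models)

# One-shot generation (POST /api/generate with streaming disabled)
payload = json.dumps({
    "model": "qwen2.5-coder:32b",   # any model you pulled above
    "prompt": "Write a haiku about type checkers.",
    "stream": False,
}).encode()
req = urllib.request.Request(
    f"{OLLAMA_URL}/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```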

## Recommended Models (M4 Max 256GB)

| Task | Model | Size | Notes |
|------|-------|------|-------|
| Reasoning | `deepseek-r1:70b` | ~42GB | GPT-4 level |
| Coding | `qwen2.5-coder:32b` | ~35GB | 73.7% Aider benchmark |
| Embeddings | `nomic-embed-text` | ~0.5GB | 768 dims, fast |
| General | `llama3.3:70b` | ~43GB | Good all-around |
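
If a service switches models per task, it can help to mirror this table in a single registry instead of scattering model names through the code. A trivial sketch (the names below just restate the table; nothing here is Ollama-specific):

```python
# Task -> model registry mirroring the table above
MODELS = {
    "reasoning": "deepseek-r1:70b",
    "coding": "qwen2.5-coder:32b",
    "embeddings": "nomic-embed-text",
    "general": "llama3.3:70b",
}

def model_for(task: str) -> str:
    """Return the recommended model for a task, falling back to general."""
    return MODELS.get(task, MODELS["general"])
```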

## LangChain Integration

```python
from langchain_ollama import ChatOllama, OllamaEmbeddings

# Chat model
llm = ChatOllama(
    model="deepseek-r1:70b",
    base_url="http://localhost:11434",
    temperature=0.0,
    num_ctx=32768,      # Context window
    keep_alive="5m",    # Keep model loaded
)

# Embeddings
embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434",
)

# Generate
response = await llm.ainvoke("Explain async/await")
vector = await embeddings.aembed_query("search text")
```
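
For long generations, streaming usually beats a blocking `ainvoke`. `ChatOllama` is a regular LangChain Runnable, so the standard `astream` interface applies; note that all of these async calls must run inside an async function. A minimal sketch reusing the `llm` defined above:

```python
# Stream tokens as they arrive instead of waiting for the full response
async for chunk in llm.astream("Explain async/await"):
    print(chunk.content, end="", flush=True)
```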

## Tool Calling with Ollama

```python
from langchain_core.tools import tool

@tool
def search_docs(query: str) -> str:
    """Search the document database."""
    return f"Found results for: {query}"

# Bind tools
llm_with_tools = llm.bind_tools([search_docs])
response = await llm_with_tools.ainvoke("Search for Python patterns")
```
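
`bind_tools` only makes the model emit tool calls; nothing executes them. Below is a minimal execution loop, sketched with standard LangChain message types and assuming the chosen model supports tool calling (not every local model does):

```python
from langchain_core.messages import HumanMessage, ToolMessage

messages = [HumanMessage("Search for Python patterns")]
ai_msg = await llm_with_tools.ainvoke(messages)
messages.append(ai_msg)

# Run each requested tool call and feed the result back to the model
# (single-tool case: every call here is assumed to target search_docs)
for tool_call in ai_msg.tool_calls:
    result = search_docs.invoke(tool_call["args"])
    messages.append(ToolMessage(content=result, tool_call_id=tool_call["id"]))

final = await llm_with_tools.ainvoke(messages)
print(final.content)
```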

## Structured Output

```python
from pydantic import BaseModel, Field

class CodeAnalysis(BaseModel):
    language: str = Field(description="Programming language")
    complexity: int = Field(ge=1, le=10)
    issues: list[str] = Field(description="Found issues")

structured_llm = llm.with_structured_output(CodeAnalysis)
result = await structured_llm.ainvoke("Analyze this code: ...")
# result is a typed, validated CodeAnalysis instance
```
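
`with_structured_output` leans on the model's tool-calling support, which is uneven across local models. A common fallback, sketched here, is Ollama's JSON mode (`format="json"` on `ChatOllama`) plus explicit Pydantic validation, reusing the `CodeAnalysis` schema from above:

```python
# JSON-mode fallback for models with weak tool-calling support
json_llm = ChatOllama(
    model="qwen2.5-coder:32b",
    base_url="http://localhost:11434",
    format="json",   # constrain output to valid JSON
)

prompt = (
    "Analyze this code and reply as JSON with keys "
    "language (str), complexity (int, 1-10), issues (list of str): ..."
)
raw = await json_llm.ainvoke(prompt)
result = CodeAnalysis.model_validate_json(raw.content)
```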