Local LLM inference with Ollama. Use when setting up local models for development, CI pipelines, or cost reduction. Covers model selection, LangChain integration, and performance tuning.
# Ollama Local Inference
Run LLMs locally for cost savings, privacy, and offline development.
## Quick Start
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Pull models
ollama pull deepseek-r1:70b      # Reasoning (GPT-4 level)
ollama pull qwen2.5-coder:32b    # Coding
ollama pull nomic-embed-text     # Embeddings

# Start server
ollama serve
```
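To confirm the server is up before wiring anything else in, you can hit the REST API directly. A minimal sketch in Python, assuming the default port 11434 and that `qwen2.5-coder:32b` has already been pulled:
```python
import json
import urllib.request

# One-off, non-streaming generation against the local Ollama server
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "qwen2.5-coder:32b",   # any pulled model works here
        "prompt": "Say hello in one short sentence.",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```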
## Recommended Models (Apple M4 Max, 256GB)
| Task | Model | Size | Notes |
|------|-------|------|-------|
| Reasoning | `deepseek-r1:70b` | ~42GB | GPT-4 level |
| Coding | `qwen2.5-coder:32b` | ~35GB | 73.7% Aider benchmark |
| Embeddings | `nomic-embed-text` | ~0.5GB | 768 dims, fast |
| General | `llama3.3:70b` | ~40GB | Good all-around |
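Download sizes vary with quantization. To see exactly which models are installed locally and how large they are on disk, the server's `/api/tags` endpoint lists them (a small sketch, same default port assumed):
```python
import json
import urllib.request

# List locally installed models and their on-disk sizes
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    for model in json.loads(resp.read())["models"]:
        print(f'{model["name"]}: {model["size"] / 1e9:.1f} GB')
```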
## LangChain Integration
```python
from langchain_ollama import ChatOllama, OllamaEmbeddings

# Chat model
llm = ChatOllama(
    model="deepseek-r1:70b",
    base_url="http://localhost:11434",
    temperature=0.0,
    num_ctx=32768,    # Context window (tokens)
    keep_alive="5m",  # Keep model loaded between calls
)

# Embeddings
embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434",
)

# Generate (await from within an async context)
response = await llm.ainvoke("Explain async/await")
vector = await embeddings.aembed_query("search text")
```
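For long generations it is often nicer to stream tokens as they arrive. `ChatOllama` supports this through LangChain's standard `astream` interface; a short sketch reusing the `llm` defined above:
```python
# Stream the response chunk by chunk instead of waiting for the full reply
async for chunk in llm.astream("Explain async/await"):
    print(chunk.content, end="", flush=True)
```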
## Tool Calling with Ollama
```python
from langchain_core.tools import tool

@tool
def search_docs(query: str) -> str:
    """Search the document database."""
    return f"Found results for: {query}"

# Bind tools (requires a model built with tool support)
llm_with_tools = llm.bind_tools([search_docs])
response = await llm_with_tools.ainvoke("Search for Python patterns")
```
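Tool calling only works with models that were built with tool support, so check the model's page in the Ollama library before binding tools. When the model does request a call, it lands in `response.tool_calls`; a minimal dispatch sketch:
```python
# Execute whatever tool calls the model requested
for tc in response.tool_calls:
    if tc["name"] == "search_docs":
        print(search_docs.invoke(tc["args"]))
```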
## Structured Output
```python
from pydantic import BaseModel, Field

class CodeAnalysis(BaseModel):
    language: str = Field(description="Programming language")
    complexity: int = Field(ge=1, le=10)
    issues: list[str] = Field(description="Found issues")

structured_llm = llm.with_structured_output(CodeAnalysis)
result = await structured_llm.ainvoke("Analyze this code: ...")
# result is a typed CodeAnalysis instance
```
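The returned object is a validated `CodeAnalysis` instance, so its fields can be used directly:
```python
# Typed access, validated by Pydantic
print(result.language, result.complexity)
for issue in result.issues:
    print("-", issue)
```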