Local LLM inference with Ollama. Use when setting up local models for development, CI pipelines, or cost reduction. Covers model selection, LangChain integration, and performance tuning.
# Ollama Local Inference
Run LLMs locally for cost savings, privacy, and offline development.
## Quick Start
```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull models
ollama pull deepseek-r1:70b # Reasoning (GPT-4 level)
ollama pull qwen2.5-coder:32b # Coding
ollama pull nomic-embed-text # Embeddings
# Start server
ollama serve
```
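Once the server is running, it helps to sanity-check it before wiring anything into LangChain. A minimal sketch using only the standard library, assuming the default port 11434 (`/api/tags` is Ollama's list-models endpoint):

```python
import json
import urllib.request

# Ask the local Ollama server which models it has pulled.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.load(resp)["models"]
print([m["name"] for m in models])
```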
## Recommended Models (M4 Max 256GB)
| Task | Model | Size | Notes |
|------|-------|------|-------|
| Reasoning | `deepseek-r1:70b` | ~42GB | GPT-4 level |
| Coding | `qwen2.5-coder:32b` | ~35GB | 73.7% Aider benchmark |
| Embeddings | `nomic-embed-text` | ~0.5GB | 768 dims, fast |
| General | `llama3.3:70b` | ~43GB | Good all-around |
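If a project switches models per task, a small lookup table keeps these tags in one place. A hypothetical helper, not part of any library; adjust the tags to what you have actually pulled:

```python
# Task → Ollama tag, mirroring the table above.
MODEL_BY_TASK = {
    "reasoning": "deepseek-r1:70b",
    "coding": "qwen2.5-coder:32b",
    "embeddings": "nomic-embed-text",
    "general": "llama3.3:70b",
}

def model_for(task: str) -> str:
    """Return the recommended tag for a task, falling back to general."""
    return MODEL_BY_TASK.get(task, MODEL_BY_TASK["general"])
```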
## LangChain Integration
```python
import asyncio

from langchain_ollama import ChatOllama, OllamaEmbeddings

# Chat model
llm = ChatOllama(
    model="deepseek-r1:70b",
    base_url="http://localhost:11434",
    temperature=0.0,
    num_ctx=32768,     # Context window (tokens)
    keep_alive="5m",   # Keep the model loaded between requests
)

# Embeddings
embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434",
)

# Generate (the async API needs an event loop)
async def main() -> None:
    response = await llm.ainvoke("Explain async/await")
    vector = await embeddings.aembed_query("search text")
    print(response.content, len(vector))

asyncio.run(main())
```
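As a quick usage check, vectors from `nomic-embed-text` can be compared with cosine similarity. A dependency-free sketch using the synchronous `embed_query` API:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

v1 = embeddings.embed_query("async programming in Python")
v2 = embeddings.embed_query("Python coroutines and await")
print(cosine(v1, v2))  # closer to 1.0 = more semantically similar
```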
## Tool Calling with Ollama
```python
from langchain_core.tools import tool

@tool
def search_docs(query: str) -> str:
    """Search the document database."""
    return f"Found results for: {query}"

# Bind tools. Tool calling needs a model whose Ollama build supports
# tools; check the model's page on ollama.com if calls come back empty.
llm_with_tools = llm.bind_tools([search_docs])

# Inside an async function, as in the integration example above
response = await llm_with_tools.ainvoke("Search for Python patterns")
```
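The model does not execute the tool itself; it returns structured tool calls for your code to dispatch. A minimal loop over the `response` from above (each entry in `tool_calls` carries `name`, `args`, and `id`):

```python
# Run whichever tools the model requested and print the outputs.
for tool_call in response.tool_calls:
    if tool_call["name"] == "search_docs":
        output = search_docs.invoke(tool_call["args"])
        print(output)
```

In a full agent loop you would feed each output back to the model as a `ToolMessage`, but the dispatch itself is just this.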
## Structured Output
```python
from pydantic import BaseModel, Field

class CodeAnalysis(BaseModel):
    language: str = Field(description="Programming language")
    complexity: int = Field(ge=1, le=10)
    issues: list[str] = Field(description="Found issues")

structured_llm = llm.with_structured_output(CodeAnalysis)

# Inside an async function, as above; "..." stands in for real code
result = await structured_llm.ainvoke("Analyze this code: ...")
# result is a typed CodeAnalysis instance
```
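Because `result` is a `CodeAnalysis` instance, its fields are ordinary typed attributes:

```python
print(result.language, result.complexity)
for issue in result.issues:
    print("-", issue)
```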