# Ollama Python Library
## Overview
The official `ollama` Python library provides a clean, Pythonic interface to all Ollama functionality. It connects to the Ollama server automatically and handles request and response serialization for you.
## Quick Reference
| Function | Purpose |
|----------|---------|
| `ollama.list()` | List available models |
| `ollama.show()` | Show model details |
| `ollama.ps()` | List running models |
| `ollama.generate()` | Generate text |
| `ollama.chat()` | Chat completion |
| `ollama.embed()` | Generate embeddings |
| `ollama.copy()` | Copy a model |
| `ollama.delete()` | Delete a model |
| `ollama.pull()` | Pull a model |
## Setup
```python
import ollama
# The library automatically uses OLLAMA_HOST environment variable
# Default: http://localhost:11434
```
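The module-level functions talk to whatever `OLLAMA_HOST` points at. If the server runs somewhere else, the library also exposes a `Client` class that takes the host explicitly; the URL below is only an example value, not a recommendation:

```python
from ollama import Client

# Point the client at an explicit server (example URL)
client = Client(host="http://localhost:11434")

# The client exposes the same methods as the module-level functions
models = client.list()
```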
## List Models
```python
models = ollama.list()
for model in models.get("models", []):
    size_gb = model.get("size", 0) / (1024**3)
    print(f" - {model['model']} ({size_gb:.2f} GB)")
```
## Show Model Details
```python
model_info = ollama.show("llama3.2:latest")
details = model_info.get("details", {})
print(f"Family: {details.get('family', 'N/A')}")
print(f"Parameter Size: {details.get('parameter_size', 'N/A')}")
print(f"Quantization: {details.get('quantization_level', 'N/A')}")
```
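Beyond the `details` block, the show response usually carries other fields such as the prompt template and default parameters. Availability varies by model and library version, so the sketch below reads the keys defensively:

```python
import ollama

info = ollama.show("llama3.2:latest")
# Commonly present fields; may be empty depending on the model
print(info.get("template", "N/A"))    # prompt template baked into the model
print(info.get("parameters", "N/A"))  # default runtime parameters, if any
```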
## List Running Models
```python
running = ollama.ps()
for model in running.get("models", []):
    name = model.get("name", "Unknown")
    size = model.get("size", 0) / (1024**3)
    vram = model.get("size_vram", 0) / (1024**3)
    print(f" - {name}: {size:.2f} GB (VRAM: {vram:.2f} GB)")
```
## Generate Text
### Non-Streaming
```python
result = ollama.generate(
model="llama3.2:latest",
prompt="Why is the sky blue? Answer in one sentence."
)
print(result["response"])
```
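Generation behavior can be tuned through the optional `options` dictionary, whose keys follow the Ollama runtime parameters. The values below are purely illustrative:

```python
import ollama

result = ollama.generate(
    model="llama3.2:latest",
    prompt="Why is the sky blue? Answer in one sentence.",
    options={"temperature": 0.2, "num_predict": 64},  # example values
)
print(result["response"])
```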
### Streaming
```python
stream = ollama.generate(
model="llama3.2:latest",
prompt="Count from 1 to 5.",
stream=True
)
for chunk in stream:
    print(chunk["response"], end="", flush=True)
```
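If you need the streamed output as a single string, you can accumulate the chunks; the final chunk also reports completion. A minimal sketch, using the same model as above:

```python
import ollama

stream = ollama.generate(
    model="llama3.2:latest",
    prompt="Count from 1 to 5.",
    stream=True
)

full_text = ""
for chunk in stream:
    full_text += chunk["response"]
    if chunk.get("done"):  # the last chunk signals completion
        break
print(full_text)
```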
## Chat Completion
### Si