
python


Official ollama Python library for LLM inference. Provides a clean, Pythonic interface for text generation, chat completion, embeddings, model management, and streaming responses.


Marketplace: bazzite-ai-plugins (atrawog/bazzite-ai-plugins)
Plugin: bazzite-ai-ollama (development)
Repository: atrawog/bazzite-ai-plugins (bazzite-ai-ollama/skills/python/SKILL.md)
Last Verified: January 21, 2026

Install Skill

npx add-skill https://github.com/atrawog/bazzite-ai-plugins/blob/main/bazzite-ai-ollama/skills/python/SKILL.md -a claude-code --skill python

Installation path (Claude): .claude/skills/python/

Instructions

# Ollama Python Library

## Overview

The official `ollama` Python library provides a clean, Pythonic interface to all Ollama functionality. It connects to the Ollama server automatically and handles request and response serialization.

## Quick Reference

| Function | Purpose |
|----------|---------|
| `ollama.list()` | List available models |
| `ollama.show()` | Show model details |
| `ollama.ps()` | List running models |
| `ollama.generate()` | Generate text |
| `ollama.chat()` | Chat completion |
| `ollama.embed()` | Generate embeddings |
| `ollama.copy()` | Copy a model |
| `ollama.delete()` | Delete a model |
| `ollama.pull()` | Pull a model |

## Setup

```python
import ollama

# The library automatically uses OLLAMA_HOST environment variable
# Default: http://localhost:11434
```
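
To target a server other than the default, the library also exposes a `Client` class; the host below is only an example address:

```python
from ollama import Client

# Explicit client for a non-default server (example host)
client = Client(host="http://192.168.1.50:11434")

# The client exposes the same methods as the module-level functions
models = client.list()
```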

## List Models

```python
models = ollama.list()

for model in models.get("models", []):
    size_gb = model.get("size", 0) / (1024**3)
    print(f"  - {model['model']} ({size_gb:.2f} GB)")
```
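
A common follow-up is to check whether a required model is installed and pull it if not; `ensure_model` below is a hypothetical helper built from `ollama.list()` and `ollama.pull()`:

```python
import ollama

def ensure_model(name: str) -> None:
    """Pull `name` if it is not already installed (hypothetical helper)."""
    installed = {m["model"] for m in ollama.list().get("models", [])}
    if name not in installed:
        # Stream the pull so long downloads report progress
        for progress in ollama.pull(name, stream=True):
            print(f"\r{progress.get('status', '')}", end="", flush=True)
        print()

ensure_model("llama3.2:latest")
```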

## Show Model Details

```python
model_info = ollama.show("llama3.2:latest")

details = model_info.get("details", {})
print(f"Family: {details.get('family', 'N/A')}")
print(f"Parameter Size: {details.get('parameter_size', 'N/A')}")
print(f"Quantization: {details.get('quantization_level', 'N/A')}")
```
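
Beyond `details`, the `show()` response also carries the model's prompt template and its Modelfile as plain strings; a minimal sketch for inspecting them:

```python
# The prompt template and Modelfile come back as plain strings
print(model_info.get("template", "no template"))
print(model_info.get("modelfile", "")[:300])  # first part of the Modelfile
```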

## List Running Models

```python
running = ollama.ps()

for model in running.get("models", []):
    name = model.get("name", "Unknown")
    size = model.get("size", 0) / (1024**3)
    vram = model.get("size_vram", 0) / (1024**3)
    print(f"  - {name}: {size:.2f} GB (VRAM: {vram:.2f} GB)")
```
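
Because `size_vram` reports how much of the model sits in GPU memory, comparing it against `size` is a quick way to spot partial CPU offload, which usually explains slow inference; a small sketch:

```python
for model in running.get("models", []):
    size = model.get("size", 0)
    vram = model.get("size_vram", 0)
    if size:
        # Anything under 100% means part of the model spilled to system RAM
        print(f"{model.get('name', '?')}: {vram / size:.0%} in VRAM")
```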

## Generate Text

### Non-Streaming

```python
result = ollama.generate(
    model="llama3.2:latest",
    prompt="Why is the sky blue? Answer in one sentence."
)
print(result["response"])
```
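
Sampling can be tuned per call through the `options` dict; `temperature` and `num_predict` are standard Ollama options:

```python
result = ollama.generate(
    model="llama3.2:latest",
    prompt="Why is the sky blue? Answer in one sentence.",
    options={
        "temperature": 0.2,  # lower values give more deterministic output
        "num_predict": 64,   # cap the number of generated tokens
    },
)
print(result["response"])
```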

### Streaming

```python
stream = ollama.generate(
    model="llama3.2:latest",
    prompt="Count from 1 to 5.",
    stream=True
)

for chunk in stream:
    print(chunk["response"], end="", flush=True)
```
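
The final streamed chunk has `done` set to true and carries token statistics, so a sketch that accumulates the text while reporting the count looks like this:

```python
parts = []
for chunk in ollama.generate(
    model="llama3.2:latest",
    prompt="Count from 1 to 5.",
    stream=True,
):
    parts.append(chunk["response"])
    if chunk.get("done"):
        # The final chunk reports evaluation statistics
        print(f"\n[generated {chunk.get('eval_count', '?')} tokens]")

full_text = "".join(parts)
```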

## Chat Completion

### Simple Chat
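
`ollama.chat()` takes a list of role-tagged messages (`system`, `user`, `assistant`) and returns the reply under `message`:

```python
response = ollama.chat(
    model="llama3.2:latest",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
)
print(response["message"]["content"])
```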
