
vision

verified

Vision model fine-tuning with FastVisionModel. Covers Pixtral and Ministral VL training, UnslothVisionDataCollator, image+text datasets, and vision-specific LoRA configuration.


- Marketplace: bazzite-ai-plugins (atrawog/bazzite-ai-plugins)
- Plugin: bazzite-ai-jupyter (development)
- Repository: atrawog/bazzite-ai-plugins
  - Skill file: bazzite-ai-jupyter/skills/vision/SKILL.md
- Last Verified: January 21, 2026

Install Skill

```bash
npx add-skill https://github.com/atrawog/bazzite-ai-plugins/blob/main/bazzite-ai-jupyter/skills/vision/SKILL.md -a claude-code --skill vision
```

Installation paths:

- Claude: `.claude/skills/vision/`

Instructions

# Vision Model Fine-Tuning

## Overview

Unsloth provides `FastVisionModel` for fine-tuning vision-language models (VLMs) like Pixtral and Ministral with 2x faster training. This skill covers vision model loading, dataset preparation with images, and vision-specific LoRA configuration.

## Quick Reference

| Component | Purpose |
|-----------|---------|
| `FastVisionModel` | Load vision models with Unsloth optimizations |
| `UnslothVisionDataCollator` | Handle image+text modality in batches |
| `finetune_vision_layers` | Enable training of vision encoder |
| `finetune_language_layers` | Enable training of language model |
| `skip_prepare_dataset=True` | Required for vision datasets |
| `dataset_text_field=""` | Empty string for vision (not a field name) |
| List dataset format | Use `[convert(s) for s in dataset]`, not `.map()` |
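
The list-based conversion noted in the last row looks roughly like the sketch below. The dataset name, prompt text, column names, and the `convert_to_conversation()` helper are illustrative assumptions, not part of the skill; only the pattern (a list comprehension instead of `.map()`) is the point.

```python
from datasets import load_dataset

# Hypothetical example dataset with "image" and "caption" columns
dataset = load_dataset("unsloth/Radiology_mini", split="train")

instruction = "Describe this image."  # illustrative prompt

def convert_to_conversation(sample):
    # Pair each image with its text in the multimodal chat format
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": instruction},
                    {"type": "image", "image": sample["image"]},
                ],
            },
            {
                "role": "assistant",
                "content": [{"type": "text", "text": sample["caption"]}],
            },
        ]
    }

# Use a plain list comprehension, not dataset.map(),
# so images stay as PIL objects rather than being serialized
converted_dataset = [convert_to_conversation(s) for s in dataset]
```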

## Critical Environment Setup

```python
import os
from dotenv import load_dotenv
load_dotenv()

# Force text-based progress in Jupyter
os.environ["TQDM_NOTEBOOK"] = "false"
```

## Critical Import Order

```python
# CRITICAL: Import unsloth FIRST for proper TRL patching
import unsloth
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator

from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
import torch
```

## Supported Vision Models

| Model | Path | Parameters | Best For |
|-------|------|------------|----------|
| Pixtral-12B | `unsloth/pixtral-12b-2409-bnb-4bit` | 12.7B | High-quality vision tasks |
| Ministral-8B-Vision | `unsloth/Ministral-8B-Vision-2507-bnb-4bit` | 8B | Balanced quality/speed |
| Llama-3.2-11B-Vision | `unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit` | 11B | General vision tasks |

## Load Vision Model

```python
from unsloth import FastVisionModel, is_bf16_supported

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/pixtral-12b-2409-bnb-4bit",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

print(f"Model loaded: {type(model).__name__}")
```

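## Vision LoRA and Trainer Setup

After loading, the Quick Reference components come together roughly as in the following sketch. This is a minimal, hedged example: the LoRA rank, training hyperparameters, and the `converted_dataset` variable are illustrative assumptions, and exact argument names should be checked against your installed Unsloth and TRL versions.

```python
from unsloth import FastVisionModel, is_bf16_supported
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

# Attach LoRA adapters; the vision-specific flags control which parts train
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,    # train the vision encoder
    finetune_language_layers=True,  # train the language model
    r=16,                           # illustrative LoRA rank
    lora_alpha=16,
)

FastVisionModel.for_training(model)  # switch adapters into training mode

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),  # handles image+text batches
    train_dataset=converted_dataset,  # list of multimodal conversations, not a mapped Dataset
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=30,                 # illustrative; use num_train_epochs for real runs
        learning_rate=2e-4,
        fp16=not is_bf16_supported(),
        bf16=is_bf16_supported(),
        optim="adamw_8bit",
        output_dir="outputs",
        # Vision-specific requirements from the Quick Reference
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        max_seq_length=2048,
    ),
)

trainer.train()
```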