finetuning

Model fine-tuning with PyTorch and HuggingFace Trainer. Covers dataset preparation, tokenization, training loops, TrainingArguments, SFTTrainer for instruction tuning, evaluation, and checkpoint management. Includes Unsloth recommendations.

Marketplace: bazzite-ai-plugins
Plugin: bazzite-ai-jupyter (development)
Repository: atrawog/bazzite-ai-plugins
Path: bazzite-ai-jupyter/skills/finetuning/SKILL.md
Last Verified: January 21, 2026

Install with:

npx add-skill https://github.com/atrawog/bazzite-ai-plugins/blob/main/bazzite-ai-jupyter/skills/finetuning/SKILL.md -a claude-code --skill finetuning

Installation path (Claude): .claude/skills/finetuning/

Instructions

# Model Fine-Tuning

## Overview

Fine-tuning adapts a pre-trained LLM to specific tasks by training on task-specific data. This skill covers both manual PyTorch training and HuggingFace's high-level Trainer API.

**Recommended**: For 2x faster training with less memory, use **Unsloth** (see `bazzite-ai-jupyter:sft`).

## Quick Reference

| Approach | Use Case | Speed |
|----------|----------|-------|
| **Unsloth + SFTTrainer** | **Recommended default** | **2x faster** |
| PyTorch Manual | Full control, custom training | Baseline |
| HuggingFace Trainer | Standard training, less code | Fast |
| SFTTrainer | Instruction/chat fine-tuning | Fast |

## Method Comparison

| Method | Learning Rate | Use Case |
|--------|---------------|----------|
| SFT | 2e-4 | Instruction tuning (first step) |
| GRPO | 1e-5 | RL with rewards |
| DPO | 5e-6 | Preference learning |
| RLOO | 1e-5 | RL with lower variance |
| Reward | 1e-5 | Reward model training |
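The table above can be captured as a small lookup when building training configs. A minimal sketch; the `METHOD_LR` dict and `config_kwargs` helper are illustrative names, not part of this skill or of TRL:

```python
# Starting learning rates from the method comparison table above.
# These are recommended starting points, not universal constants.
METHOD_LR = {
    "sft": 2e-4,     # instruction tuning (first step)
    "grpo": 1e-5,    # RL with rewards
    "dpo": 5e-6,     # preference learning
    "rloo": 1e-5,    # RL with lower variance
    "reward": 1e-5,  # reward model training
}

def config_kwargs(method: str, output_dir: str = "./output") -> dict:
    """Common kwargs to pass into the matching TRL config class
    (e.g. SFTConfig for "sft", DPOConfig for "dpo")."""
    return {"output_dir": output_dir, "learning_rate": METHOD_LR[method]}
```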

## Unsloth Quickstart (Recommended)

```python
# CRITICAL: Import unsloth FIRST
import unsloth
from unsloth import FastLanguageModel, is_bf16_supported
from trl import SFTTrainer, SFTConfig

# Load model with Unsloth optimizations
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-4B-Thinking-2507-unsloth-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Apply LoRA
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Train
trainer = SFTTrainer(
    model=model, tokenizer=tokenizer, train_dataset=dataset,
    args=SFTConfig(
        output_dir="./output",
        max_steps=100,
        learning_rate=2e-4,
        bf16=is_bf16_supported(),
        optim="adamw_8bit",
    ),
)
trainer.train()
```

See `bazzite-ai-jupyter:sft` for complete Unsloth patterns.

## Dataset Preparation

### Load from HuggingFace Hub

