finetuning

Model fine-tuning with PyTorch and HuggingFace Trainer. Covers dataset preparation, tokenization, training loops, TrainingArguments, SFTTrainer for instruction tuning, evaluation, and checkpoint management. Includes Unsloth recommendations.

Marketplace: bazzite-ai-plugins
Plugin: bazzite-ai-jupyter (development)
Repository: atrawog/bazzite-ai-plugins
Path: bazzite-ai-jupyter/skills/finetuning/SKILL.md
Last Verified: January 21, 2026

Install with:

npx add-skill https://github.com/atrawog/bazzite-ai-plugins/blob/main/bazzite-ai-jupyter/skills/finetuning/SKILL.md -a claude-code --skill finetuning

Installation path (Claude): .claude/skills/finetuning/

Instructions

# Model Fine-Tuning

## Overview

Fine-tuning adapts a pre-trained LLM to specific tasks by training on task-specific data. This skill covers both manual PyTorch training and HuggingFace's high-level Trainer API.

**Recommended**: For 2x faster training with less memory, use **Unsloth** (see `bazzite-ai-jupyter:sft`).

## Quick Reference

| Approach | Use Case | Speed |
|----------|----------|-------|
| **Unsloth + SFTTrainer** | **Recommended default** | **2x faster** |
| PyTorch Manual | Full control, custom training | Baseline |
| HuggingFace Trainer | Standard training, less code | Fast |
| SFTTrainer | Instruction/chat fine-tuning | Fast |

## Method Comparison

| Method | Learning Rate | Use Case |
|--------|---------------|----------|
| SFT | 2e-4 | Instruction tuning (first step) |
| GRPO | 1e-5 | RL with rewards |
| DPO | 5e-6 | Preference learning |
| RLOO | 1e-5 | RL with lower variance |
| Reward | 1e-5 | Reward model training |
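The table above can be captured as a small lookup when building training configs. A minimal sketch; the `METHOD_LR` dict and `config_kwargs` helper are illustrative names, not part of this skill or of TRL:

```python
# Starting learning rates from the method comparison table above.
# These are recommended starting points, not universal constants.
METHOD_LR = {
    "sft": 2e-4,     # instruction tuning (first step)
    "grpo": 1e-5,    # RL with rewards
    "dpo": 5e-6,     # preference learning
    "rloo": 1e-5,    # RL with lower variance
    "reward": 1e-5,  # reward model training
}

def config_kwargs(method: str, output_dir: str = "./output") -> dict:
    """Common kwargs to pass into the matching TRL config class
    (e.g. SFTConfig for "sft", DPOConfig for "dpo")."""
    return {"output_dir": output_dir, "learning_rate": METHOD_LR[method]}
```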

## Unsloth Quickstart (Recommended)

```python
# CRITICAL: Import unsloth FIRST
import unsloth
from unsloth import FastLanguageModel, is_bf16_supported
from trl import SFTTrainer, SFTConfig

# Load model with Unsloth optimizations
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-4B-Thinking-2507-unsloth-bnb-4bit",
    max_seq_length=1024,
    load_in_4bit=True,
)

# Apply LoRA
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Train
trainer = SFTTrainer(
    model=model, tokenizer=tokenizer, train_dataset=dataset,
    args=SFTConfig(
        output_dir="./output",
        max_steps=100,
        learning_rate=2e-4,
        bf16=is_bf16_supported(),
        optim="adamw_8bit",
    ),
)
trainer.train()
```

See `bazzite-ai-jupyter:sft` for complete Unsloth patterns.

## Dataset Preparation

### Load from HuggingFace Hub

