Advanced QLoRA experiments and comparisons. Covers alpha scaling, LoRA rank selection, target module strategies, continual learning, multi-adapter hot-swapping, and quantization comparison (4-bit vs BF16).
# Advanced QLoRA Experiments
## Overview
This skill covers advanced QLoRA experimentation patterns for optimizing fine-tuning performance. Learn how to select the best LoRA rank, alpha scaling, target modules, and quantization settings for your specific use case.
## Quick Reference
| Topic | Key Finding |
|-------|-------------|
| **Rank (r)** | r=16 is the best quality/memory balance; drop to r=8 when memory-constrained |
| **Alpha** | alpha=r (1.0x scaling) is the standard choice; alpha=2r for more aggressive updates |
| **Target Modules** | all_linear for general fine-tuning; mlp_only for knowledge injection |
| **Quantization** | 4-bit NF4 matches BF16 quality with 11-15% memory savings |
| **Continual Learning** | Sequential training adds new knowledge without catastrophic forgetting |
| **Token ID 151668** | `</think>` boundary for Qwen3-Thinking models (see the snippet below) |
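For Qwen3-Thinking outputs, the `</think>` token marks where the reasoning trace ends and the final answer begins. Below is a minimal sketch of using it to strip the reasoning trace; it assumes `</think>` is registered as a single token in the tokenizer of the Qwen3-4B-Thinking checkpoint used later in this skill, and `strip_thinking` is an illustrative helper, not part of any library.

```python
from transformers import AutoTokenizer

# Illustration only: find the </think> boundary token and drop everything
# before it. 151668 is the value the Quick Reference lists for Qwen3-Thinking.
tokenizer = AutoTokenizer.from_pretrained(
    "unsloth/Qwen3-4B-Thinking-2507-unsloth-bnb-4bit"
)
think_end_id = tokenizer.convert_tokens_to_ids("</think>")
print(think_end_id)  # expected: 151668

def strip_thinking(generated_ids):
    """Keep only the tokens after the first </think> marker, if present."""
    ids = list(generated_ids)
    if think_end_id in ids:
        ids = ids[ids.index(think_end_id) + 1:]
    return tokenizer.decode(ids, skip_special_tokens=True).strip()
```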
## Critical Environment Setup
```python
import os
from dotenv import load_dotenv
load_dotenv()
# Force text-based progress in Jupyter
os.environ["TQDM_NOTEBOOK"] = "false"
# CRITICAL: Import unsloth FIRST
import unsloth
from unsloth import FastLanguageModel, is_bf16_supported
```
## Alpha Scaling
### Formula
The effective LoRA scaling factor is:
```
scaling_factor = alpha / r
```
The adapter update is multiplied by this factor before being added to the base model output, so it behaves like a learning rate multiplier for the adapter weights.
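To make the effect concrete, here is a small PyTorch sketch of a LoRA-style forward pass. The names `W`, `lora_A`, and `lora_B` are illustrative rather than PEFT internals, but the `alpha / r` multiplier enters the update in the same place.

```python
import torch

r, alpha = 16, 32                        # rank and alpha -> scaling_factor = 2.0
scaling_factor = alpha / r

d_in, d_out = 64, 64
W = torch.randn(d_out, d_in)             # frozen base weight
lora_A = torch.randn(r, d_in) * 0.01     # trainable down-projection
lora_B = torch.zeros(d_out, r)           # trainable up-projection, zero-initialized

x = torch.randn(d_in)

# The adapter contribution is scaled by alpha / r before being added to the
# base output, so doubling alpha (at fixed r) doubles its effect.
y = W @ x + scaling_factor * (lora_B @ (lora_A @ x))
```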
### Alpha Comparison Code
```python
import unsloth
from unsloth import FastLanguageModel, is_bf16_supported
from trl import SFTTrainer, SFTConfig
from transformers import TrainerCallback
ALPHAS = [8, 16, 32, 64]
FIXED_RANK = 16
results = []
for alpha in ALPHAS:
    scaling_factor = alpha / FIXED_RANK
    print(f"\n=== Testing alpha={alpha} (scaling={scaling_factor}x) ===")
    # Load fresh model
    model, tokenizer = FastLanguageModel.from_pretrained(
        "unsloth/Qwen3-4B-Thinking-2507-unsloth-bnb-4bit",
        max_seq_length=512,
        load_in_4bit=True,
    )
    # Apply LoRA with specific alpha
    model = FastLanguageModel.get_peft_model(
        model,
        r=FIXED_RANK,
        lora_alpha=alpha,  # Variable alpha