LLM fine-tuning with LoRA, QLoRA, DPO alignment, and synthetic data generation. Efficient training, preference learning, data creation. Use when customizing models for specific domains.
February 4, 2026

Install: `npx add-skill https://github.com/yonatangross/skillforge-claude-plugin/blob/main/plugins/ork/skills/fine-tuning-customization/SKILL.md -a claude-code --skill fine-tuning-customization`

Installation paths: `.claude/skills/fine-tuning-customization/`

# Fine-Tuning & Customization
Customize LLMs for specific domains using parameter-efficient fine-tuning and alignment techniques.
> **Unsloth 2026**: 7x longer context RL, FP8 RL on consumer GPUs, rsLoRA support. **TRL**: OpenEnv integration, vLLM server mode, transformers 5.0.0+ compatible.
## Decision Framework: Fine-Tune or Not?
| Approach | When to Try | When It Works |
|----------|-----------|---------------|
| Prompt Engineering | Always | Simple tasks, clear instructions |
| RAG | External knowledge needed | Knowledge-intensive tasks |
| Fine-Tuning | Last resort | Deep specialization, format control |
**Fine-tune ONLY when:**
1. Prompt engineering tried and insufficient
2. RAG doesn't capture domain nuances
3. Specific output format consistently required
4. Persona/style must be deeply embedded
5. You have roughly 1,000 or more high-quality examples
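The ordering above (prompting first, then RAG, fine-tuning last) can be read as a small decision procedure. The sketch below is one illustrative encoding, not part of any library: the function name `recommend_approach`, its parameters, and the 1,000-example default are assumptions taken from the checklist.

```python
def recommend_approach(
    prompting_sufficient: bool,
    rag_sufficient: bool,
    num_quality_examples: int,
    min_examples: int = 1000,  # assumed threshold, from the checklist above
) -> str:
    """Return the cheapest approach that fits, trying fine-tuning last."""
    if prompting_sufficient:
        return "prompt engineering"
    if rag_sufficient:
        return "rag"
    if num_quality_examples >= min_examples:
        return "fine-tuning"
    # Cheaper options failed, but there is not enough data to fine-tune yet.
    return "collect more data"


# Example: prompting and RAG both fell short, and 2,500 curated examples exist.
choice = recommend_approach(False, False, 2500)
```

Treat the result as a starting point for discussion; format and persona requirements still need a human judgment call.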
## LoRA vs QLoRA (Unsloth 2026)
| Criteria | LoRA | QLoRA |
|----------|------|-------|
| Model fits in VRAM | Use LoRA | |
| Memory constrained | | Use QLoRA |
| Training speed | 39% faster than QLoRA | |
| Memory savings | | 75%+ (dynamic 4-bit quants) |
| Quality | Baseline | Roughly the same (Unsloth reports recovering the accuracy loss) |
| Llama 70B | | Fits in <48 GB VRAM |
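The memory figures in the table follow from simple arithmetic on weight storage. The sketch below is a back-of-the-envelope estimate only: `weight_memory_gb` is a hypothetical helper, and it counts the base weights alone, ignoring quantization constants, LoRA adapters, optimizer state, and activations.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


params_70b = 70e9
fp16_gb = weight_memory_gb(params_70b, 16)  # 16-bit weights
int4_gb = weight_memory_gb(params_70b, 4)   # 4-bit quantized weights
savings = 1 - int4_gb / fp16_gb             # fraction of weight memory saved

print(f"16-bit: {fp16_gb:.0f} GB, 4-bit: {int4_gb:.0f} GB, saved: {savings:.0%}")
```

Under these assumptions the 4-bit weights come to about 35 GB versus 140 GB at 16-bit, a 75% reduction, which is why a 70B model can leave headroom for training overhead within 48 GB.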
## Quick Reference: LoRA Training
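```python
from unsloth import FastLanguageModel  # import first so Unsloth can patch TRL
from datasets import load_dataset
from trl import SFTTrainer

# Load with 4-bit quantization (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,              # Rank (16-64 typical)
    lora_alpha=32,     # Scaling (2x r)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
        "gate_proj", "up_proj", "down_proj",     # MLP (QLoRA paper)
    ],
)

# Placeholder dataset: "train.jsonl" is a hypothetical file whose records
# each carry a "text" field; swap in your own data.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

# Train
# Note: recent TRL releases take processing_class= instead of tokenizer= and
# read the sequence length from SFTConfig; adjust to your installed version.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    max_seq_length=2048,
)
trainer.train()
```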
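To see what `r=16` and `lora_alpha=32` imply for this configuration, the sketch below counts the parameters the adapters add and computes the LoRA scaling factor. It is an illustration rather than Unsloth's own accounting: the helper names are hypothetical, and the layer shapes are assumed from the public Llama 3.1 8B config (hidden size 4096, MLP size 14336, 8 KV heads of dimension 128, 32 blocks).

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) per adapted linear layer."""
    return r * (d_in + d_out)


# Assumed Llama 3.1 8B shapes (from the public model config).
HIDDEN, INTERMEDIATE, KV_DIM, LAYERS = 4096, 14336, 1024, 32

# (d_in, d_out) for each target module in one transformer block.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def total_lora_params(r: int) -> int:
    """Sum adapter parameters over all target modules and all blocks."""
    per_block = sum(lora_params(i, o, r) for i, o in MODULE_SHAPES.values())
    return per_block * LAYERS


r, lora_alpha = 16, 32
trainable = total_lora_params(r)   # adapter parameters that receive gradients
scale = lora_alpha / r             # standard LoRA scaling
rs_scale = lora_alpha / r ** 0.5   # rank-stabilized (rsLoRA) scaling

print(f"trainable={trainable:,} scale={scale} rsLoRA scale={rs_scale}")
```

With these shapes the adapters add about 42 million trainable parameters, roughly half a percent of the 8B base model. Doubling the rank doubles that count, while the standard scaling `lora_alpha / r` halves unless `lora_alpha` is raised with it.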