fine-tuning-customization

LLM fine-tuning with LoRA, QLoRA, DPO alignment, and synthetic data generation. Efficient training, preference learning, data creation. Use when customizing models for specific domains.

Marketplace: orchestkit · Plugin: ork-llm-advanced · Tag: ai
Repository: yonatangross/skillforge-claude-plugin (33 stars)
Path: plugins/ork-llm-advanced/skills/fine-tuning-customization/SKILL.md
Last verified: January 25, 2026

Install (via the add-skill CLI):

npx add-skill https://github.com/yonatangross/skillforge-claude-plugin/blob/main/plugins/ork-llm-advanced/skills/fine-tuning-customization/SKILL.md -a claude-code --skill fine-tuning-customization

Installs to: .claude/skills/fine-tuning-customization/

# Fine-Tuning & Customization

Customize LLMs for specific domains using parameter-efficient fine-tuning and alignment techniques.

> **Unsloth 2026**: 7x longer context RL, FP8 RL on consumer GPUs, rsLoRA support. **TRL**: OpenEnv integration, vLLM server mode, transformers 5.0.0+ compatible.

## Decision Framework: Fine-Tune or Not?

| Approach | Try First | When It Works |
|----------|-----------|---------------|
| Prompt Engineering | Always | Simple tasks, clear instructions |
| RAG | External knowledge needed | Knowledge-intensive tasks |
| Fine-Tuning | Last resort | Deep specialization, format control |

**Fine-tune ONLY when:**
1. Prompt engineering has been tried and is insufficient
2. RAG doesn't capture domain nuances
3. A specific output format is consistently required
4. Persona/style must be deeply embedded
5. You have ~1000+ high-quality examples (see the data-format sketch below)
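
What counts as a usable example? For SFT, each record is typically collapsed into a single formatted text string. A minimal sketch, assuming Hugging Face `datasets` and the instruction/response fields shown (both hypothetical placeholders):

```python
from datasets import Dataset

# Hypothetical domain examples -- replace with your own data.
raw_examples = [
    {"instruction": "Summarize the support ticket.",
     "response": "Customer cannot reset their password after the latest update."},
    # ... aim for ~1000+ examples of comparable quality
]

# Collapse each record into the single "text" column that SFTTrainer
# consumes (see dataset_text_field in the training example below).
def format_example(ex):
    return {"text": f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['response']}"}

dataset = Dataset.from_list(raw_examples).map(format_example)
```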

## LoRA vs QLoRA (Unsloth 2026)

| Criteria | LoRA | QLoRA |
|----------|------|-------|
| Model fits in VRAM | Use LoRA | |
| Memory constrained | | Use QLoRA |
| Training speed | 39% faster | |
| Memory savings | | 75%+ (dynamic 4-bit quants) |
| Quality | Baseline | ~Same (Unsloth recovered accuracy loss) |
| 70B LLaMA | | <48GB VRAM with QLoRA |
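
In Unsloth, the LoRA/QLoRA choice reduces to a single flag at load time: `load_in_4bit`. A minimal sketch of the 16-bit (plain LoRA) variant, mirroring the quick reference below:

```python
from unsloth import FastLanguageModel

# Plain LoRA: keep base weights in 16-bit (model must fit in VRAM).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=False,  # flip to True for QLoRA's 4-bit quantized base
)
```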

## Quick Reference: LoRA Training

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer

# Load with 4-bit quantization (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,              # Rank (16-64 typical)
    lora_alpha=32,     # Scaling (2x r)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
        "gate_proj", "up_proj", "down_proj",      # MLP (QLoRA paper)
    ],
)

# Train (dataset needs a "text" column of formatted examples)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
)
trainer.train()
```
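
After `trainer.train()`, the adapters can be kept separate or merged into the base weights. A sketch assuming PEFT's standard `save_pretrained` plus Unsloth's merged-save helper; verify `save_pretrained_merged` and its `save_method` values against your installed Unsloth version:

```python
# Save only the LoRA adapter weights (typically tens to hundreds of MB).
model.save_pretrained("outputs/lora_adapters")
tokenizer.save_pretrained("outputs/lora_adapters")

# Or merge adapters into the base model for standalone serving
# (Unsloth-specific helper; check your version's docs).
model.save_pretrained_merged("outputs/merged_16bit", tokenizer, save_method="merged_16bit")
```

Saving adapters alone keeps deployments flexible (one base model, many adapters); merging trades that flexibility for a single self-contained checkpoint.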
