# RunPod Training Manager
Run Unsloth training on RunPod GPU instances.
## Prerequisites
1. **RunPod API Key**: `echo $RUNPOD_API_KEY` (get one at runpod.io/console/user/settings; a quick check follows this list)
2. **RunPod SDK**: `pip install runpod`
3. **Training notebook/script**: From `funsloth-train`
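Before paying for a pod, it can help to verify the setup. A minimal sketch that only confirms the SDK is installed and the key is set:

```python
import os
from importlib.metadata import version

import runpod

# Fail early if the environment is incomplete.
api_key = os.environ.get("RUNPOD_API_KEY")
assert api_key, "Set RUNPOD_API_KEY first (runpod.io/console/user/settings)"

runpod.api_key = api_key  # the SDK reads its key from this module attribute
print(f"runpod SDK {version('runpod')} ready")
```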
## Workflow
### 1. Select GPU
| GPU | VRAM | Cost | Best For |
|-----|------|------|----------|
| RTX 3090 | 24GB | ~$0.35/hr | Budget 7-14B |
| RTX 4090 | 24GB | ~$0.55/hr | Fast 7-14B |
| A100 40GB | 40GB | ~$1.50/hr | 14-34B |
| A100 80GB | 80GB | ~$2.00/hr | 70B |
| H100 | 80GB | ~$3.50/hr | Fastest |
RunPod typically has better prices than HF Jobs; rates and availability change, so it is worth listing the current GPU types from the SDK (sketched below).
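A minimal sketch for that listing, assuming `runpod.api_key` is already set as in the prerequisites check (the `displayName` and `memoryInGb` field names are assumptions and may differ between SDK versions):

```python
import runpod

# List the GPU types visible to your account; the "id" value is what
# create_pod() expects as gpu_type_id (e.g. "NVIDIA GeForce RTX 4090").
for gpu in runpod.get_gpus():
    print(gpu["id"], gpu.get("displayName"), gpu.get("memoryInGb"))
```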
### 2. Choose Deployment
- **Pod** (Recommended): Persistent, SSH access, network storage
- **Serverless**: Pay per second, complex setup (better for inference)
### 3. Configure Network Volume (Recommended)
```python
import runpod

# Persistent storage that survives pod termination; note the volume's ID,
# which is passed as network_volume_id when the pod is created in step 4.
volume = runpod.create_network_volume(name="funsloth-training", size_gb=50, region="US")
```
A network volume lets you resume interrupted training, retrieve checkpoints after the pod is gone, and share data between pods (see the resume sketch below).
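A minimal sketch of the resume path, assuming the training script uses a Hugging Face `Trainer` (as Unsloth notebooks typically do) and that the volume is mounted at `/workspace`; adjust the path to wherever your volume is actually mounted:

```python
from transformers import TrainingArguments

# Write checkpoints onto the network volume so they outlive the pod.
args = TrainingArguments(
    output_dir="/workspace/outputs",  # assumed volume mount point
    save_steps=200,
    save_total_limit=3,
)

# On a fresh pod with the same volume attached, resume from the latest checkpoint:
# trainer.train(resume_from_checkpoint=True)
```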
### 4. Launch Pod
Use the [official Unsloth Docker image](https://docs.unsloth.ai/new/how-to-fine-tune-llms-with-unsloth-and-docker) for a pre-configured environment:
```python
import runpod
pod = runpod.create_pod(
    name="funsloth-training",
    image_name="unsloth/unsloth",     # official image, supports all GPUs incl. Blackwell
    gpu_type_id="{gpu_type}",         # e.g. an ID listed by runpod.get_gpus()
    volume_in_gb=50,
    network_volume_id="{volume_id}",  # network volume from step 3
    env={
        "HF_TOKEN": "{token}",
        "WANDB_API_KEY": "{key}",
        "JUPYTER_PASSWORD": "unsloth",
    },
    ports="8888/http,22/tcp",         # Jupyter Lab + SSH
)
```
The Unsloth image includes Jupyter Lab (port 8888) and example notebooks in `/workspace/unsloth-notebooks/`.
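Pods take a minute or two to boot. A sketch that waits for the pod's runtime to appear before connecting (the `runtime`/`ports` fields follow RunPod's pod query and are assumptions; check them against your SDK version):

```python
import time

import runpod

# Poll until the pod reports a runtime, which carries the exposed ports.
while True:
    info = runpod.get_pod(pod["id"])
    if info and info.get("runtime"):
        break
    time.sleep(10)

# The port list (assumed fields: ip, publicPort, privatePort, type) gives
# the SSH address used in step 5.
print(info["runtime"]["ports"])
```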
### 5. Upload and Run
```bash
# Upload the training script from your local machine
scp train.py root@{pod_ip}:/workspace/

# SSH into the pod
ssh root@{pod_ip}

# Run training inside tmux so it survives a dropped SSH connection
tmux new -s training
cd /workspace && python train.py
# Detach with Ctrl+B, then D
```
### 6. Monitor
```bash
# Reattach to the training session over SSH
ssh -t root@{pod_ip} tmux attach -t training
```