Use this skill when optimizing GPU training efficiency. Covers memory optimization, mixed precision, gradient accumulation, model parallelism (TP/PP/DP), DeepSpeed, and FSDP integration.
# GPU Optimization
This skill provides comprehensive guidance for optimizing GPU training efficiency and handling large models.
## When to Activate
- Training runs out of GPU memory
- Need to scale training to multiple GPUs
- Optimizing training throughput
- Implementing model parallelism
- Using DeepSpeed or FSDP
## Memory Optimization Techniques
### 1. Mixed Precision Training
```python
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

for batch in dataloader:
    optimizer.zero_grad()

    # Forward pass in reduced precision (fp16/bf16)
    with autocast(dtype=torch.bfloat16):
        outputs = model(batch["input"])
        loss = criterion(outputs, batch["target"])

    # Backward pass with loss scaling (required for fp16; optional for bf16)
    scaler.scale(loss).backward()

    # Unscale before gradient clipping
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    # Optimizer step
    scaler.step(optimizer)
    scaler.update()
```
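GradScaler exists to compensate for fp16's narrow dynamic range; bf16 keeps fp32's exponent range, so loss scaling can usually be skipped. A minimal bf16-only sketch, assuming the same `model`, `optimizer`, `criterion`, and `dataloader` as above:

```python
# Simpler variant when bf16 is supported (e.g. Ampere or newer GPUs):
# no GradScaler, gradients stay in their natural scale.
for batch in dataloader:
    optimizer.zero_grad()
    with autocast(dtype=torch.bfloat16):
        loss = criterion(model(batch["input"]), batch["target"])
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```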
### 2. Gradient Checkpointing
```python
import torch.nn as nn
from torch.utils.checkpoint import checkpoint, checkpoint_sequential

class CheckpointedTransformer(nn.Module):
    def __init__(self, num_layers: int, dim: int):
        super().__init__()
        # TransformerBlock is assumed to be defined elsewhere
        self.layers = nn.ModuleList([
            TransformerBlock(dim) for _ in range(num_layers)
        ])
        self.gradient_checkpointing = False

    def enable_gradient_checkpointing(self):
        self.gradient_checkpointing = True

    def forward(self, x):
        if self.gradient_checkpointing and self.training:
            # Checkpoint every layer: activations are recomputed in backward
            for layer in self.layers:
                x = checkpoint(layer, x, use_reentrant=False)
            # Alternative: checkpoint coarser segments instead of every layer
            # x = checkpoint_sequential(self.layers, segments=4, input=x)
        else:
            for layer in self.layers:
                x = layer(x)
        return x
```
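A short usage sketch for the module above; the layer count and width are illustrative only. Checkpointing trades compute for memory: each checkpointed segment's forward is re-run during backward.

```python
# Hypothetical sizes, for illustration
model = CheckpointedTransformer(num_layers=24, dim=1024).cuda()
model.enable_gradient_checkpointing()

# Activation memory no longer scales with full depth; backward pays
# roughly one extra forward pass of compute in exchange.
```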
### 3. Gradient Accumulation
```python
accumulation_steps = 4
optimizer.zero_grad()
for i, batch in enumerate(dataloader):
    # Forward pass; divide the loss so gradients average over micro-batches
    outputs = model(batch["input"])
    loss = criterion(outputs, batch["target"]) / accumulation_steps
    loss.backward()

    # Step only once every accumulation_steps micro-batches
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```
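Gradient accumulation composes naturally with the mixed precision loop from section 1. A sketch of the combined pattern, assuming the same training objects as above (fp16 with GradScaler shown; with bf16 the scaler can typically be omitted):

```python
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
accumulation_steps = 4
optimizer.zero_grad()

for i, batch in enumerate(dataloader):
    with autocast(dtype=torch.float16):
        loss = criterion(model(batch["input"]), batch["target"])

    # Scale for fp16 safety and average over micro-batches
    scaler.scale(loss / accumulation_steps).backward()

    if (i + 1) % accumulation_steps == 0:
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```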