Optimize CUDA kernels and GPU code for performance. Use when reviewing CUDA code, analyzing performance, or suggesting GPU optimizations.
February 1, 2026
Installation:

```shell
npx add-skill https://github.com/keith-mvs/plugin-nsight-copilot/blob/main/./skills/cuda-optimization/SKILL.md -a claude-code --skill cuda-optimization
```

Installation path: `.claude/skills/cuda-optimization/`

# CUDA Optimization Skill
This skill helps optimize CUDA and GPU code for better performance.
## When I activate
I automatically activate when you:
- Review CUDA kernel code (`.cu` files)
- Ask about GPU performance or optimization
- Mention memory coalescing, occupancy, or shared memory
- Request profiling analysis or bottleneck identification
- Discuss parallel algorithm efficiency
## What I do
### Performance Analysis
I analyze CUDA code for:
- **Memory access patterns**: Detect uncoalesced accesses, bank conflicts
- **Thread configuration**: Evaluate block size, grid size, occupancy
- **Synchronization**: Check for unnecessary synchronization points
- **Memory hierarchy**: Assess use of shared memory, constant memory, texture memory
- **Warp efficiency**: Identify divergence and suboptimal thread utilization
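
As one illustration of the shared-memory and bank-conflict checks above, here is a minimal sketch of the classic padded-tile transpose (the `TILE` size and kernel name are illustrative, not part of this skill's fixed output):

```cuda
#define TILE 32

// Transposing a tile through shared memory: without the +1 padding,
// the column-wise reads below would hit the same bank for every
// thread in a warp, serializing the accesses.
__global__ void transposeTile(const float* in, float* out, int n) {
    __shared__ float tile[TILE][TILE + 1];  // +1 column breaks bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n)
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];  // coalesced read
    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;  // swap block indices for transpose
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n)
        out[y * n + x] = tile[threadIdx.x][threadIdx.y];  // coalesced write
}
```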
### Optimization Suggestions
I provide specific recommendations:
- Coalescing memory accesses
- Optimal thread block configurations
- Shared memory usage patterns
- Reduction strategies
- Stream parallelism opportunities
- Compute-bound vs. memory-bound classification
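
The reduction and grid-stride suggestions above can be sketched with warp shuffle intrinsics (a sketch, assuming compute capability 3.0+; the kernel and helper names are illustrative):

```cuda
// Warp-level sum using shuffle intrinsics: no shared memory or
// __syncthreads() is needed for communication within a warp.
__inline__ __device__ float warpReduceSum(float val) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;  // lane 0 ends up holding the warp's sum
}

// Block-level reduction built on the warp primitive above.
__global__ void reduceSum(const float* in, float* out, int n) {
    float sum = 0.0f;
    // Grid-stride loop: a fixed-size grid covers any input length
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n; i += blockDim.x * gridDim.x)
        sum += in[i];
    sum = warpReduceSum(sum);
    if ((threadIdx.x & (warpSize - 1)) == 0)
        atomicAdd(out, sum);  // one atomic per warp, not per thread
}
```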
### Code Examples
I show before/after code examples demonstrating:
```cuda
// ❌ Uncoalesced access: adjacent threads write elements `stride` apart,
// so each warp's writes span many memory segments
__global__ void slow(float* data, int stride) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    data[idx * stride] = idx;  // poor pattern
}

// ✅ Coalesced access: adjacent threads write adjacent elements,
// so each warp's writes combine into few memory transactions
__global__ void fast(float* data) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    data[idx] = idx;  // sequential pattern
}
```
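For stream-parallelism opportunities, a typical before/after looks like the host-side sketch below, which overlaps host-to-device copies with kernel work (the names `process`, `d_buf`, `h_buf`, `numChunks`, and `chunkSize` are illustrative; `h_buf` must be pinned via `cudaMallocHost` for the async copies to actually overlap):

```cuda
cudaStream_t s[2];
for (int i = 0; i < 2; ++i) cudaStreamCreate(&s[i]);

for (int chunk = 0; chunk < numChunks; ++chunk) {
    int i = chunk % 2;  // alternate streams so copy and compute overlap
    cudaMemcpyAsync(d_buf[i], h_buf + chunk * chunkSize,
                    chunkSize * sizeof(float),
                    cudaMemcpyHostToDevice, s[i]);
    process<<<blocks, threads, 0, s[i]>>>(d_buf[i], chunkSize);
}
cudaDeviceSynchronize();  // wait for all streams to finish
```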
## Tools I use
I leverage:
- NVIDIA Nsight Copilot (GPT-OSS-120B) for deep CUDA expertise
- Static code analysis for pattern detection
- Architecture-specific optimization knowledge (Ampere, Hopper, Ada)
- Best practices from CUDA Programming Guide
## Output format
My suggestions include:
1. **Issue identification** with line numbers
2. **Performance impact** estimate (low/medium/high)
3. **Specific fix** with code example
4. **Architecture notes** if relevant
I focus on actionable recommendations.