CRITICAL: Use for performance optimization. Triggers: performance, optimization, benchmark, profiling, flamegraph, criterion, slow, fast, allocation, cache, SIMD, make it faster, 性能优化, 基准测试
View on GitHubZhangHanDong/rust-skills
rust-skills
January 22, 2026
Select agents to install to:
npx add-skill https://github.com/ZhangHanDong/rust-skills/blob/main/skills/m10-performance/SKILL.md -a claude-code --skill m10-performanceInstallation paths:
.claude/skills/m10-performance/# Performance Optimization
> **Layer 2: Design Choices**
## Core Question
**What's the bottleneck, and is optimization worth it?**
Before optimizing:
- Have you measured? (Don't guess)
- What's the acceptable performance?
- Will optimization add complexity?
---
## Performance Decision → Implementation
| Goal | Design Choice | Implementation |
|------|---------------|----------------|
| Reduce allocations | Pre-allocate, reuse | `with_capacity`, object pools |
| Improve cache | Contiguous data | `Vec`, `SmallVec` |
| Parallelize | Data parallelism | `rayon`, threads |
| Avoid copies | Zero-copy | References, `Cow<T>` |
| Reduce indirection | Inline data | `smallvec`, arrays |
---
## Thinking Prompt
Before optimizing:
1. **Have you measured?**
- Profile first → flamegraph, perf
- Benchmark → criterion, cargo bench
- Identify actual hotspots
2. **What's the priority?**
- Algorithm (10x-1000x improvement)
- Data structure (2x-10x)
- Allocation (2x-5x)
- Cache (1.5x-3x)
3. **What's the trade-off?**
- Complexity vs speed
- Memory vs CPU
- Latency vs throughput
---
## Trace Up ↑
To domain constraints (Layer 3):
```
"How fast does this need to be?"
↑ Ask: What's the performance SLA?
↑ Check: domain-* (latency requirements)
↑ Check: Business requirements (acceptable response time)
```
| Question | Trace To | Ask |
|----------|----------|-----|
| Latency requirements | domain-* | What's acceptable response time? |
| Throughput needs | domain-* | How many requests per second? |
| Memory constraints | domain-* | What's the memory budget? |
---
## Trace Down ↓
To implementation (Layer 1):
```
"Need to reduce allocations"
↓ m01-ownership: Use references, avoid clone
↓ m02-resource: Pre-allocate with_capacity
"Need to parallelize"
↓ m07-concurrency: Choose rayon or threads
↓ m07-concurrency: Consider async for I/O-bound
"Need cache efficiency"
↓ Data layout: Prefer Vec over HashMap when possible