Complete fal.ai optimization system. PROACTIVELY activate for: (1) Queue vs run performance, (2) Parallel request batching, (3) Streaming for real-time UI, (4) WebSocket for interactive apps, (5) Model cost comparison, (6) Image size optimization, (7) Inference step tuning, (8) Webhook vs polling, (9) Result caching by seed, (10) Serverless scaling config. Provides: Parallel patterns, cost strategies, caching examples, monitoring setup. Ensures optimal performance and cost-effective usage.
View on GitHub: JosiahSiegel/claude-plugin-marketplace
fal-ai-master
January 20, 2026
```
npx add-skill https://github.com/JosiahSiegel/claude-plugin-marketplace/blob/main/plugins/fal-ai-master/skills/fal-optimization/SKILL.md -a claude-code --skill fal-optimization
```

Installation path: `.claude/skills/fal-optimization/`

## Quick Reference
| Optimization | Technique | Impact |
|--------------|-----------|--------|
| Parallel requests | `Promise.all()` with batches | 5-10x throughput |
| Avoid polling | Use webhooks | Lower API calls |
| Cache by seed | Store `prompt+seed` results | Avoid regeneration |
| Right-size images | Use needed resolution | Lower cost |
| Fewer steps | Reduce inference steps | Faster, cheaper |
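The "cache by seed" row can be sketched as a small wrapper. This is illustrative only: `withSeedCache` and the `GenFn` signature are not part of the fal client; in practice the wrapped function would call `fal.subscribe` and return the result URL. With a fixed prompt and seed, diffusion output is reproducible, so the cached result can be reused instead of paying for regeneration.

```typescript
// Hypothetical generator signature: (prompt, seed) -> result URL.
type GenFn = (prompt: string, seed: number) => Promise<string>;

// Wrap a generator so identical prompt+seed pairs hit an in-memory cache.
function withSeedCache(generate: GenFn): GenFn {
  const cache = new Map<string, Promise<string>>();
  return (prompt, seed) => {
    const key = `${prompt}::${seed}`;
    let hit = cache.get(key);
    if (!hit) {
      // Cache the promise itself so concurrent callers share one request.
      // Note: in this sketch a rejected promise also stays cached.
      hit = generate(prompt, seed);
      cache.set(key, hit);
    }
    return hit;
  };
}
```

A production version would evict failed promises and likely persist the cache (e.g. Redis keyed on a hash of prompt, seed, and model parameters).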

| Model Tier | Development | Production |
|------------|-------------|------------|
| Image | FLUX Schnell | FLUX.2 Pro |
| Video | Runway Turbo | Kling 2.6 Pro |

| Serverless Config | Cost-Optimized | Latency-Optimized |
|-------------------|----------------|-------------------|
| `min_concurrency` | `0` | `1+` |
| `keep_alive` | `120` | `600+` |
| `machine_type` | Smallest viable | Higher tier |
## When to Use This Skill
Use for **performance and cost optimization**:
- Reducing generation latency
- Lowering API costs
- Implementing parallel processing
- Choosing between polling and webhooks
- Configuring serverless scaling

**Related skills:**
- For API patterns: see `fal-api-reference`
- For model selection: see `fal-model-guide`
- For serverless config: see `fal-serverless-guide`
---
# fal.ai Performance and Cost Optimization
Strategies for optimizing performance, reducing costs, and scaling fal.ai integrations.
## Performance Optimization
### Client-Side Optimizations
#### 1. Use Queue-Based Execution
Always prefer `subscribe()` over `run()` for generation tasks:
```typescript
// Recommended: queue-based with progress tracking
const result = await fal.subscribe("fal-ai/flux/dev", {
  input: { prompt: "test" },
  logs: true,
  onQueueUpdate: (update) => {
    // Show progress to users
    if (update.status === "IN_PROGRESS") {
      console.log("Generating...");
    }
  },
});

// Only use run() for fast endpoints (< 30s)
const quickResult = await fal.run("fal-ai/fast-sdxl", {
  input: { prompt: "quick test" },
});
```
#### 2. Parallel Requests
Process multiple requests concurrently with `Promise.all()`, batching them to stay within rate limits:
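A minimal sketch of the batching pattern from the quick reference. `inBatches` is a hypothetical helper, and the batch size of 5 in the usage note is an assumption to tune against your account's rate limits; each worker call would wrap a `fal.subscribe` request.

```typescript
// Run `worker` over all items: each batch runs concurrently via Promise.all,
// batches run sequentially, and result order matches input order.
async function inBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}

// Usage (hypothetical prompts array):
// const images = await inBatches(prompts, 5, (prompt) =>
//   fal.subscribe("fal-ai/flux/dev", { input: { prompt } })
// );
```

Note that `Promise.all` rejects on the first failure; swap in `Promise.allSettled` if partial results are acceptable.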