Complete fal.ai audio system. PROACTIVELY activate for: (1) Whisper speech-to-text, (2) Transcription with timestamps, (3) Translation to English, (4) F5-TTS voice cloning, (5) ElevenLabs premium TTS, (6) Kokoro multi-language TTS, (7) XTTS open-source cloning, (8) Subtitle generation (SRT), (9) Audio file formats. Provides: STT/TTS endpoints, language codes, voice cloning setup, timestamp formatting. Ensures accurate transcription and natural speech synthesis.
Install with:

`npx add-skill https://github.com/JosiahSiegel/claude-plugin-marketplace/blob/main/plugins/fal-ai-master/skills/fal-audio/SKILL.md -a claude-code --skill fal-audio`

Installation path: `.claude/skills/fal-audio/`

## Quick Reference
| STT Model | Endpoint | Speed | Accuracy |
|-----------|----------|-------|----------|
| Whisper | `fal-ai/whisper` | Medium | Highest |
| Whisper Turbo | `fal-ai/whisper-turbo` | Fast | High |
| Whisper Large v3 | `fal-ai/whisper-large-v3` | Slow | Highest |

| TTS Model | Endpoint | Voice Clone | Quality |
|-----------|----------|-------------|---------|
| F5-TTS | `fal-ai/f5-tts` | Yes | High |
| ElevenLabs | `fal-ai/elevenlabs/tts` | Via API | Highest |
| Kokoro | `fal-ai/kokoro/american-english` | No | Good |
| XTTS | `fal-ai/xtts` | Yes | Good |

| Whisper Task | Use Case |
|--------------|----------|
| `transcribe` | Transcript in the source language |
| `translate` | Translate non-English speech to English |

| Whisper Parameter | Value |
|-------------------|-------|
| `chunk_level` | `"segment"` for timestamps |
| `language` | ISO 639-1 code (e.g., `"en"`) |
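For the `translate` task, the same Whisper endpoint is called with `task: "translate"` and always returns English text. A minimal sketch (the audio URL is a placeholder):

```typescript
import { fal } from "@fal-ai/client";

// Translate non-English speech to English text with the same Whisper endpoint.
const result = await fal.subscribe("fal-ai/whisper", {
  input: {
    audio_url: "https://example.com/french-speech.mp3", // placeholder URL
    task: "translate",      // output text is always English
    chunk_level: "segment"  // keep timestamps for subtitles
  }
});

console.log(result.data.text); // English translation
```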
## When to Use This Skill
Use for **audio processing**:
- Transcribing audio/video to text
- Generating subtitles with timestamps
- Translating speech to English
- Cloning voices from reference audio
- Generating natural speech from text
**Related skills:**
- For video with audio: see `fal-text-to-video`
- For API integration: see `fal-api-reference`
- For model comparison: see `fal-model-guide`
---
# fal.ai Audio Models
Complete reference for speech-to-text (STT) and text-to-speech (TTS) models on fal.ai.
## Speech-to-Text Models
### Whisper (OpenAI)
**Endpoint:** `fal-ai/whisper`
**Best For:** Accurate transcription and translation
The industry-standard speech recognition model with support for 99+ languages.
```typescript
import { fal } from "@fal-ai/client";

const result = await fal.subscribe("fal-ai/whisper", {
  input: {
    audio_url: "https://example.com/speech.mp3",
    task: "transcribe",
    language: "en",          // ISO 639-1 language code
    chunk_level: "segment"   // return chunks with timestamps
  }
});

// With @fal-ai/client, the model output is on result.data
console.log(result.data.text);
console.log(result.data.chunks); // With timestamps
```
```python
import fal_client

result = fal_client.subscribe(
    "fal-ai/whisper",
    arguments={"audio_url": "https://example.com/speech.mp3", "task": "transcribe"},
)
print(result["text"])  # full transcript
```