
fal-audio


Complete fal.ai audio system. PROACTIVELY activate for: (1) Whisper speech-to-text, (2) Transcription with timestamps, (3) Translation to English, (4) F5-TTS voice cloning, (5) ElevenLabs premium TTS, (6) Kokoro multi-language TTS, (7) XTTS open-source cloning, (8) Subtitle generation (SRT), (9) Audio file formats. Provides: STT/TTS endpoints, language codes, voice cloning setup, timestamp formatting. Ensures accurate transcription and natural speech synthesis.


Marketplace: claude-plugin-marketplace (JosiahSiegel/claude-plugin-marketplace)

Plugin: fal-ai-master

Repository: JosiahSiegel/claude-plugin-marketplace

Path: plugins/fal-ai-master/skills/fal-audio/SKILL.md

Last Verified: January 20, 2026

Install Skill

npx add-skill https://github.com/JosiahSiegel/claude-plugin-marketplace/blob/main/plugins/fal-ai-master/skills/fal-audio/SKILL.md -a claude-code --skill fal-audio

Installation path (Claude): .claude/skills/fal-audio/

Instructions

## Quick Reference

| STT Model | Endpoint | Speed | Accuracy |
|-----------|----------|-------|----------|
| Whisper | `fal-ai/whisper` | Medium | Highest |
| Whisper Turbo | `fal-ai/whisper-turbo` | Fast | High |
| Whisper Large v3 | `fal-ai/whisper-large-v3` | Slow | Highest |

| TTS Model | Endpoint | Voice Clone | Quality |
|-----------|----------|-------------|---------|
| F5-TTS | `fal-ai/f5-tts` | Yes | High |
| ElevenLabs | `fal-ai/elevenlabs/tts` | Via API | Highest |
| Kokoro | `fal-ai/kokoro/american-english` | No | Good |
| XTTS | `fal-ai/xtts` | Yes | Good |
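When a script has to choose a TTS endpoint, the table above can be kept as data. The following is a minimal Python sketch; `TTSModel`, `TTS_MODELS`, and `pick_tts_endpoint` are hypothetical names, and treating ElevenLabs as not offering reference-audio cloning is an assumption, because the table only says "Via API".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSModel:
    name: str
    endpoint: str
    voice_clone: bool  # True only where the table says "Yes"
    quality: str


# Transcribed from the TTS table. Assumption: ElevenLabs ("Via API") is
# treated as not supporting reference-audio cloning in this sketch.
TTS_MODELS = [
    TTSModel("F5-TTS", "fal-ai/f5-tts", True, "High"),
    TTSModel("ElevenLabs", "fal-ai/elevenlabs/tts", False, "Highest"),
    TTSModel("Kokoro", "fal-ai/kokoro/american-english", False, "Good"),
    TTSModel("XTTS", "fal-ai/xtts", True, "Good"),
]

# Orders the table's quality labels from lowest to highest.
QUALITY_RANK = {"Good": 0, "High": 1, "Highest": 2}


def pick_tts_endpoint(needs_voice_clone: bool) -> str:
    """Return the highest-quality endpoint that meets the cloning requirement."""
    candidates = [m for m in TTS_MODELS if m.voice_clone or not needs_voice_clone]
    best = max(candidates, key=lambda m: QUALITY_RANK[m.quality])
    return best.endpoint
```

With this data, `pick_tts_endpoint(True)` selects F5-TTS and `pick_tts_endpoint(False)` selects ElevenLabs. Adjust the flags if your own testing shows different capabilities.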

| Whisper Task | Output |
|--------------|--------|
| `transcribe` | Text in the spoken language |
| `translate` | English text from non-English speech |
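The two tasks differ only in the `task` value of the request payload. Below is a minimal sketch of building the `arguments` dict for `fal_client.subscribe`. The helper `build_whisper_arguments` is hypothetical. Omitting `language` so the endpoint auto-detects it is an assumption about the fal endpoint. In OpenAI's Whisper, `language` describes the spoken (source) language in both tasks, and this sketch assumes fal follows that convention.

```python
from typing import Optional

# The two task values from the table above.
WHISPER_TASKS = {"transcribe", "translate"}


def build_whisper_arguments(
    audio_url: str,
    task: str = "transcribe",
    language: Optional[str] = None,
    chunk_level: str = "segment",
) -> dict:
    """Build the `arguments` dict for fal_client.subscribe("fal-ai/whisper", ...).

    Hypothetical helper: the keys mirror the parameters used in this document.
    """
    if task not in WHISPER_TASKS:
        raise ValueError(f"task must be one of {sorted(WHISPER_TASKS)}, got {task!r}")
    args = {"audio_url": audio_url, "task": task, "chunk_level": chunk_level}
    # Assumption: omitting `language` lets the endpoint auto-detect it.
    # For both tasks, `language` describes the spoken language, not the output.
    if language is not None:
        args["language"] = language
    return args
```

For example, Spanish audio translated to English would be requested with `fal_client.subscribe("fal-ai/whisper", arguments=build_whisper_arguments(url, task="translate", language="es"))`.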

| Whisper Parameter | Value |
|-------------------|-------|
| `chunk_level` | `"segment"` for timestamps |
| `language` | ISO code (e.g., `"en"`) |
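With `chunk_level: "segment"`, the segment timestamps provide everything an SRT subtitle file needs. The following is a minimal sketch. It assumes each chunk has the shape `{"timestamp": [start, end], "text": "..."}` with times in seconds. The function names are hypothetical, and the 2-second fallback for a segment without an end time is an arbitrary choice.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: list) -> str:
    """Convert Whisper segment chunks into an SRT subtitle string.

    Assumes each chunk looks like {"timestamp": [start, end], "text": "..."}.
    """
    cues = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:  # assumption: open-ended final segment, give it 2 s
            end = start + 2.0
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Write `chunks_to_srt(result["chunks"])` to a `.srt` file. The timestamp is formatted by hand because SRT requires a comma before the milliseconds and zero-padded hours.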

## When to Use This Skill

Use for **audio processing**:
- Transcribing audio/video to text
- Generating subtitles with timestamps
- Translating speech to English
- Cloning voices from reference audio
- Generating natural speech from text

**Related skills:**
- For video with audio: see `fal-text-to-video`
- For API integration: see `fal-api-reference`
- For model comparison: see `fal-model-guide`

---

# fal.ai Audio Models

Complete reference for speech-to-text (STT) and text-to-speech (TTS) models on fal.ai.

## Speech-to-Text Models

### Whisper (OpenAI)
**Endpoint:** `fal-ai/whisper`
**Best For:** Accurate transcription and translation

OpenAI's open-source speech recognition model. It transcribes speech in about 100 languages and can translate non-English speech into English.

```typescript
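import { fal } from "@fal-ai/client";

// The client reads the API key from the FAL_KEY environment variable.
const result = await fal.subscribe("fal-ai/whisper", {
  input: {
    audio_url: "https://example.com/speech.mp3",
    task: "transcribe",
    language: "en",
    chunk_level: "segment"
  }
});

// subscribe() resolves to { data, requestId }; the model output is in `data`.
console.log(result.data.text);
console.log(result.data.chunks);  // Segments with timestamps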
```
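To get a readable transcript instead of a subtitle file, print the timestamped chunks line by line. This is a minimal Python sketch. It assumes the response is available as a dict and that each chunk has the shape `{"timestamp": [start, end], "text": "..."}`. The names `mmss` and `format_transcript` are hypothetical.

```python
def mmss(seconds: float) -> str:
    """Format seconds as MM:SS.t, rounded to a tenth of a second."""
    tenths = round(seconds * 10)
    minutes, rem = divmod(tenths, 600)
    secs, tenth = divmod(rem, 10)
    return f"{minutes:02d}:{secs:02d}.{tenth}"


def format_transcript(chunks: list) -> str:
    """Render chunks as '[start - end] text' lines.

    Assumes each chunk looks like {"timestamp": [start, end], "text": "..."}.
    """
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        end_label = mmss(end) if end is not None else "?"
        lines.append(f"[{mmss(start)} - {end_label}] {chunk['text'].strip()}")
    return "\n".join(lines)
```

Usage: `print(format_transcript(result["chunks"]))`. Rounding to integer tenths before splitting into minutes and seconds means a value like 59.96 s renders as `01:00.0` rather than `00:60.0`.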

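```python
import fal_client

result = fal_client.subscribe(
    "fal-ai/whisper",
    arguments={"audio_url": "https://example.com/speech.mp3", "task": "transcribe",
               "language": "en", "chunk_level": "segment"},
)
print(result["text"])
print(result["chunks"])  # Segments with timestamps
```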
