Audio/video transcription using OpenAI Whisper. Covers installation, model selection, transcript formats (SRT, VTT, JSON), timing synchronization, and speaker diarization. Use when transcribing media or generating subtitles.
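SRT and VTT use the same timing model. They differ mainly in timestamp notation: SRT writes `HH:MM:SS,mmm` with a comma before the milliseconds, and WebVTT uses a period. Whisper's JSON output stores each segment's `start` and `end` as plain seconds, so producing either subtitle timestamp is a small formatting step. The sketch below assumes bash with `awk` available. `format_timestamp` is a hypothetical helper and is not part of Whisper or whisper.cpp.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Whisper or whisper.cpp).
# Converts seconds, e.g. a Whisper JSON "start"/"end" value, into an
# SRT (HH:MM:SS,mmm) or WebVTT (HH:MM:SS.mmm) timestamp.
# Usage: format_timestamp SECONDS [srt|vtt]
format_timestamp() {
  local seconds="$1" style="${2:-srt}"
  # awk does the floating-point math
  awk -v s="$seconds" -v style="$style" 'BEGIN {
    # Round to whole milliseconds first so the result never shows ",1000"
    ms = int(s * 1000 + 0.5)
    h = int(ms / 3600000); ms -= h * 3600000
    m = int(ms / 60000);   ms -= m * 60000
    sec = int(ms / 1000);  ms -= sec * 1000
    # SRT separates milliseconds with a comma, WebVTT with a period
    sep = (style == "vtt") ? "." : ","
    printf "%02d:%02d:%02d%s%03d\n", h, m, sec, sep, ms
  }'
}
```

For example, `format_timestamp 4.5` prints `00:00:04,500` and `format_timestamp 4.5 vtt` prints `00:00:04.500`. The helper rounds before splitting into fields, so a value such as `59.9996` carries over to `00:01:00,000` instead of producing an invalid millisecond field.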
February 5, 2026
`npx add-skill https://github.com/MadAppGang/claude-code/blob/main/plugins/video-editing/skills/transcription/SKILL.md -a claude-code --skill transcription`

Installation paths:
`.claude/skills/transcription/`

plugin: video-editing

updated: 2026-01-20

# Transcription with Whisper

Production-ready patterns for audio/video transcription using OpenAI Whisper.

## System Requirements

### Installation Options

**Option 1: OpenAI Whisper (Python)**

```bash
# macOS/Linux/Windows
pip install openai-whisper

# Whisper decodes audio/video with ffmpeg, which must be on PATH
brew install ffmpeg          # macOS
sudo apt install ffmpeg      # Debian/Ubuntu

# Verify
whisper --help
```

**Option 2: whisper.cpp (C/C++ port, typically faster on CPU)**

```bash
# macOS
brew install whisper-cpp

# Linux - build from source (recent releases build with CMake)
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build && cmake --build build --config Release

# Windows - use pre-built binaries or build with CMake as above
```

**Option 3: Insanely Fast Whisper (GPU accelerated)**

```bash
pip install insanely-fast-whisper
```

### Model Selection

| Model | Parameters | VRAM | Accuracy | Speed | Use Case |
|-------|------------|------|----------|-------|----------|
| tiny | 39M | ~1 GB | Low | Fastest | Quick previews |
| base | 74M | ~1 GB | Medium | Fast | Draft transcripts |
| small | 244M | ~2 GB | Good | Medium | General use |
| medium | 769M | ~5 GB | Better | Slow | Quality transcripts |
| large-v3 | 1550M | ~10 GB | Best | Slowest | Final production |

**Recommendation:** Start with `small` for a good balance of speed and quality, then switch to `large-v3` for final delivery.

## Basic Transcription

### Using OpenAI Whisper

```bash
# Basic transcription (language is auto-detected)
whisper audio.mp3 --model small

# Specify language and output format
whisper audio.mp3 --model medium --language en --output_format srt

# Write every format at once (txt, vtt, srt, tsv, json)
whisper audio.mp3 --model small --output_format all

# Add word-level timestamps (segment timestamps are always included)
whisper audio.mp3 --model small --word_timestamps True
```

### Using whisper.cpp

```bash
# Download a model first (run from the whisper.cpp checkout)
./models/download-ggml-model.sh base.en

# The CLI expects 16-bit WAV at 16 kHz; convert other media with ffmpeg
ffmpeg -i input.mp4 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav

# Transcribe to SRT (CMake builds produce build/bin/whisper-cli; older builds used ./main)
./build/bin/whisper-cli -m models/ggml-base.en.bin -f audio.wav -osrt

# CSV output: start, end, and text for each segment
./build/bin/whisper-cli -m models/ggml-base.en.bin -f audio.wav -ocsv
```

## Output Formats

### SRT (SubRip Subtitle)

```
1
00:00:01,000 --> 00:00:04,500
Hello and welcome to this video.

2
00:00:05,000 --> 00:00:0