transcription

verified

Audio/video transcription using OpenAI Whisper. Covers installation, model selection, transcript formats (SRT, VTT, JSON), timing synchronization, and speaker diarization. Use when transcribing media or generating subtitles.

Marketplace

mag-claude-plugins

MadAppGang/claude-code

Plugin

video-editing

media

Repository
Verified Org

MadAppGang/claude-code
215 stars

plugins/video-editing/skills/transcription/SKILL.md

Last Verified

January 23, 2026

Install Skill

npx add-skill https://github.com/MadAppGang/claude-code/blob/main/plugins/video-editing/skills/transcription/SKILL.md -a claude-code --skill transcription

Installation paths:

Claude
.claude/skills/transcription/

Instructions

plugin: video-editing
updated: 2026-01-20

# Transcription with Whisper

Production-ready patterns for audio/video transcription using OpenAI Whisper.

## System Requirements

### Installation Options

**Option 1: OpenAI Whisper (Python)**
```bash
# macOS/Linux/Windows (requires Python and ffmpeg on PATH)
pip install openai-whisper

# Verify
whisper --help
```

**Option 2: whisper.cpp (C++ - faster)**
```bash
# macOS
brew install whisper-cpp

# Linux - build from source
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp && make

# Windows - use pre-built binaries or build with cmake
```

**Option 3: Insanely Fast Whisper (GPU accelerated)**
```bash
pip install insanely-fast-whisper
```

### Model Selection

| Model | Parameters | VRAM | Accuracy | Speed | Use Case |
|-------|------|------|----------|-------|----------|
| tiny | 39M | ~1GB | Low | Fastest | Quick previews |
| base | 74M | ~1GB | Medium | Fast | Draft transcripts |
| small | 244M | ~2GB | Good | Medium | General use |
| medium | 769M | ~5GB | Better | Slow | Quality transcripts |
| large-v3 | 1550M | ~10GB | Best | Slowest | Final production |

**Recommendation:** Start with `small` for speed/quality balance. Use `large-v3` for final delivery.
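
The recommendation can be turned into a small lookup. This is a minimal sketch, not part of Whisper itself: `MODEL_VRAM_GB` copies the approximate VRAM column from the table, and `pick_model` is a hypothetical helper name that assumes you already know how much VRAM is free.

```python
# Approximate VRAM needs in GB, copied from the model table (smallest to largest).
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large-v3", 10),
]


def pick_model(available_vram_gb: float, final: bool = False) -> str:
    """Hypothetical helper: choose a model name for the given VRAM budget.

    Drafts are capped at `small`; final delivery may go up to `large-v3`.
    Falls back to `tiny` when nothing fits the budget.
    """
    preferred = "large-v3" if final else "small"
    names = [name for name, _ in MODEL_VRAM_GB]
    limit = names.index(preferred)
    # Keep only models at or below the preferred tier that fit in memory.
    candidates = [
        name
        for name, needed in MODEL_VRAM_GB[: limit + 1]
        if needed <= available_vram_gb
    ]
    return candidates[-1] if candidates else "tiny"
```

With 8 GB free, `pick_model(8)` returns `small` for a draft and `pick_model(8, final=True)` returns `medium`, since `large-v3` would not fit.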

## Basic Transcription

### Using OpenAI Whisper

```bash
# Basic transcription (auto-detect language)
whisper audio.mp3 --model small

# Specify language and output format
whisper audio.mp3 --model medium --language en --output_format srt

# Multiple output formats
whisper audio.mp3 --model small --output_format all

# Add word-level timestamps (segment timestamps are always included)
whisper audio.mp3 --model small --word_timestamps True
```
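
For more than one file, the same CLI calls can be scripted. The sketch below is one possible wrapper, assuming the `whisper` command is on PATH; `build_whisper_command` and `transcribe_folder` are hypothetical helper names, and the `transcripts` output directory and `*.mp3` pattern are illustrative defaults.

```python
import subprocess
from pathlib import Path


def build_whisper_command(media, model="small", language=None,
                          output_format="srt", output_dir="transcripts"):
    """Hypothetical helper: build the argv list for one `whisper` CLI call."""
    cmd = [
        "whisper", str(media),
        "--model", model,
        "--output_format", output_format,
        "--output_dir", str(output_dir),
    ]
    if language:
        # Omit --language to let Whisper auto-detect it.
        cmd += ["--language", language]
    return cmd


def transcribe_folder(folder, pattern="*.mp3", **options):
    """Run the CLI once per matching file; stops on the first failure."""
    for media in sorted(Path(folder).glob(pattern)):
        subprocess.run(build_whisper_command(media, **options), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect or log before any audio is processed.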

### Using whisper.cpp

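```bash
# Download model first
./models/download-ggml-model.sh base.en

# Transcribe to SRT (input must be 16 kHz, 16-bit WAV)
# Note: recent whisper.cpp builds name the binary whisper-cli; older builds use ./main
./main -m models/ggml-base.en.bin -f audio.wav -osrt

# CSV output (one row per segment with start/end times)
./main -m models/ggml-base.en.bin -f audio.wav -ocsv
```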
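
The commands above take `audio.wav` as input; whisper.cpp's README asks for 16-bit WAV at 16 kHz, so other media needs converting first. A minimal sketch of that step, assuming `ffmpeg` is installed; `build_ffmpeg_command` and `convert_for_whisper_cpp` are hypothetical helper names, and the `_16k` suffix is an illustrative choice.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(source, target=None):
    """Hypothetical helper: argv for converting media to 16 kHz mono 16-bit WAV."""
    source = Path(source)
    target = Path(target) if target else source.with_suffix(".wav")
    if target == source:
        # Avoid overwriting the input when it is already a .wav file.
        target = source.with_name(source.stem + "_16k.wav")
    return [
        "ffmpeg", "-y",
        "-i", str(source),
        "-ar", "16000",       # 16 kHz sample rate
        "-ac", "1",           # mono
        "-c:a", "pcm_s16le",  # 16-bit PCM
        str(target),
    ]


def convert_for_whisper_cpp(source, target=None):
    """Run the conversion and return the path of the WAV file written."""
    cmd = build_ffmpeg_command(source, target)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

The returned path can then be passed to the `-f` option.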

## Output Formats

### SRT (SubRip Subtitle)
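
Each SRT cue is a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the caption text. A minimal sketch of producing that layout, assuming segments shaped like the `start`/`end`/`text` entries (times in seconds) in Whisper's JSON output; `srt_timestamp` and `segments_to_srt` are hypothetical helper names.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        cues.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(cues)
```

A raw SRT file looks like this:
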
```
1
00:00:01,000 --> 00:00:04,500
Hello and welcome to this video.

2
00:00:05,000 --> 00:00:0
```
