# Episode Transcription Script
This script transcribes video episodes with speaker diarization and infers speaker names using AI.
## Features

- ✅ Transcribes all `.mp4`, `.mkv`, `.avi`, `.mov`, `.webm` files in the `episodes/` folder
- ✅ Speaker diarization (identifies who spoke when)
- ✅ AI-powered speaker naming based on context
- ✅ Smart merging of non-word utterances (sounds, modal particles)
- ✅ Progress tracking - resume from where you left off
- ✅ Output format: `[mm:ss](SpeakerName) line content`
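The output line format above can be produced with a small helper like this (a sketch; `format_line` is an illustrative name, not necessarily the script's own function):

```python
def format_line(start_ms: int, speaker: str, text: str) -> str:
    """Render one transcript line as [mm:ss](SpeakerName) text."""
    total_seconds = start_ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"[{minutes:02d}:{seconds:02d}]({speaker}) {text}"

print(format_line(12_000, "Malabar", "Hello everyone, welcome back!"))
# [00:12](Malabar) Hello everyone, welcome back!
```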
## Setup

### 1. Install uv (if not already installed)

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
### 2. Set API keys

```bash
# Required: AssemblyAI API key (free tier: 100 hours/month)
export ASSEMBLYAI_API_KEY="your-assemblyai-key"

# Required: OpenAI/Kimi API key
export OPENAI_API_KEY="your-kimi-key"

# Optional: if using Kimi (already set as the default in the script)
export OPENAI_BASE_URL="https://api.moonshot.cn/v1"
```
Get your API keys:
- AssemblyAI: https://www.assemblyai.com/ (free tier available)
- Kimi: https://platform.moonshot.cn/
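In Python, the script can pick these variables up from the environment at startup. A minimal sketch (the function name and error message are illustrative; only the variable names and the Kimi default come from this README):

```python
import os

def load_config(env=os.environ) -> dict:
    """Collect API configuration from the environment, failing fast on missing keys."""
    missing = [k for k in ("ASSEMBLYAI_API_KEY", "OPENAI_API_KEY") if not env.get(k)]
    if missing:
        raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")
    return {
        "assemblyai_key": env["ASSEMBLYAI_API_KEY"],
        "openai_key": env["OPENAI_API_KEY"],
        # Optional: defaults to the Kimi endpoint, as noted above.
        "openai_base_url": env.get("OPENAI_BASE_URL", "https://api.moonshot.cn/v1"),
    }
```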
## Usage

### Run with uv (recommended)

```bash
# This automatically installs dependencies and runs the script
uv run transcribe_episodes.py
```
### Or sync dependencies first, then run

```bash
# Install dependencies (creates .venv automatically)
uv sync

# Run the script
uv run python transcribe_episodes.py
```
### Check progress

```bash
uv run transcribe_episodes.py status
```
### Reset and re-process

```bash
# Reset all (will re-process everything)
uv run transcribe_episodes.py reset

# Reset a specific file only
uv run transcribe_episodes.py reset S02E02.mp4
```
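The subcommands above suggest a simple dispatch on the command-line arguments. A sketch of how that might look (the real script's argument handling may differ; the return values here are just for illustration):

```python
def main(argv: list[str]) -> str:
    """Dispatch the subcommands documented above: no args, `status`, or `reset [file]`."""
    if not argv:
        return "run"          # default: process all episodes
    if argv[0] == "status":
        return "status"       # report per-file progress
    if argv[0] == "reset":
        # `reset` alone clears everything; `reset <file>` clears one entry
        return f"reset:{argv[1]}" if len(argv) > 1 else "reset:all"
    raise SystemExit(f"Unknown command: {argv[0]}")
```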
## Output

Transcripts are saved to the `transcripts/` folder as `.txt` files:

```
transcripts/
├── S02E01.txt
└── S02E02.txt
```
Example content:

```
[00:12](Malabar) Hello everyone, welcome back!
[00:15](Sun) Nice to see you all again.
[00:18](Jupiter) Yeah, let's get started.
```
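Because the line format is regular, transcript lines are easy to parse back into their parts, e.g. for post-processing or spot-checking. A sketch:

```python
import re

# Matches the [mm:ss](SpeakerName) format shown above.
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2})\]\(([^)]+)\)\s*(.*)$")

def parse_line(line: str):
    """Return (offset_seconds, speaker, text), or None if the line doesn't match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    minutes, seconds, speaker, text = m.groups()
    return int(minutes) * 60 + int(seconds), speaker, text

print(parse_line("[00:15](Sun) Nice to see you all again."))
# (15, 'Sun', 'Nice to see you all again.')
```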
## Progress Tracking

The script creates `.transcription_progress.json` to track the status of each file:

- `completed` - successfully processed
- `error` - failed (check the error message)
- `transcribing` - in progress (transcription)
- `naming` - in progress (speaker naming)
If interrupted, simply re-run the script - it will skip completed files.
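The resume behavior can be sketched as a small layer over that JSON file. The exact layout of `.transcription_progress.json` is an assumption here; only the filename and the status values come from this README:

```python
import json
from pathlib import Path

PROGRESS_FILE = Path(".transcription_progress.json")

def load_progress() -> dict:
    """Load the filename -> status map, or start empty (assumed layout)."""
    if PROGRESS_FILE.exists():
        return json.loads(PROGRESS_FILE.read_text())
    return {}

def mark(progress: dict, filename: str, status: str) -> None:
    """Record a status ('completed', 'error', 'transcribing', 'naming') and persist it."""
    progress[filename] = status
    PROGRESS_FILE.write_text(json.dumps(progress, indent=2))

def pending(progress: dict, filenames):
    """Skip anything already completed, so re-runs resume where they left off."""
    return [f for f in filenames if progress.get(f) != "completed"]
```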
## How Speaker Naming Works

1. Transcribe with AssemblyAI to get generic speaker labels (A, B, C...)
2. Sample utterances from each speaker
3. Send the samples to an LLM (Kimi) with context about the characters: Malabar, Sun, Jupiter, Kangarro, Mole
4. The LLM infers which speaker is which character based on speaking style and content
5. Apply the inferred names to the output
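The steps above can be sketched as follows; the prompt wording and function names are illustrative, not the script's exact code (only the character list comes from this README):

```python
CHARACTERS = ["Malabar", "Sun", "Jupiter", "Kangarro", "Mole"]

def build_naming_prompt(utterances_by_speaker: dict,
                        samples_per_speaker: int = 5) -> str:
    """Ask the LLM to map generic labels (A, B, ...) to character names."""
    lines = [
        f"The speakers are characters from this list: {', '.join(CHARACTERS)}.",
        "Based on the sample utterances below, return a JSON object mapping "
        "each speaker label to a character name.",
    ]
    for label, utterances in sorted(utterances_by_speaker.items()):
        lines.append(f"\nSpeaker {label}:")
        lines.extend(f"  - {u}" for u in utterances[:samples_per_speaker])
    return "\n".join(lines)

def apply_name(mapping: dict, label: str) -> str:
    """Fall back to the raw label if the LLM left a speaker unmapped."""
    return mapping.get(label, label)
```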
## Troubleshooting

**AssemblyAI upload fails:**
- Check your API key
- Check your internet connection
- Video files might be too large for the free tier

**Speaker naming is wrong:**
- The LLM makes educated guesses based on context
- You can manually edit the output files if needed
- Consider providing more context about each character's personality

**Progress lost:**
- Don't delete `.transcription_progress.json`
- It tracks which files are done to avoid re-processing
## Development

```bash
# Add new dependencies
uv add <package-name>

# Add dev dependencies
uv add --dev <package-name>

# Update the lock file
uv lock
```