# Episode Transcription Script

This script transcribes video episodes with speaker diarization and infers speaker names using AI.

## Features

- ✅ Transcribes all `.mp4`, `.mkv`, `.avi`, `.mov`, and `.webm` files in the `episodes/` folder
- ✅ Speaker diarization (identifies who spoke when)
- ✅ AI-powered speaker naming based on context
- ✅ Smart merging of non-word utterances (sounds, modal particles)
- ✅ Progress tracking: resume from where you left off
- ✅ Output format: `[mm:ss](SpeakerName) line content`

## Setup

### 1. Install uv (if not already installed)

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### 2. Set API Keys

```bash
# Required: AssemblyAI API key (free tier: 100 hours/month)
export ASSEMBLYAI_API_KEY="your-assemblyai-key"

# Required: OpenAI/Kimi API key
export OPENAI_API_KEY="your-kimi-key"

# Optional: Kimi base URL (already set as the default in the script)
export OPENAI_BASE_URL="https://api.moonshot.cn/v1"
```

Get your API keys:

- AssemblyAI: https://www.assemblyai.com/ (free tier available)
- Kimi: https://platform.moonshot.cn/

## Usage

### Run with uv (recommended)

```bash
# This will automatically install dependencies and run the script
uv run transcribe_episodes.py
```

### Or sync dependencies first, then run

```bash
# Install dependencies (creates .venv automatically)
uv sync

# Run the script
uv run python transcribe_episodes.py
```

### Check progress

```bash
uv run transcribe_episodes.py status
```

### Reset and re-process

```bash
# Reset all files (everything will be re-processed)
uv run transcribe_episodes.py reset

# Reset a specific file only
uv run transcribe_episodes.py reset S02E02.mp4
```

## Output

Transcripts are saved to the `transcripts/` folder as `.txt` files:

```
transcripts/
├── S02E01.txt
└── S02E02.txt
```

Example content:

```
[00:12](Malabar) Hello everyone, welcome back!
[00:15](Sun) Nice to see you all again.
[00:18](Jupiter) Yeah, let's get started.
```

## Progress Tracking

The script creates `.transcription_progress.json` to track the status of each file:

- `completed` - Successfully processed
- `error` - Failed (check the error message)
- `transcribing` - In progress (transcription)
- `naming` - In progress (speaker naming)

If interrupted, simply re-run the script; it will skip completed files.

## How Speaker Naming Works

1. Transcribe with AssemblyAI to get speaker labels (A, B, C, ...).
2. Sample utterances from each speaker.
3. Send them to the LLM (Kimi) with context about the characters: Malabar, Sun, Jupiter, Kangarro, Mole.
4. The LLM infers which speaker is which character based on speaking style and content.
5. Apply the inferred names to the output.

## Troubleshooting

**AssemblyAI upload fails:**

- Check your API key
- Check your internet connection
- Video files might be too large for the free tier

**Speaker naming is wrong:**

- The LLM makes educated guesses based on context
- You can manually edit the output files if needed
- Consider giving the LLM more context about each character's personality

**Progress lost:**

- Don't delete `.transcription_progress.json`
- It tracks which files are done to avoid re-processing

## Development

```bash
# Add new dependencies
uv add <package>

# Add dev dependencies
uv add --dev <package>

# Update the lock file
uv lock
```
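As a development aid, for example to spot-check speaker naming before editing output files by hand, transcripts in the `[mm:ss](SpeakerName) line content` format can be parsed with a regular expression. The following is a minimal sketch that is not part of `transcribe_episodes.py`: `Utterance`, `parse_line`, `parse_transcript`, and `lines_per_speaker` are hypothetical names, and it assumes every transcript line follows the documented format exactly (other lines are skipped).

```python
"""Hypothetical helper, not part of transcribe_episodes.py: parse transcript
lines written in the `[mm:ss](SpeakerName) line content` format."""
import re
from collections import Counter
from pathlib import Path
from typing import Iterator, NamedTuple, Optional

# Matches "[mm:ss](SpeakerName) text". Allowing more than two minute digits
# is an assumption, in case an episode runs longer than 99 minutes.
LINE_RE = re.compile(r"^\[(\d{2,}):(\d{2})\]\(([^)]+)\)\s?(.*)$")


class Utterance(NamedTuple):
    seconds: int  # offset from the start of the episode
    speaker: str  # inferred speaker name, e.g. "Malabar"
    text: str     # the spoken line


def parse_line(line: str) -> Optional[Utterance]:
    """Return an Utterance, or None if the line does not match the format."""
    m = LINE_RE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    minutes, secs, speaker, text = m.groups()
    return Utterance(int(minutes) * 60 + int(secs), speaker, text)


def parse_transcript(path: Path) -> Iterator[Utterance]:
    """Yield every well-formed line of a transcript file, skipping the rest."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            utt = parse_line(line)
            if utt is not None:
                yield utt


def lines_per_speaker(path: Path) -> Counter:
    """Count lines per speaker; a lopsided count can hint at a naming mix-up."""
    return Counter(u.speaker for u in parse_transcript(path))


if __name__ == "__main__":
    # Assumes it is run from the project root, next to the transcripts/ folder.
    for transcript in sorted(Path("transcripts").glob("*.txt")):
        print(transcript.name, dict(lines_per_speaker(transcript)))
```

Run from the project root, this prints a per-speaker line count for each file in `transcripts/`; a character with almost no lines, or two characters with swapped counts, is a quick signal to re-check that episode.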