a

2026-03-03 17:20:29 +08:00
commit 0f8513c7ad
25 changed files with 11922 additions and 0 deletions
--- a/README_TRANSCRIBE.md
+++ b/README_TRANSCRIBE.md
@@ -0,0 +1,136 @@
+# Episode Transcription Script
+
+This script transcribes video episodes, uses speaker diarization to label who is speaking, and uses an LLM to infer each speaker's name.
+
+## Features
+
+- ✅ Transcribes all `.mp4`, `.mkv`, `.avi`, `.mov`, and `.webm` files in the `episodes/` folder
+- ✅ Speaker diarization (identifies who spoke when)
+- ✅ LLM-based speaker naming inferred from dialogue context
+- ✅ Smart merging of non-word utterances (sounds, modal particles)
+- ✅ Progress tracking - resume from where you left off
+- ✅ Output format: `[mm:ss](SpeakerName) line content`
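+
+As a rough illustration of non-word merging, here is a sketch that assumes each utterance is a dict with `speaker`, `start` (milliseconds) and `text` keys, and that a line made up only of filler sounds is folded into the adjacent line from the same speaker. The merging rule, `NON_WORDS`, `is_non_word` and `merge_non_word_utterances` are all hypothetical, not the script's actual API.
+
+```python
+import re
+
+# Hypothetical set of sounds / modal particles treated as "non-words".
+# English-only for illustration; a real list depends on the show's language.
+NON_WORDS = {"uh", "um", "hmm", "mm", "ah", "oh", "huh", "haha"}
+
+
+def is_non_word(text: str) -> bool:
+    """True if every token in `text` is a known non-word (punctuation ignored)."""
+    tokens = re.findall(r"[a-z']+", text.lower())
+    return bool(tokens) and all(token in NON_WORDS for token in tokens)
+
+
+def merge_non_word_utterances(utterances: list[dict]) -> list[dict]:
+    """Fold non-word-only lines into the adjacent line of the same speaker."""
+    merged: list[dict] = []
+    for utt in utterances:
+        prev = merged[-1] if merged else None
+        if prev is not None and prev["speaker"] == utt["speaker"] and (
+            is_non_word(utt["text"]) or is_non_word(prev["text"])
+        ):
+            # Keep the earlier start time; append this text to the previous line.
+            prev["text"] = f"{prev['text']} {utt['text']}"
+        else:
+            merged.append(dict(utt))  # copy so the caller's list is not mutated
+    return merged
+```
+
+With this rule, `Hmm.` followed by `Nice to see you.` from the same speaker becomes one line, while a lone `Oh!` from a different speaker stays on its own line.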
+
+## Setup
+
+### 1. Install uv (if not already installed)
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+
+### 2. Set API Keys
+
+```bash
+# Required: AssemblyAI API key (free tier: 100 hours/month)
+export ASSEMBLYAI_API_KEY="your-assemblyai-key"
+
+# Required: OpenAI-compatible API key (a Kimi key by default)
+export OPENAI_API_KEY="your-kimi-key"
+
+# Optional: API base URL (the script already defaults to the Kimi endpoint below)
+export OPENAI_BASE_URL="https://api.moonshot.cn/v1"
+```
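+
+Resolving these variables might look like the sketch below. It assumes the script reads them from the environment and falls back to the Kimi URL when `OPENAI_BASE_URL` is unset; `load_api_config` is a hypothetical helper, not the script's actual code.
+
+```python
+import os
+from collections.abc import Mapping
+
+DEFAULT_BASE_URL = "https://api.moonshot.cn/v1"  # Kimi endpoint from this README
+REQUIRED_KEYS = ("ASSEMBLYAI_API_KEY", "OPENAI_API_KEY")
+
+
+def load_api_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
+    """Return API settings, failing early if a required key is missing or empty."""
+    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
+    if missing:
+        raise SystemExit(f"Missing environment variable(s): {', '.join(missing)}")
+    return {
+        "assemblyai_api_key": env["ASSEMBLYAI_API_KEY"],
+        "openai_api_key": env["OPENAI_API_KEY"],
+        "openai_base_url": env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL),
+    }
+```
+
+With the official `openai` Python package, the last two values would typically be passed as `OpenAI(api_key=..., base_url=...)`.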
+
+Get your API keys:
+- AssemblyAI: https://www.assemblyai.com/ (free tier available)
+- Kimi: https://platform.moonshot.cn/
+
+## Usage
+
+### Run with uv (recommended)
+
+```bash
+# This will automatically install dependencies and run the script
+uv run transcribe_episodes.py
+```
+
+### Or sync dependencies first, then run
+
+```bash
+# Install dependencies (creates .venv automatically)
+uv sync
+
+# Run the script
+uv run python transcribe_episodes.py
+```
+
+### Check progress
+
+```bash
+uv run transcribe_episodes.py status
+```
+
+### Reset and re-process
+
+```bash
+# Reset all progress (every file will be re-processed)
+uv run transcribe_episodes.py reset
+
+# Reset specific file only
+uv run transcribe_episodes.py reset S02E02.mp4
+```
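+
+The `status` and `reset` subcommands both operate on the progress file. A minimal sketch, assuming the file maps each video filename to an object with a `status` field (an assumed layout); `summarize` and `reset_progress` are hypothetical helpers.
+
+```python
+from __future__ import annotations
+
+from collections import Counter
+
+
+def summarize(progress: dict[str, dict]) -> Counter:
+    """Count tracked files per status, as a `status` command might report."""
+    return Counter(entry.get("status", "unknown") for entry in progress.values())
+
+
+def reset_progress(progress: dict[str, dict], filename: str | None = None) -> dict[str, dict]:
+    """Return a copy of `progress` with one file (or every file) forgotten."""
+    if filename is None:
+        return {}  # plain `reset`: nothing counts as done any more
+    if filename not in progress:
+        raise KeyError(f"{filename} is not tracked in the progress file")
+    return {name: entry for name, entry in progress.items() if name != filename}
+```
+
+Returning a new dict instead of editing in place leaves it to the caller to decide when the file is written back.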
+
+## Output
+
+Transcripts are saved to the `transcripts/` folder as `.txt` files, one per episode:
+
+```
+transcripts/
+├── S02E01.txt
+└── S02E02.txt
+```
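+
+Transcript names appear to reuse each video's file stem. Assuming that, the episode-to-transcript mapping can be sketched as follows; `VIDEO_EXTENSIONS` mirrors the formats listed under Features, and `find_episodes` / `transcript_path_for` are hypothetical helpers.
+
+```python
+from pathlib import Path
+
+# Extensions listed under Features.
+VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}
+
+
+def find_episodes(episodes_dir: Path = Path("episodes")) -> list[Path]:
+    """Return video files in `episodes_dir`, sorted by name."""
+    return sorted(
+        p for p in episodes_dir.iterdir()
+        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
+    )
+
+
+def transcript_path_for(video: Path, out_dir: Path = Path("transcripts")) -> Path:
+    """Map episodes/S02E01.mp4 to transcripts/S02E01.txt."""
+    return out_dir / f"{video.stem}.txt"
+```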
+
+Example content:
+```
+[00:12](Malabar) Hello everyone, welcome back!
+[00:15](Sun) Nice to see you all again.
+[00:18](Jupiter) Yeah, let's get started.
+```
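+
+Every line follows the fixed `[mm:ss](SpeakerName) line content` shape, so transcripts are easy to post-process. A sketch of a matching formatter and parser, assuming start times arrive in milliseconds (as AssemblyAI reports them) and that minutes keep counting past 59 instead of rolling into hours; `format_line` and `parse_line` are hypothetical names.
+
+```python
+import re
+
+# [mm:ss](Speaker) text, allowing more than two minute digits for long episodes.
+LINE_RE = re.compile(r"^\[(\d{2,}):(\d{2})\]\(([^)]+)\) (.*)$")
+
+
+def format_line(start_ms: int, speaker: str, text: str) -> str:
+    """Render one utterance; minutes are not wrapped into hours."""
+    minutes, seconds = divmod(start_ms // 1000, 60)
+    return f"[{minutes:02d}:{seconds:02d}]({speaker}) {text}"
+
+
+def parse_line(line: str) -> tuple[int, str, str]:
+    """Inverse of format_line: return (start_seconds, speaker, text)."""
+    match = LINE_RE.match(line.rstrip("\n"))
+    if match is None:
+        raise ValueError(f"not a transcript line: {line!r}")
+    minutes, seconds, speaker, text = match.groups()
+    return int(minutes) * 60 + int(seconds), speaker, text
+```
+
+For example, `format_line(12_000, "Malabar", "Hello everyone, welcome back!")` yields the first line of the example above.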
+
+## Progress Tracking
+
+The script creates `.transcription_progress.json` to track the state of each file:
+- `completed` - Successfully processed
+- `error` - Failed (check the error message)
+- `transcribing` - In progress (transcription)
+- `naming` - In progress (speaker naming)
+
+If interrupted, simply re-run the script - it will skip completed files.
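+
+The resume behavior can be sketched as follows, assuming the progress file maps filenames to objects such as `{"status": "error", "error": "..."}` (the exact layout is an assumption). Writing to a temporary file and swapping it in with `os.replace` keeps the JSON readable even if the script is killed mid-write. `load_progress`, `save_progress` and `pending` are hypothetical names.
+
+```python
+import json
+import os
+from pathlib import Path
+
+PROGRESS_FILE = Path(".transcription_progress.json")
+
+
+def load_progress(path: Path = PROGRESS_FILE) -> dict[str, dict]:
+    """Return saved progress, or an empty dict on the first run."""
+    if not path.exists():
+        return {}
+    return json.loads(path.read_text(encoding="utf-8"))
+
+
+def save_progress(progress: dict[str, dict], path: Path = PROGRESS_FILE) -> None:
+    """Write atomically: dump to a temporary file, then swap it into place."""
+    tmp = path.with_name(path.name + ".tmp")
+    tmp.write_text(json.dumps(progress, indent=2, ensure_ascii=False), encoding="utf-8")
+    os.replace(tmp, path)  # atomic rename on the same filesystem
+
+
+def pending(names: list[str], progress: dict[str, dict]) -> list[str]:
+    """Files still to process: anything not marked `completed` (errors are retried)."""
+    return [name for name in names if progress.get(name, {}).get("status") != "completed"]
+```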
+
+## How Speaker Naming Works
+
+1. Transcribe with AssemblyAI to get speaker labels (A, B, C, ...)
+2. Sample utterances from each speaker
+3. Send the samples to the LLM (Kimi) along with context about the characters: Malabar, Sun, Jupiter, Kangarro, Mole
+4. The LLM infers which character each speaker label belongs to, based on speaking style and content
+5. Replace the speaker labels in the output with the inferred names
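+
+The steps above can be sketched as pure functions around the LLM call, assuming the model is asked to reply with a JSON object mapping labels to names (e.g. `{"A": "Malabar"}`). The prompt wording, the `per_speaker` limit and all helper names are hypothetical, and the actual request to Kimi is left out.
+
+```python
+import json
+from collections import defaultdict
+
+CHARACTERS = ["Malabar", "Sun", "Jupiter", "Kangarro", "Mole"]  # as listed in this README
+
+
+def sample_utterances(utterances: list[dict], per_speaker: int = 5) -> dict[str, list[str]]:
+    """Step 2: keep the first few lines spoken under each speaker label."""
+    samples: dict[str, list[str]] = defaultdict(list)
+    for utt in utterances:
+        if len(samples[utt["speaker"]]) < per_speaker:
+            samples[utt["speaker"]].append(utt["text"])
+    return dict(samples)
+
+
+def build_prompt(samples: dict[str, list[str]]) -> str:
+    """Step 3: describe the characters and show each speaker's sample lines."""
+    lines = [
+        f"The characters are: {', '.join(CHARACTERS)}.",
+        "Assign a character to each speaker label. Reply with JSON only, "
+        'e.g. {"A": "Malabar"}.',
+    ]
+    for label, texts in sorted(samples.items()):
+        lines.append(f"Speaker {label}:")
+        lines.extend(f"  - {text}" for text in texts)
+    return "\n".join(lines)
+
+
+def apply_names(utterances: list[dict], llm_reply: str) -> list[dict]:
+    """Steps 4-5: parse the model's JSON mapping and relabel the utterances."""
+    mapping = json.loads(llm_reply)
+    # Unmapped labels fall back to "Speaker X" so no line loses its speaker.
+    return [
+        {**utt, "speaker": mapping.get(utt["speaker"], f"Speaker {utt['speaker']}")}
+        for utt in utterances
+    ]
+```
+
+In practice a model may wrap its JSON in a Markdown code fence, so the reply might need cleaning before `json.loads`.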
+
+## Troubleshooting
+
+**AssemblyAI upload fails:**
+- Check your API key
+- Check internet connection
+- The video files may be too large for the free tier
+
+**Speaker naming is wrong:**
+- The LLM makes educated guesses based on context
+- You can manually edit the output files if needed
+- Consider adding more context about each character's personality to the LLM prompt in the script
+
+**Progress lost:**
+- Don't delete `.transcription_progress.json`
+- It tracks which files are done to avoid re-processing
+
+## Development
+
+```bash
+# Add new dependencies
+uv add <package-name>
+
+# Add dev dependencies
+uv add --dev <package-name>
+
+# Update lock file
+uv lock
+```