# Episode Transcription Script

This script transcribes video episodes with speaker diarization and infers speaker names using AI.

## Features

- ✅ Transcribes all `.mp4`, `.mkv`, `.avi`, `.mov`, `.webm` files in the `episodes/` folder
- ✅ Speaker diarization (identifies who spoke when)
- ✅ AI-powered speaker naming based on context
- ✅ Smart merging of non-word utterances (sounds, modal particles)
- ✅ Progress tracking - resume from where you left off
- ✅ Output format: `[mm:ss](SpeakerName) line content`

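The "smart merging" feature above can be pictured with a sketch like the following. The helper names and the particle list are illustrative assumptions, not the script's actual code: utterances containing no real words are folded into the previous utterance instead of standing alone.

```python
import re

# Hypothetical particle list - the real script may use a different heuristic.
NON_WORDS = {"hmm", "uh", "um", "ah", "oh", "mm", "haha"}

def is_non_word(text: str) -> bool:
    """True if the utterance consists only of sounds/modal particles."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return bool(tokens) and all(t in NON_WORDS for t in tokens)

def merge_non_word_utterances(utterances):
    """Fold non-word utterances into the preceding utterance.

    Each utterance is a dict like {'speaker': str, 'text': str, 'start_ms': int}.
    """
    merged = []
    for utt in utterances:
        if merged and is_non_word(utt["text"]):
            merged[-1]["text"] += " " + utt["text"]
        else:
            merged.append(dict(utt))
    return merged
```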
## Setup

### 1. Install uv (if not already installed)

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### 2. Set API Keys

```bash
# Required: AssemblyAI API key (free tier: 100 hours/month)
export ASSEMBLYAI_API_KEY="your-assemblyai-key"

# Required: OpenAI/Kimi API key
export OPENAI_API_KEY="your-kimi-key"

# Optional: if using Kimi (already set as the default in the script)
export OPENAI_BASE_URL="https://api.moonshot.cn/v1"
```

Get your API keys:

- AssemblyAI: https://www.assemblyai.com/ (free tier available)
- Kimi: https://platform.moonshot.cn/

## Usage

### Run with uv (recommended)

```bash
# This will automatically install dependencies and run the script
uv run transcribe_episodes.py
```

### Or sync dependencies first, then run

```bash
# Install dependencies (creates .venv automatically)
uv sync

# Run the script
uv run python transcribe_episodes.py
```

### Check progress

```bash
uv run transcribe_episodes.py status
```

### Reset and re-process

```bash
# Reset everything (re-processes all files)
uv run transcribe_episodes.py reset

# Reset a specific file only
uv run transcribe_episodes.py reset S02E02.mp4
```

## Output

Transcripts are saved to the `transcripts/` folder as `.txt` files:

```
transcripts/
├── S02E01.txt
└── S02E02.txt
```

Example content:

```
[00:12](Malabar) Hello everyone, welcome back!
[00:15](Sun) Nice to see you all again.
[00:18](Jupiter) Yeah, let's get started.
```

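The `[mm:ss](SpeakerName)` lines above can be produced from millisecond timestamps with a small formatter. This is an illustrative helper, not necessarily the script's own code:

```python
def format_line(start_ms: int, speaker: str, text: str) -> str:
    """Render one transcript line as `[mm:ss](SpeakerName) text`."""
    total_seconds = start_ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"[{minutes:02d}:{seconds:02d}]({speaker}) {text}"
```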
## Progress Tracking

The script creates `.transcription_progress.json` to track the state of each file:

- `completed` - successfully processed
- `error` - failed (check the error message)
- `transcribing` - in progress (transcription)
- `naming` - in progress (speaker naming)

If interrupted, simply re-run the script - it will skip completed files.

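The resume logic can be sketched as follows. The JSON shape shown in the comment is an assumption for illustration; the real script may store additional fields:

```python
import json
from pathlib import Path

# Assumed shape of .transcription_progress.json (illustrative):
#   {"S02E01.mp4": {"status": "completed"},
#    "S02E02.mp4": {"status": "error"}}

def files_to_process(progress_path: Path, episodes: list[str]) -> list[str]:
    """Return the episodes not yet marked completed in the progress file."""
    progress = {}
    if progress_path.exists():
        progress = json.loads(progress_path.read_text())
    return [f for f in episodes if progress.get(f, {}).get("status") != "completed"]
```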
## How Speaker Naming Works

1. Transcribe with AssemblyAI to get speaker labels (A, B, C...)
2. Sample utterances from each speaker
3. Send the samples to an LLM (Kimi) with context about the characters: Malabar, Sun, Jupiter, Kangarro, Mole
4. The LLM infers which speaker is which character based on speaking style and content
5. Apply the inferred names to the output

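Step 5 above can be sketched as a simple label substitution. The mapping shown in the test is made up; in the script it would come from the LLM's answer:

```python
def apply_speaker_names(utterances, name_map):
    """Replace diarization labels (A, B, C...) with inferred character names.

    Labels the LLM could not identify fall back to the raw label,
    so no line is ever dropped.
    """
    return [
        {**utt, "speaker": name_map.get(utt["speaker"], utt["speaker"])}
        for utt in utterances
    ]
```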
## Troubleshooting

**AssemblyAI upload fails:**

- Check your API key
- Check your internet connection
- Video files might be too large for the free tier

**Speaker naming is wrong:**

- The LLM makes educated guesses based on context
- You can manually edit the output files if needed
- Consider providing more context about each character's personality

**Progress lost:**

- Don't delete `.transcription_progress.json`
- It tracks which files are done to avoid re-processing

## Development

```bash
# Add new dependencies
uv add <package-name>

# Add dev dependencies
uv add --dev <package-name>

# Update the lock file
uv lock
```