# Episode Transcription Script
This script (`transcribe_episodes.py`) transcribes video episodes with speaker diarization and uses an LLM to infer each speaker's name.
## Features
- ✅ Transcribes all `.mp4`, `.mkv`, `.avi`, `.mov`, and `.webm` files in the `episodes/` folder
- ✅ Speaker diarization (identifies who spoke when)
- ✅ AI-powered speaker naming based on context
- ✅ Smart merging of non-word utterances (sounds, modal particles)
- ✅ Progress tracking: resume from where you left off
- ✅ Output format: `[mm:ss](SpeakerName) line content`
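
This README does not say how non-word utterances are merged. One possible rule, offered as an assumption rather than a description of what `transcribe_episodes.py` actually does, is to fold an utterance made up only of filler sounds into the preceding line when the same speaker produced it. The filler lists and the `Utterance`, `is_non_word`, and `merge_non_words` names below are hypothetical:

```python
import re
from dataclasses import dataclass

# Hypothetical filler vocabulary; the script's real list is not documented.
FILLER_WORDS = {"uh", "um", "hmm", "mm", "ah", "oh", "huh"}
FILLER_CHARS = set("啊嗯哦呃哈呀")  # Chinese modal particles / interjections


@dataclass
class Utterance:
    speaker: str
    start_ms: int
    text: str


def is_non_word(text: str) -> bool:
    """True if every word token is a filler (punctuation-only text also counts)."""
    tokens = re.findall(r"\w+", text.lower())
    return all(t in FILLER_WORDS or set(t) <= FILLER_CHARS for t in tokens)


def merge_non_words(utterances: list[Utterance]) -> list[Utterance]:
    """Fold a filler-only utterance into the previous one from the same speaker."""
    merged: list[Utterance] = []
    for u in utterances:
        prev = merged[-1] if merged else None
        if prev is not None and prev.speaker == u.speaker and is_non_word(u.text):
            # Keep the earlier timestamp and append the filler text.
            merged[-1] = Utterance(prev.speaker, prev.start_ms, f"{prev.text} {u.text}")
        else:
            merged.append(u)
    return merged
```

A filler from a different speaker is kept as its own line here, since it may be a genuine reaction; a real implementation might instead attach it to that speaker's next line.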
## Setup
### 1. Install uv (if not already installed)
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
### 2. Set API Keys
```bash
# Required: AssemblyAI API key (free tier: 100 hours/month)
export ASSEMBLYAI_API_KEY="your-assemblyai-key"
# Required: API key for the OpenAI-compatible LLM endpoint (Kimi by default)
export OPENAI_API_KEY="your-kimi-key"
# Optional: the script already uses Kimi's endpoint by default; this only makes it explicit
export OPENAI_BASE_URL="https://api.moonshot.cn/v1"
```
Get your API keys:
- AssemblyAI: https://www.assemblyai.com/ (free tier available)
- Kimi: https://platform.moonshot.cn/
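
A minimal sketch of how the script might read these variables, falling back to Kimi's endpoint when `OPENAI_BASE_URL` is unset. The `load_config` helper and its dictionary keys are hypothetical; only the environment variable names and the default URL come from this README:

```python
import os
from typing import Mapping, Optional

# Default endpoint per this README (Kimi / Moonshot).
DEFAULT_BASE_URL = "https://api.moonshot.cn/v1"
REQUIRED = ("ASSEMBLYAI_API_KEY", "OPENAI_API_KEY")


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read API settings from the environment; fail early if a key is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variable(s): {', '.join(missing)}")
    return {
        "assemblyai_api_key": env["ASSEMBLYAI_API_KEY"],
        "llm_api_key": env["OPENAI_API_KEY"],
        "llm_base_url": env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL),
    }
```

The LLM values could then be passed to the official `openai` Python SDK as `OpenAI(api_key=..., base_url=...)`; that SDK also reads `OPENAI_API_KEY` and `OPENAI_BASE_URL` from the environment on its own.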
## Usage
### Run with uv (recommended)
```bash
# This will automatically install dependencies and run the script
uv run transcribe_episodes.py
```
### Or sync dependencies first, then run
```bash
# Install dependencies (creates .venv automatically)
uv sync
# Run the script
uv run python transcribe_episodes.py
```
### Check progress
```bash
uv run transcribe_episodes.py status
```
### Reset and re-process
```bash
# Reset all (will re-process everything)
uv run transcribe_episodes.py reset
# Reset specific file only
uv run transcribe_episodes.py reset S02E02.mp4
```
## Output
Transcripts are saved to the `transcripts/` folder, one `.txt` file per episode:
```
transcripts/
├── S02E01.txt
└── S02E02.txt
```
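
Each video in `episodes/` maps to a transcript with the same base name. A sketch of that discovery step, assuming the extension check is case-insensitive; `is_episode`, `transcript_path`, and `episode_jobs` are hypothetical names:

```python
from pathlib import Path
from typing import Iterator, Tuple

VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}


def is_episode(path: Path) -> bool:
    # Case-insensitive match is an assumption (accepts S02E01.MP4 too).
    return path.suffix.lower() in VIDEO_EXTENSIONS


def transcript_path(video: Path, transcripts_dir: Path = Path("transcripts")) -> Path:
    """episodes/S02E01.mp4 -> transcripts/S02E01.txt"""
    return transcripts_dir / f"{video.stem}.txt"


def episode_jobs(episodes_dir: Path = Path("episodes"),
                 transcripts_dir: Path = Path("transcripts")) -> Iterator[Tuple[Path, Path]]:
    """Yield (video, transcript) pairs in a stable, sorted order."""
    for video in sorted(Path(episodes_dir).iterdir()):
        if video.is_file() and is_episode(video):
            yield video, transcript_path(video, Path(transcripts_dir))
```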
Example content:
```
[00:12](Malabar) Hello everyone, welcome back!
[00:15](Sun) Nice to see you all again.
[00:18](Jupiter) Yeah, let's get started.
```
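
The line format can be produced and read back with a few lines of Python. This sketch assumes utterance start times arrive in milliseconds (as AssemblyAI reports them) and that speaker names never contain `)`; `format_line` and `parse_line` are illustrative names, not functions from the script:

```python
import re
from typing import Optional, Tuple

# [mm:ss](SpeakerName) line content; minutes may exceed two digits
LINE_RE = re.compile(r"^\[(\d{2,}):(\d{2})\]\(([^)]*)\) (.*)$")


def format_line(start_ms: int, speaker: str, text: str) -> str:
    """Render one utterance; seconds are truncated, not rounded."""
    minutes, seconds = divmod(start_ms // 1000, 60)
    return f"[{minutes:02d}:{seconds:02d}]({speaker}) {text}"


def parse_line(line: str) -> Optional[Tuple[int, str, str]]:
    """Return (start_seconds, speaker, text), or None if the line doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    minutes, seconds, speaker, text = m.groups()
    return int(minutes) * 60 + int(seconds), speaker, text
```

Minutes are not wrapped into hours, so a timestamp past the hour mark reads like `[62:05]`; `parse_line` accepts that form too.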
## Progress Tracking
The script creates `.transcription_progress.json` to track which files are:
- `completed` - Successfully processed
- `error` - Failed (check error message)
- `transcribing` - In progress (transcription)
- `naming` - In progress (speaker naming)

If interrupted, simply re-run the script; it skips files already marked `completed`.
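
A sketch of this bookkeeping, assuming the progress file maps each video filename to an object with a `status` field (plus an `error` field on failure); the real file's layout may differ, and all function names here are hypothetical. Only `completed` counts as done, so a file left in `transcribing` or `naming` by an interruption is retried, and `reset` mirrors the `reset` command:

```python
import json
from pathlib import Path
from typing import Optional

PROGRESS_FILE = Path(".transcription_progress.json")
DONE = "completed"


def load_progress(path: Path = PROGRESS_FILE) -> dict:
    """Return {filename: {"status": ..., "error": ...}}; empty if no file yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_progress(progress: dict, path: Path = PROGRESS_FILE) -> None:
    # Write to a temp file, then rename it over the original so an interrupted
    # write never leaves a half-written JSON file behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(progress, indent=2, ensure_ascii=False), encoding="utf-8")
    tmp.replace(path)


def set_status(progress: dict, filename: str, status: str, error: Optional[str] = None) -> None:
    """status is one of: transcribing, naming, completed, error."""
    entry = {"status": status}
    if error is not None:
        entry["error"] = error
    progress[filename] = entry


def needs_processing(progress: dict, filename: str) -> bool:
    # Anything not completed (including a stale "transcribing") is retried.
    return progress.get(filename, {}).get("status") != DONE


def reset(progress: dict, filename: Optional[str] = None) -> None:
    """Forget one file, or everything when filename is None."""
    if filename is None:
        progress.clear()
    else:
        progress.pop(filename, None)
```

Saving after every status change means an interruption loses at most the file currently in progress.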
## How Speaker Naming Works
1. Transcribe with AssemblyAI to get speaker labels (A, B, C...)
2. Sample utterances from each speaker
3. Send the samples to the LLM (Kimi) along with context about the characters: Malabar, Sun, Jupiter, Kangarro, Mole
4. The LLM infers which speaker is which character based on speaking style and content
5. Apply the inferred names to the output
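
Steps 2 to 5 can be sketched as follows, with the LLM call itself left out. The prompt wording, the JSON reply format, and the function names are assumptions for illustration; the script's actual prompt is not shown in this README:

```python
import json
import random
from collections import defaultdict

CHARACTERS = ["Malabar", "Sun", "Jupiter", "Kangarro", "Mole"]


def sample_by_speaker(utterances, per_speaker=5, seed=0):
    """utterances: list of (label, text) pairs such as ("A", "Hello").
    Returns up to `per_speaker` randomly chosen lines per label."""
    by_label = defaultdict(list)
    for label, text in utterances:
        by_label[label].append(text)
    rng = random.Random(seed)  # fixed seed: same sample on every run
    return {label: rng.sample(texts, min(per_speaker, len(texts)))
            for label, texts in by_label.items()}


def build_prompt(samples):
    lines = [
        "Possible characters: " + ", ".join(CHARACTERS) + ".",
        'Map each speaker label to one character. Reply with JSON only, e.g. {"A": "Malabar"}.',
    ]
    for label in sorted(samples):
        lines.append(f"Speaker {label}:")
        lines.extend(f"- {text}" for text in samples[label])
    return "\n".join(lines)


def apply_names(utterances, llm_reply):
    """Replace labels with inferred names; unknown labels keep a generic name."""
    mapping = json.loads(llm_reply)
    return [(mapping.get(label, f"Speaker {label}"), text) for label, text in utterances]
```

The prompt would be sent as a chat message (for example with `client.chat.completions.create(...)` from the `openai` SDK) and the reply text passed to `apply_names`. A reply wrapped in a Markdown code fence would need the fence stripped before `json.loads`.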
## Troubleshooting
**AssemblyAI upload fails:**
- Check your API key
- Check your internet connection
- The video file may be too large for the free tier

**Speaker naming is wrong:**
- The LLM makes educated guesses based on context
- You can manually edit the output files if needed
- Consider adding more context about each character's personality to the prompt the script sends to the LLM

**Progress lost:**
- Don't delete `.transcription_progress.json`
- It tracks which files are done to avoid re-processing
## Development
```bash
# Add new dependencies
uv add <package-name>
# Add dev dependencies
uv add --dev <package-name>
# Update lock file
uv lock
```