README_TRANSCRIBE.md
# Episode Transcription Script

This script transcribes video episodes with speaker diarization and infers speaker names using AI.

## Features

- ✅ Transcribes all `.mp4`, `.mkv`, `.avi`, `.mov`, and `.webm` files in the `episodes/` folder
- ✅ Speaker diarization (identifies who spoke when)
- ✅ AI-powered speaker naming based on context
- ✅ Smart merging of non-word utterances (sounds, modal particles)
- ✅ Progress tracking: resume from where you left off
- ✅ Output format: `[mm:ss](SpeakerName) line content`

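
As a rough illustration of the first feature, the sketch below lists the video files in a folder by extension. The function name `find_episodes` and the choice to ignore subfolders are assumptions made for this example; the actual script may discover files differently.

```python
from pathlib import Path

# Extensions taken from the feature list; matching is case-insensitive here,
# which is an assumption rather than documented behaviour.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}


def find_episodes(folder: Path) -> list[Path]:
    """Return video files directly inside `folder`, sorted by path."""
    if not folder.is_dir():
        return []
    return sorted(
        p
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


if __name__ == "__main__":
    for episode in find_episodes(Path("episodes")):
        print(episode.name)
```
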
## Setup

### 1. Install uv (if not already installed)

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

### 2. Set API Keys

```bash
# Required: AssemblyAI API key (free tier: 100 hours/month)
export ASSEMBLYAI_API_KEY="your-assemblyai-key"

# Required: OpenAI-compatible API key (Kimi in this setup)
export OPENAI_API_KEY="your-kimi-key"

# Optional: base URL for Kimi (the script already uses this as its default)
export OPENAI_BASE_URL="https://api.moonshot.cn/v1"
```

Get your API keys:
- AssemblyAI: https://www.assemblyai.com/ (free tier available)
- Kimi: https://platform.moonshot.cn/

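
To check the environment before a long run, something like the following sketch can help. It reads the three variables named above and falls back to the Kimi base URL when `OPENAI_BASE_URL` is unset. The helper name `load_settings` and the error message are hypothetical and not taken from the script.

```python
import os
from collections.abc import Mapping

# Variable names and the default URL come from the export example above.
DEFAULT_BASE_URL = "https://api.moonshot.cn/v1"
REQUIRED_KEYS = ("ASSEMBLYAI_API_KEY", "OPENAI_API_KEY")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect API settings, failing early if a required key is missing or empty."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_KEYS}
    settings["OPENAI_BASE_URL"] = env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment.
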
## Usage

### Run with uv (recommended)

```bash
# This will automatically install dependencies and run the script
uv run transcribe_episodes.py
```

### Or sync dependencies first, then run

```bash
# Install dependencies (creates .venv automatically)
uv sync

# Run the script
uv run python transcribe_episodes.py
```

### Check progress

```bash
uv run transcribe_episodes.py status
```

### Reset and re-process

```bash
# Reset all (will re-process everything)
uv run transcribe_episodes.py reset

# Reset specific file only
uv run transcribe_episodes.py reset S02E02.mp4
```

## Output

Transcripts are saved to the `transcripts/` folder as `.txt` files:

```
transcripts/
├── S02E01.txt
└── S02E02.txt
```

Example content:
```
[00:12](Malabar) Hello everyone, welcome back!
[00:15](Sun) Nice to see you all again.
[00:18](Jupiter) Yeah, let's get started.
```

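
For anyone post-processing the transcripts, here is a minimal sketch of how a line in the `[mm:ss](SpeakerName) line content` format could be written and read back. The function names are hypothetical, and the sketch assumes minutes simply keep counting past 59 for long episodes, which the README does not specify.

```python
import re

# Matches "[mm:ss](Speaker) text"; minutes may have more than two digits.
LINE_RE = re.compile(r"^\[(\d+):(\d{2})\]\(([^)]*)\) (.*)$")


def format_line(start_seconds: float, speaker: str, text: str) -> str:
    """Render one transcript line from a start time in seconds."""
    minutes, seconds = divmod(int(start_seconds), 60)
    return f"[{minutes:02d}:{seconds:02d}]({speaker}) {text}"


def parse_line(line: str) -> tuple[int, str, str]:
    """Split a transcript line into (start_seconds, speaker, text)."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a transcript line: {line!r}")
    minutes, seconds, speaker, text = match.groups()
    return int(minutes) * 60 + int(seconds), speaker, text
```
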
## Progress Tracking

The script creates `.transcription_progress.json` to record the status of each file:
- `completed` - Successfully processed
- `error` - Failed (check error message)
- `transcribing` - In progress (transcription)
- `naming` - In progress (speaker naming)

If interrupted, simply re-run the script; it will skip completed files.

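
The resume behaviour described above can be sketched as follows. The JSON layout used here (a mapping from file name to an object with a `status` field) is an assumption for illustration; the real file may be structured differently, and the helper names are hypothetical.

```python
import json
from pathlib import Path

PROGRESS_FILE = Path(".transcription_progress.json")


def load_progress(path: Path = PROGRESS_FILE) -> dict[str, dict]:
    """Read the progress file, or start empty if it does not exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_progress(progress: dict[str, dict], path: Path = PROGRESS_FILE) -> None:
    """Write via a temporary file so an interruption cannot truncate the real one."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(progress, indent=2), encoding="utf-8")
    tmp.replace(path)


def pending(files: list[str], progress: dict[str, dict]) -> list[str]:
    """Files that still need work: anything not marked `completed`."""
    return [f for f in files if progress.get(f, {}).get("status") != "completed"]
```

With this layout, files left in `transcribing`, `naming`, or `error` are picked up again on the next run, while `completed` files are skipped.
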
## How Speaker Naming Works

1. Transcribe with AssemblyAI to get speaker labels (A, B, C...)
2. Sample utterances from each speaker
3. Send to LLM (Kimi) with context about characters: Malabar, Sun, Jupiter, Kangarro, Mole
4. LLM infers which speaker is which character based on speaking style and content
5. Apply inferred names to output

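
Steps 2, 3, and 5 above can be sketched in plain Python without calling any API. The utterance shape (dictionaries with `speaker` and `text` keys), the prompt wording, the sample size, and the `Speaker X` fallback are all assumptions for illustration, not the script's actual implementation.

```python
from collections import defaultdict

# Character names copied as spelled in the list above.
CHARACTERS = ["Malabar", "Sun", "Jupiter", "Kangarro", "Mole"]


def sample_utterances(utterances: list[dict], per_speaker: int = 5) -> dict[str, list[str]]:
    """Step 2: keep the first few utterances of each speaker label."""
    samples: dict[str, list[str]] = defaultdict(list)
    for u in utterances:
        if len(samples[u["speaker"]]) < per_speaker:
            samples[u["speaker"]].append(u["text"])
    return dict(samples)


def build_prompt(samples: dict[str, list[str]]) -> str:
    """Step 3: describe the characters and samples in one prompt string."""
    lines = [
        "Known characters: " + ", ".join(CHARACTERS) + ".",
        "Match each speaker label to one character.",
        'Reply with JSON only, for example {"A": "Sun"}.',
        "",
    ]
    for label, texts in sorted(samples.items()):
        lines.append(f"Speaker {label}:")
        lines.extend(f"- {t}" for t in texts)
    return "\n".join(lines)


def apply_names(utterances: list[dict], mapping: dict[str, str]) -> list[dict]:
    """Step 5: replace labels with names; unmapped labels become 'Speaker X'."""
    return [
        {**u, "speaker": mapping.get(u["speaker"], f"Speaker {u['speaker']}")}
        for u in utterances
    ]
```

The string from `build_prompt` would be sent to the LLM in step 3, and the JSON reply parsed into the `mapping` passed to `apply_names`.
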
## Troubleshooting

**AssemblyAI upload fails:**
- Check your API key
- Check your internet connection
- Video files might be too large for the free tier

**Speaker naming is wrong:**
- The LLM makes educated guesses based on context
- You can manually edit the output files if needed
- Consider providing more context about each character's personality

**Progress lost:**
- Don't delete `.transcription_progress.json`
- It tracks which files are done to avoid re-processing

## Development

```bash
# Add new dependencies
uv add <package-name>

# Add dev dependencies
uv add --dev <package-name>

# Update lock file
uv lock
```