# Episode Transcription Script
This script transcribes video episodes with speaker diarization and infers speaker names using AI.
## Features

- ✅ Transcribes all `.mp4`, `.mkv`, `.avi`, `.mov`, `.webm` files in the `episodes/` folder
- ✅ Speaker diarization (identifies who spoke when)
- ✅ AI-powered speaker naming based on context
- ✅ Smart merging of non-word utterances (sounds, modal particles)
- ✅ Progress tracking - resume from where you left off
- ✅ Output format: `[mm:ss](SpeakerName) line content`
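The output line format above can be produced with a small helper like this (a sketch; `format_line` is an illustrative name, not necessarily the script's own function):

```python
def format_line(start_ms: int, speaker: str, text: str) -> str:
    """Render one transcript line as [mm:ss](SpeakerName) text."""
    total_seconds = start_ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"[{minutes:02d}:{seconds:02d}]({speaker}) {text}"

print(format_line(12_000, "Malabar", "Hello everyone, welcome back!"))
# [00:12](Malabar) Hello everyone, welcome back!
```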
## Setup

### 1. Install uv (if not already installed)

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
### 2. Set API keys

```bash
# Required: AssemblyAI API key (free tier: 100 hours/month)
export ASSEMBLYAI_API_KEY="your-assemblyai-key"

# Required: OpenAI/Kimi API key
export OPENAI_API_KEY="your-kimi-key"

# Optional: if using Kimi (already set as the default in the script)
export OPENAI_BASE_URL="https://api.moonshot.cn/v1"
```
Get your API keys:
- AssemblyAI: https://www.assemblyai.com/ (free tier available)
- Kimi: https://platform.moonshot.cn/
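In Python, the script can pick these variables up from the environment at startup. A minimal sketch (the function name and error message are illustrative; only the variable names and the Kimi default come from this README):

```python
import os

def load_config(env=os.environ) -> dict:
    """Collect API configuration from the environment, failing fast on missing keys."""
    missing = [k for k in ("ASSEMBLYAI_API_KEY", "OPENAI_API_KEY") if not env.get(k)]
    if missing:
        raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")
    return {
        "assemblyai_key": env["ASSEMBLYAI_API_KEY"],
        "openai_key": env["OPENAI_API_KEY"],
        # Optional: defaults to the Kimi endpoint, as noted above.
        "openai_base_url": env.get("OPENAI_BASE_URL", "https://api.moonshot.cn/v1"),
    }
```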
## Usage

### Run with uv (recommended)

```bash
# This automatically installs dependencies and runs the script
uv run transcribe_episodes.py
```
### Or sync dependencies first, then run

```bash
# Install dependencies (creates .venv automatically)
uv sync

# Run the script
uv run python transcribe_episodes.py
```
### Check progress

```bash
uv run transcribe_episodes.py status
```
### Reset and re-process

```bash
# Reset all (will re-process everything)
uv run transcribe_episodes.py reset

# Reset a specific file only
uv run transcribe_episodes.py reset S02E02.mp4
```
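The subcommands above suggest a simple dispatch on the command-line arguments. A sketch of how that might look (the real script's argument handling may differ; the return values here are just for illustration):

```python
def main(argv: list[str]) -> str:
    """Dispatch the subcommands documented above: no args, `status`, or `reset [file]`."""
    if not argv:
        return "run"          # default: process all episodes
    if argv[0] == "status":
        return "status"       # report per-file progress
    if argv[0] == "reset":
        # `reset` alone clears everything; `reset <file>` clears one entry
        return f"reset:{argv[1]}" if len(argv) > 1 else "reset:all"
    raise SystemExit(f"Unknown command: {argv[0]}")
```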
## Output

Transcripts are saved to the `transcripts/` folder as `.txt` files:

```
transcripts/
├── S02E01.txt
└── S02E02.txt
```
Example content:

```
[00:12](Malabar) Hello everyone, welcome back!
[00:15](Sun) Nice to see you all again.
[00:18](Jupiter) Yeah, let's get started.
```
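Because the line format is regular, transcript lines are easy to parse back into their parts, e.g. for post-processing or spot-checking. A sketch:

```python
import re

# Matches the [mm:ss](SpeakerName) format shown above.
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2})\]\(([^)]+)\)\s*(.*)$")

def parse_line(line: str):
    """Return (offset_seconds, speaker, text), or None if the line doesn't match."""
    m = LINE_RE.match(line)
    if not m:
        return None
    minutes, seconds, speaker, text = m.groups()
    return int(minutes) * 60 + int(seconds), speaker, text

print(parse_line("[00:15](Sun) Nice to see you all again."))
# (15, 'Sun', 'Nice to see you all again.')
```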
## Progress Tracking

The script creates `.transcription_progress.json` to track the status of each file:

- `completed` - successfully processed
- `error` - failed (check the error message)
- `transcribing` - in progress (transcription)
- `naming` - in progress (speaker naming)
If interrupted, simply re-run the script - it will skip completed files.
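The resume behavior can be sketched as a small layer over that JSON file. The exact layout of `.transcription_progress.json` is an assumption here; only the filename and the status values come from this README:

```python
import json
from pathlib import Path

PROGRESS_FILE = Path(".transcription_progress.json")

def load_progress() -> dict:
    """Load the filename -> status map, or start empty (assumed layout)."""
    if PROGRESS_FILE.exists():
        return json.loads(PROGRESS_FILE.read_text())
    return {}

def mark(progress: dict, filename: str, status: str) -> None:
    """Record a status ('completed', 'error', 'transcribing', 'naming') and persist it."""
    progress[filename] = status
    PROGRESS_FILE.write_text(json.dumps(progress, indent=2))

def pending(progress: dict, filenames):
    """Skip anything already completed, so re-runs resume where they left off."""
    return [f for f in filenames if progress.get(f) != "completed"]
```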
## How Speaker Naming Works

1. Transcribe with AssemblyAI to get generic speaker labels (A, B, C...)
2. Sample utterances from each speaker
3. Send the samples to an LLM (Kimi) with context about the characters: Malabar, Sun, Jupiter, Kangarro, Mole
4. The LLM infers which speaker is which character based on speaking style and content
5. Apply the inferred names to the output
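The steps above can be sketched as follows; the prompt wording and function names are illustrative, not the script's exact code (only the character list comes from this README):

```python
CHARACTERS = ["Malabar", "Sun", "Jupiter", "Kangarro", "Mole"]

def build_naming_prompt(utterances_by_speaker: dict,
                        samples_per_speaker: int = 5) -> str:
    """Ask the LLM to map generic labels (A, B, ...) to character names."""
    lines = [
        f"The speakers are characters from this list: {', '.join(CHARACTERS)}.",
        "Based on the sample utterances below, return a JSON object mapping "
        "each speaker label to a character name.",
    ]
    for label, utterances in sorted(utterances_by_speaker.items()):
        lines.append(f"\nSpeaker {label}:")
        lines.extend(f"  - {u}" for u in utterances[:samples_per_speaker])
    return "\n".join(lines)

def apply_name(mapping: dict, label: str) -> str:
    """Fall back to the raw label if the LLM left a speaker unmapped."""
    return mapping.get(label, label)
```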
## Troubleshooting

**AssemblyAI upload fails:**
- Check your API key
- Check your internet connection
- Video files might be too large for the free tier

**Speaker naming is wrong:**
- The LLM makes educated guesses based on context
- You can manually edit the output files if needed
- Consider providing more context about each character's personality

**Progress lost:**
- Don't delete `.transcription_progress.json`
- It tracks which files are done to avoid re-processing
## Development

```bash
# Add new dependencies
uv add <package-name>

# Add dev dependencies
uv add --dev <package-name>

# Update the lock file
uv lock
```