
Episode Transcription Script

This script transcribes video episodes with speaker diarization and infers speaker names using AI.

Features

  • Transcribes all .mp4, .mkv, .avi, .mov, .webm files in the episodes/ folder
  • Speaker diarization (identifies who spoke when)
  • AI-powered speaker naming based on context
  • Smart merging of non-word utterances (sounds, modal particles)
  • Progress tracking - resume from where you left off
  • Output format: [mm:ss](SpeakerName) line content
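
The "smart merging" feature can be pictured with a sketch like the one below. The filler-word list and the utterance shape (dicts with speaker, start_ms, text) are assumptions for illustration, not the script's actual internals:

```python
# Assumed filler list; extend as needed for your episodes.
FILLERS = {"uh", "um", "mm", "hmm", "mm-hmm", "ah", "oh", "huh"}

def is_non_word(text):
    """True if the utterance is only a filler sound."""
    return text.strip(" .!?,~").lower() in FILLERS

def merge_non_word(utterances):
    """Fold filler-only utterances into the preceding line."""
    merged = []
    for u in utterances:
        if merged and is_non_word(u["text"]):
            merged[-1]["text"] += " " + u["text"]
        else:
            merged.append(dict(u))  # copy so the input list is untouched
    return merged
```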

Setup

1. Install uv (if not already installed)

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Set API Keys

# Required: AssemblyAI API key (free tier: 100 hours/month)
export ASSEMBLYAI_API_KEY="your-assemblyai-key"

# Required: OpenAI/Kimi API key
export OPENAI_API_KEY="your-kimi-key"

# Optional: base URL for Kimi (already set as the default in the script)
export OPENAI_BASE_URL="https://api.moonshot.cn/v1"

Get your API keys from the AssemblyAI dashboard and the Moonshot (Kimi) platform.

Usage

# This will automatically install dependencies and run the script
uv run transcribe_episodes.py

Or sync dependencies first, then run:

# Install dependencies (creates .venv automatically)
uv sync

# Run the script
uv run python transcribe_episodes.py

Check progress

uv run transcribe_episodes.py status

Reset and re-process

# Reset all (will re-process everything)
uv run transcribe_episodes.py reset

# Reset specific file only
uv run transcribe_episodes.py reset S02E02.mp4

Output

Transcripts are saved to the transcripts/ folder as .txt files:

transcripts/
├── S02E01.txt
└── S02E02.txt

Example content:

[00:12](Malabar) Hello everyone, welcome back!
[00:15](Sun) Nice to see you all again.
[00:18](Jupiter) Yeah, let's get started.
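
If you want to post-process transcripts, the output format is easy to parse. A minimal sketch (the regex follows the `[mm:ss](SpeakerName) line content` format shown above; the returned dict keys are my own choice):

```python
import re

# Matches lines like "[00:15](Sun) Nice to see you all again."
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2})\]\(([^)]+)\)\s*(.*)$")

def parse_line(line):
    """Return {'seconds', 'speaker', 'text'} for a transcript line, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    mm, ss, speaker, text = m.groups()
    return {"seconds": int(mm) * 60 + int(ss), "speaker": speaker, "text": text}
```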

Progress Tracking

The script creates .transcription_progress.json to track which files are:

  • completed - Successfully processed
  • error - Failed (check error message)
  • transcribing - In progress (transcription)
  • naming - In progress (speaker naming)

If interrupted, simply re-run the script - it will skip completed files.
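
The resume logic can be sketched as follows. The JSON schema (a map from filename to a record with a "status" field) is an assumption based on the statuses listed above, not a guarantee of the file's exact layout:

```python
import json
from pathlib import Path

PROGRESS_FILE = Path(".transcription_progress.json")

def load_progress():
    """Load the progress map, or start empty if the file doesn't exist yet."""
    if PROGRESS_FILE.exists():
        return json.loads(PROGRESS_FILE.read_text())
    return {}

def should_skip(filename, progress):
    """Skip files already marked completed; anything else gets (re)processed."""
    return progress.get(filename, {}).get("status") == "completed"
```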

How Speaker Naming Works

  1. Transcribe with AssemblyAI to get speaker labels (A, B, C...)
  2. Sample utterances from each speaker
  3. Send to LLM (Kimi) with context about characters: Malabar, Sun, Jupiter, Kangarro, Mole
  4. LLM infers which speaker is which character based on speaking style and content
  5. Apply inferred names to output
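
Steps 2 and 3 can be sketched like this. The sampling size, prompt wording, and function names are illustrative assumptions; only the character list and the label-to-name mapping idea come from the description above:

```python
from collections import defaultdict

CHARACTERS = ["Malabar", "Sun", "Jupiter", "Kangarro", "Mole"]

def sample_utterances(utterances, per_speaker=5):
    """Collect the first few utterances for each diarized label (A, B, C...)."""
    by_speaker = defaultdict(list)
    for u in utterances:
        if len(by_speaker[u["speaker"]]) < per_speaker:
            by_speaker[u["speaker"]].append(u["text"])
    return dict(by_speaker)

def build_naming_prompt(samples):
    """Assemble a prompt asking the LLM to map labels to character names."""
    lines = [
        "Characters: " + ", ".join(CHARACTERS),
        "Given the sample lines below, map each speaker label to one character.",
        'Reply as JSON, e.g. {"A": "Malabar"}.',
    ]
    for label, texts in sorted(samples.items()):
        lines.append(f"Speaker {label}: " + " | ".join(texts))
    return "\n".join(lines)
```

The LLM's JSON reply is then used in step 5 to rewrite labels into names in the output.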

Troubleshooting

AssemblyAI upload fails:

  • Check your API key
  • Check internet connection
  • Video files might be too large for the free tier

Speaker naming is wrong:

  • The LLM makes educated guesses based on context
  • You can manually edit the output files if needed
  • Consider providing more context about each character's personality

Progress lost:

  • Don't delete .transcription_progress.json
  • It tracks which files are done to avoid re-processing

Development

# Add new dependencies
uv add <package-name>

# Add dev dependencies
uv add --dev <package-name>

# Update lock file
uv lock