Generate Subtitle (STT)
Transcribe audio or video into timed subtitles. Picks up speech and emits SRT / VTT / ASS / SSA — with optional translation in the same pass.
What you need
- FFmpeg on `PATH` for audio/video decoding (see FFmpeg setup).
- A transcription backend, one of:
  - faster-whisper: local, offline, free (default; no setup needed)
  - Google Cloud Speech-to-Text: cloud, paid, more accurate on noisy audio. See Google Cloud setup.
  - Soniox: cloud, paid, supports real-time transcription and speaker diarization. See Soniox setup.
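To confirm the first prerequisite before starting a batch, you can check whether `ffmpeg` resolves on `PATH`. The sketch below uses only the Python standard library; the helper name `find_ffmpeg` is illustrative and is not part of the app.

```python
import shutil
import subprocess


def find_ffmpeg(executable="ffmpeg"):
    """Return the full path of the FFmpeg binary on PATH, or None if absent."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("FFmpeg not found on PATH; see FFmpeg setup.")
    else:
        # `ffmpeg -version` prints a build banner; its first line names the version.
        banner = subprocess.run(
            [path, "-version"], capture_output=True, text=True, check=False
        ).stdout
        print(banner.splitlines()[0] if banner else path)
```

If the function returns `None`, the decoder is not reachable from the current environment, and the FFmpeg setup page is the place to start.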
Walkthrough
- Click Generate Subtitle in the sidebar.
- Drop one or more audio / video files (`.mp3`, `.wav`, `.m4a`, `.flac`, `.ogg`, `.aac`, `.wma`, `.mp4`, `.webm`, `.mkv`, `.avi`, `.mov`, `.wmv`).
- Pick the Source language (the language spoken in the audio), or leave it on `Auto-detect` for Whisper to figure it out.
- Pick a Target language: `No translation` for a plain transcript, or any of the 45 supported languages to get the transcript translated in the same pass.
- Pick the Output format (SRT / VTT / ASS / SSA).
- Click Generate (or `Ctrl+Enter`).
- Watch the queue. Open the row when done.
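When preparing a large folder for the drop step, it can help to pre-filter files by the extensions listed above. This is a minimal sketch, not the app's own validation; `split_supported` is a hypothetical helper, and case-insensitive matching is an assumption.

```python
from pathlib import Path

# Extensions listed in the walkthrough.
SUPPORTED_EXTENSIONS = {
    ".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".wma",  # audio
    ".mp4", ".webm", ".mkv", ".avi", ".mov", ".wmv",          # video
}


def split_supported(paths):
    """Partition paths into (accepted, rejected) by file extension.

    Assumption: matching is case-insensitive, so "TALK.MP3" is accepted.
    Only the final suffix is checked, so "clip.tar.gz" is rejected.
    """
    accepted, rejected = [], []
    for path in map(Path, paths):
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            accepted.append(path)
        else:
            rejected.append(path)
    return accepted, rejected
```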
Format choice
| Format | Best for |
|---|---|
| SRT | Universal — almost every player supports it |
| VTT | HTML5 `<video>` / `<track>` elements |
| ASS / SSA | Karaoke, styled subtitles, fansub workflows |
The four formats round-trip through the same parser, so you can switch output format on a re-translate without losing timing.
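To illustrate why timing survives a format switch, the sketch below renders one cue list as both SRT and VTT. It relies on two well-known format differences: SRT numbers its cues and writes milliseconds after a comma, while WebVTT starts with a `WEBVTT` header line and uses a period. The function names are illustrative; this is not the app's parser.

```python
def format_timestamp(seconds, separator):
    """Render seconds as HH:MM:SS<sep>mmm (',' for SRT, '.' for VTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(cues):
    """cues: iterable of (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(cues):
    """Same cues, WebVTT layout: header line, no cue numbers, '.' separator."""
    blocks = ["WEBVTT\n"]
    for start, end, text in cues:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        blocks.append(f"{timing}\n{text}\n")
    return "\n".join(blocks)
```

Both writers consume the same `(start, end, text)` tuples, so the cue timing is identical in either output; only the surrounding syntax changes.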
Whisper model size
Switch in Settings → Subtitle:
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| `tiny` | ~75 MB | very fast | low |
| `base` (default) | ~150 MB | fast | decent |
| `small` | ~500 MB | medium | good |
| `medium` | ~1.5 GB | slow | high |
| `large` | ~3 GB | very slow | best |
Models download on first use and are cached locally. On a slow connection the first run can take a while; later runs skip the download.
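For readers curious what the local backend looks like outside the app, here is a sketch that calls the faster-whisper library directly. It assumes the `faster-whisper` package is installed; how the app itself invokes the library is not documented on this page, and `device="cpu"` with `compute_type="int8"` are illustrative choices rather than the app's settings.

```python
def transcribe(audio_path, model_size="base", language=None):
    """Transcribe a file with faster-whisper and return (language, cues).

    language=None lets Whisper auto-detect the spoken language.
    Each cue is a (start_seconds, end_seconds, text) tuple.
    """
    # Imported lazily so this module loads even without faster-whisper installed.
    from faster_whisper import WhisperModel

    # The model is fetched on first use and read from the local cache afterwards.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path, language=language)

    # `segments` is a generator; transcription runs as it is consumed.
    cues = [(seg.start, seg.end, seg.text.strip()) for seg in segments]
    return info.language, cues
```

Calling `transcribe("talk.mp3")` (a hypothetical file) with a larger `model_size` such as `"small"` trades speed for accuracy, in line with the table above.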
STT method comparison
| Backend | Cost | Online? | Speaker diarization | Languages |
|---|---|---|---|---|
| Whisper (local) | Free | No | No | 99 |
| Google Cloud STT | Paid | Yes | Yes (`latest_long` model) | 125+ |
| Soniox | Paid | Yes | Yes (per-token speaker labels) | 60+ |
Switch in Settings → Subtitle → STT method.
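The comparison table can also be read as a small decision procedure. The sketch below encodes the cost, connectivity, and diarization columns as data and filters on them; the `Backend` class and `candidates` function are hypothetical names for illustration and do not correspond to any setting in the app.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    paid: bool
    needs_network: bool
    diarization: bool


# Values transcribed from the comparison table.
BACKENDS = [
    Backend("Whisper (local)", paid=False, needs_network=False, diarization=False),
    Backend("Google Cloud STT", paid=True, needs_network=True, diarization=True),
    Backend("Soniox", paid=True, needs_network=True, diarization=True),
]


def candidates(offline=False, need_diarization=False, free_only=False):
    """Return names of backends that satisfy every requested constraint."""
    return [
        b.name
        for b in BACKENDS
        if not (offline and b.needs_network)
        and not (need_diarization and not b.diarization)
        and not (free_only and b.paid)
    ]
```

Asking for both offline use and speaker diarization yields an empty list, which matches the table: only the cloud backends label speakers.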
Tips
- Stop button: interrupts an in-flight batch. Files queued behind the active one stay queued; you can resume later.
- Re-generate: right-click a Done entry to re-run with a different format / language / STT method.
- Long-form audio: Whisper handles hours of audio fine; budget ~1 minute of processing per minute of audio on a CPU with the `base` model.
Shortcuts
| Shortcut | Action |
|---|---|
| `Ctrl+Enter` | Generate |
| `Ctrl+O` | Browse |
| `Ctrl+F` | Focus history search |