live_engine
Live audio capture and streaming transcription engine.
Captures microphone audio in real time, transcribes it with faster-whisper, and emits recognized sentences for translation.
LiveTranscriber
LiveTranscriber(
on_sentence,
on_partial=None,
on_status=None,
on_stopped=None,
model_size="tiny",
language="",
device=None,
audio_source="microphone",
record_to=None,
)
Captures audio and transcribes it in real time.
Supports three audio source modes:
- microphone: default input device (mic)
- system: monitor source that captures desktop/system audio
- both: mixes microphone and system audio together
Usage
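```python
transcriber = LiveTranscriber(
    # on_sentence receives (text, start_sec, end_sec) for each sentence.
    on_sentence=lambda text, start_sec, end_sec: print(f"[final] {text}"),
    model_size="tiny",
    language="Vietnamese",
    audio_source="microphone",
)
transcriber.start()
# ... later ...
transcriber.stop()
```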
Initializes the live transcriber.
| PARAMETER | DESCRIPTION |
|---|---|
| on_sentence | Called with (text, start_sec, end_sec) for each sentence. |
| on_partial | Called with partial (in-progress) text. |
| on_status | Called with status messages. |
| on_stopped | Called when the processing loop exits (error or normal). |
| model_size | Whisper model size. |
| language | Source language label. Empty for auto-detect. |
| device | Audio input device index. None for default mic. |
| audio_source | One of "microphone", "system", or "both". |
| record_to | Optional |
Source code in src/core/live_engine.py
start
Starts audio capture and transcription.
stop
Stops audio capture and transcription.
_open_recording
Opens the WAV recording file for the current session.
No-op when record_to wasn't set on the constructor. Errors
are logged but never propagate — recording is a best-effort
side channel; failure shouldn't abort the live session.
_record_block
Writes a captured audio block to the WAV file when recording.
The block arrives as float32 in [-1.0, 1.0]; the WAV format uses s16le, so we scale and clip to int16 before writing. Best-effort: a write error is logged and disables recording for the rest of the session rather than crashing the live loop.
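The float32-to-s16le conversion described above can be sketched with the standard library alone. This is a minimal illustration, not the module's code: `float_block_to_s16le` and `write_wav` are hypothetical names, and plain Python sequences stand in for the numpy blocks the engine works with.

```python
import array
import io
import sys
import wave


def float_block_to_s16le(block):
    """Scale floats in [-1.0, 1.0] to int16 and return little-endian bytes."""
    samples = array.array("h")
    for x in block:
        # Clip first so out-of-range input cannot overflow int16.
        clipped = max(-1.0, min(1.0, x))
        samples.append(int(clipped * 32767))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is little-endian
    return samples.tobytes()


def write_wav(fileobj, blocks, sample_rate=16000):
    """Write mono 16-bit PCM blocks to a WAV container (hypothetical helper)."""
    with wave.open(fileobj, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        for block in blocks:
            wav.writeframes(float_block_to_s16le(block))
```

Clipping before the integer cast matters: a sample slightly above 1.0 would otherwise fall outside the int16 range.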
_close_recording
Closes the WAV writer at session end.
_emit_status
_audio_callback
Called by sounddevice for each audio block.
_resolve_devices
Returns the mic device index and validates system audio if needed.
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If system audio is required but not available. |
_start_system_audio
Spawns the platform-appropriate system-audio capture subprocess.
Dispatches to _start_system_audio_linux (parec),
_start_system_audio_macos (ffmpeg + avfoundation), or
_start_system_audio_windows (ffmpeg + dshow) based on
platform.system(). All three populate self._sys_audio_proc
and start a reader thread on self._sys_audio_thread that
feeds 16-kHz mono float32 blocks into target_queue.
Raises ValueError("live.error_no_system_audio") if the
platform isn't supported or the prerequisites (PulseAudio
monitor / virtual loopback device / ffmpeg) are missing.
_spawn_pcm_reader
Spawns argv, expects raw 16-kHz s16le mono PCM on stdout.
Shared back-end for the three per-platform capture methods.
Stores the process on self._sys_audio_proc and starts a
reader thread on self._sys_audio_thread that converts
bytes → float32 → numpy blocks → target_queue. The reader
exits when either _is_running flips to False or the
subprocess closes its stdout.
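The bytes → float32 → queue step of the reader can be sketched as follows. This is an illustration under stated assumptions: `BLOCK_SAMPLES`, `s16le_to_floats`, and `read_pcm_blocks` are hypothetical names, any file-like object stands in for the subprocess stdout, and Python lists stand in for numpy blocks.

```python
import array
import io
import queue
import sys

BLOCK_SAMPLES = 1600   # hypothetical block size: 0.1 s at 16 kHz
BYTES_PER_SAMPLE = 2   # s16le


def s16le_to_floats(data):
    """Convert raw little-endian int16 PCM bytes to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(data)
    if sys.byteorder == "big":
        samples.byteswap()
    return [s / 32768.0 for s in samples]


def read_pcm_blocks(stream, target_queue, is_running):
    """Read PCM blocks from `stream` until EOF or is_running() returns False."""
    block_bytes = BLOCK_SAMPLES * BYTES_PER_SAMPLE
    while is_running():
        data = stream.read(block_bytes)
        if not data:
            break  # the producer closed its stdout
        # Drop a trailing odd byte so frombytes() always sees whole samples.
        data = data[: len(data) - (len(data) % BYTES_PER_SAMPLE)]
        if data:
            target_queue.put(s16le_to_floats(data))
```

In a real reader thread, `stream` would be the `stdout` pipe of the capture process and `is_running` would check the transcriber's running flag.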
_start_system_audio_linux
Captures system audio on Linux via parec.
Reads raw PCM (s16le, mono, 16 kHz) from the default sink's
monitor source. Requires PulseAudio or PipeWire-pulse to be
running so a default sink (and its .monitor source) exists.
_start_system_audio_macos
Captures system audio on macOS via ffmpeg + avfoundation.
CoreAudio doesn't expose the system mix natively, so the user
must have a virtual loopback device installed (BlackHole,
Loopback, Soundflower, iShowU). We auto-detect the device
index from ffmpeg -list_devices so the user doesn't have
to configure anything beyond installing the loopback driver.
-fflags nobuffer + -flags low_delay keep added latency
below the 5-second STT chunk size; -acodec pcm_s16le plus
-f s16le - writes raw PCM straight to stdout for the
shared reader thread.
_start_system_audio_windows
Captures system audio on Windows.
Tries the soundcard package first — it talks to WASAPI's
native loopback flag directly, so the user doesn't need to
install any extra software on a modern Windows machine. Falls
back to ffmpeg -f dshow against a virtual-loopback
DirectShow device (virtual-audio-capturer from Screen
Capture Recorder, VB-Audio Virtual Cable, or legacy
Stereo Mix) when soundcard isn't importable or fails to
initialise — this preserves compatibility with users who
already have a virtual cable installed from a prior version.
_start_system_audio_windows_soundcard
Captures Windows system audio via the soundcard package.
Opens a loopback recorder against the default speaker — this is WASAPI's native loopback mode, available on every Windows version since Vista, no virtual cable required. The recorder yields float32 numpy frames at 16 kHz mono (the same shape the rest of the pipeline expects), so we push them straight into target_queue.
Reader thread polls record(numframes=_BLOCK_SIZE) until
_is_running flips to False. No subprocess to manage —
_stop_system_audio only needs to wait for the thread to
notice the flag and exit naturally.
Raises ImportError if soundcard isn't installed (the
outer dispatcher catches this and falls back to ffmpeg+dshow).
_stop_system_audio
Terminates the system-audio capture subprocess if running.
Platform-agnostic: works for both the parec (Linux) and ffmpeg
(macOS / Windows) processes, since _start_system_audio
stashes them all on the same _sys_audio_proc /
_sys_audio_thread attributes.
Hardened against subprocesses that ignore SIGTERM — if
proc.wait(timeout=3) raises subprocess.TimeoutExpired
we escalate to proc.kill() (SIGKILL) and wait a final
second. Without this, a hung child process would (a) leave
self._sys_audio_proc pointing at a defunct Popen so the
next _stop_system_audio call would re-enter the same
terminate→wait→raise loop, and (b) skip the reader-thread
join below, leaking the thread reference for the rest of
the session.
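The terminate, wait, then kill escalation can be sketched with the standard `subprocess` API. `stop_process` and its timeout parameters are hypothetical names used for illustration; the defaults mirror the 3-second and 1-second waits described above.

```python
import subprocess


def stop_process(proc, term_timeout=3.0, kill_timeout=1.0):
    """Terminate `proc`, escalating to kill() if it ignores the first signal."""
    if proc.poll() is not None:
        return  # already exited
    proc.terminate()  # SIGTERM on POSIX
    try:
        proc.wait(timeout=term_timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        try:
            proc.wait(timeout=kill_timeout)
        except subprocess.TimeoutExpired:
            pass  # give up waiting so the caller can still clean up
```

Because both `TimeoutExpired` cases are caught, the caller always reaches the code that clears its process reference and joins the reader thread.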
_open_streams
Opens audio source(s) based on audio_source setting.
_read_block
Reads the next audio block, mixing if in 'both' mode.
Also mirrors the block to the recording WAV file when
record_to was set on the constructor. Done here (after
mixing, before STT) so the recording matches what Whisper
actually transcribed — same single-source or mixed waveform
the user heard.
| RETURNS | DESCRIPTION |
|---|---|
| ndarray \| None | Audio block as numpy array, or None on timeout. |
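The mixing strategy used in "both" mode is not spelled out here. One common approach, shown purely as an assumption, is to sum the two blocks sample by sample and clip the result to [-1.0, 1.0]. `mix_blocks` is a hypothetical helper, and lists stand in for numpy arrays.

```python
def mix_blocks(mic_block, sys_block):
    """Mix two mono blocks by summing and clipping (assumed strategy)."""
    if mic_block is None:
        return sys_block  # only one source produced audio this cycle
    if sys_block is None:
        return mic_block
    n = min(len(mic_block), len(sys_block))  # tolerate unequal block lengths
    return [max(-1.0, min(1.0, mic_block[i] + sys_block[i])) for i in range(n)]
```

Summing preserves the loudness of each source; averaging instead would avoid clipping at the cost of halving both signals.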
_read_block_raw
Returns the next captured block (no recording side effect).
_process_loop
Main processing loop: validate audio, load model, transcribe.
_transcribe_buffer
Transcribes accumulated audio blocks.
| PARAMETER | DESCRIPTION |
|---|---|
| model | The Whisper model instance. |
| audio_blocks | List of audio numpy arrays to transcribe. |
| lang_code | Language code for Whisper, or None for auto. |
| start_sec | Start time in seconds since capture began. |
| end_sec | End time in seconds since capture began. |
is_whisper_model_cached
Returns True when the faster-whisper model files are already on disk.
Used by the Live page's showEvent preload to avoid silently
triggering a multi-hundred-MB download for users who navigate to
the page without intending to start a session. Returns False
for unknown sizes and for environments without huggingface_hub.
preload_whisper_model
Loads the named Whisper model into the module cache, if not present.
Idempotent: returns immediately when the cache already holds the requested size. Safe to call from a background thread; the GIL plus the simple two-line cache write make a partial concurrent update harmless (the second writer just rebinds to its own model instance, which becomes garbage immediately). Errors are swallowed — a failed preload is a UX optimisation miss, not a bug.
_put_drop_oldest
Non-blocking put that drops the oldest item when the queue is full.
Producers (sounddevice callbacks, parec reader threads) must never block on the Python-level queue: a stalled producer would either drop device samples at the OS level or stall the callback. When a slow consumer (slow whisper model) can't keep up, drop the oldest item so the queue keeps representing the latest audio available.
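A drop-oldest put can be sketched on top of `queue.Queue` like this. `put_drop_oldest` is a hypothetical stand-in for the helper documented above, not its actual source.

```python
import queue


def put_drop_oldest(q, item):
    """Put `item` without blocking; evict the oldest entry if `q` is full."""
    while True:
        try:
            q.put_nowait(item)
            return
        except queue.Full:
            try:
                q.get_nowait()  # discard the oldest block to make room
            except queue.Empty:
                pass  # a consumer emptied the queue first; retry the put
```

The retry loop covers the race where a consumer drains the queue between the failed put and the eviction.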
next_block
Returns the oldest pending item from q, or None on timeout.
Simple FIFO consumption. Prior versions of this helper drained to
the newest block to cap real-time drift between mic and system
streams, but that discarded 2–3 seconds of audio per whisper
transcription cycle (the queues grow ~5 blocks during a 1.3 s
transcribe() call; the drop left whisper with non-contiguous
audio and triggered the "Compression ratio > 2.4 → reject"
failure mode). FIFO keeps audio contiguous; queues grow during
transcription and drain again on the next idle iteration. Whisper
transcribes faster than audio is captured on the base model, so the
queue size oscillates but doesn't grow unbounded.
Shared between LiveTranscriber._read_block (whisper internal
mixer) and the Soniox / Gemini Live "both"-mode mixer in the UI
layer so the two paths have identical consumption semantics.
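The FIFO consumption described above amounts to a timed `get`. In this minimal sketch, `fifo_next_block` and its `timeout` default are hypothetical; the real helper's signature may differ.

```python
import queue


def fifo_next_block(q, timeout=0.1):
    """Return the oldest pending item from `q`, or None if none arrives in time."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None
```

Nothing is skipped or drained, so consecutive calls always yield contiguous audio.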
_get_install_hint
Returns a distro-specific install command hint, or empty string.
| PARAMETER | DESCRIPTION |
|---|---|
| packages | Mapping of package-manager binary to full install command. |
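A lookup of this shape could be sketched with `shutil.which`. The function name, the injectable `which` parameter, and the example commands are assumptions made for illustration only.

```python
import shutil


def get_install_hint(packages, which=shutil.which):
    """Return the install command for the first package manager found on PATH."""
    for binary, command in packages.items():
        if which(binary):
            return command
    return ""  # no known package manager detected


# Illustrative mapping; the commands the module actually suggests may differ.
EXAMPLE_PACKAGES = {
    "apt-get": "sudo apt-get install libportaudio2",
    "dnf": "sudo dnf install portaudio",
}
```

Dict insertion order decides which manager wins when several are installed.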
_get_portaudio_install_hint
_get_pulseaudio_install_hint
invalidate_audio_caches
Forces the next probe call to re-shell-out instead of using cached state.
Called from the Live page when the user clicks Start or changes the audio source combo — both are points where the user has intent + we want a fresh diagnosis. Cheap; safe to call redundantly.
check_audio_available
Pre-validates that audio capture is possible.
| RETURNS | DESCRIPTION |
|---|---|
| str | Empty string on success, or an i18n error key string. When PortAudio is missing on Linux, the key includes a … |
Result is cached in _audio_available_cache so repeated
calls (showEvent re-probe) skip the sd.query_devices()
enumeration after the first call. Invalidate explicitly via
invalidate_audio_caches when the user signals a
re-probe (Start click / source combo change).
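The cache-then-invalidate pattern can be sketched with a module-level variable. `_probe_cache`, `cached_probe`, and `invalidate_probe_cache` are hypothetical names, not the module's own.

```python
_probe_cache = None  # None means "not probed yet"


def cached_probe(probe):
    """Run `probe` once and reuse its result until the cache is invalidated."""
    global _probe_cache
    if _probe_cache is None:
        _probe_cache = probe()
    return _probe_cache


def invalidate_probe_cache():
    """Force the next cached_probe() call to run the probe again."""
    global _probe_cache
    _probe_cache = None
```

Using `None` as the sentinel keeps an empty-string success result distinguishable from "never probed".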
list_input_devices
Returns a list of available audio input devices.
| RETURNS | DESCRIPTION |
|---|---|
| list[tuple[int, str]] | List of (device_index, device_name) tuples. |
_get_default_monitor_source
Returns the PulseAudio monitor source name for the default output sink.
Queries pactl for the default sink and appends .monitor.
Returns None when pactl is absent or the query fails.
Timeout is intentionally tight (1 s). pactl get-default-sink
normally responds in <10 ms; a multi-second wait only ever
happens when PulseAudio / PipeWire is restarting or its dbus
socket is wedged. This function runs on the UI thread via
check_system_audio_available's showEvent path, so an
over-generous timeout there used to freeze the page for 5 s
and trigger window-manager "application not responding" hints.
1 s keeps the worst case well below the WM threshold while
still leaving headroom for typical recovery.
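A query with a tight timeout might look like the sketch below. `get_default_monitor_source` is a hypothetical reimplementation written from the description above, so its error handling is an assumption.

```python
import shutil
import subprocess


def get_default_monitor_source(timeout=1.0):
    """Return '<default sink>.monitor', or None if pactl is absent or fails."""
    if shutil.which("pactl") is None:
        return None
    try:
        result = subprocess.run(
            ["pactl", "get-default-sink"],
            capture_output=True,
            text=True,
            timeout=timeout,  # keep the UI thread from stalling
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None  # timeout, non-zero exit, or failure to launch
    sink = result.stdout.strip()
    return f"{sink}.monitor" if sink else None
```

`subprocess.SubprocessError` covers both `TimeoutExpired` and `CalledProcessError`, so a wedged or failing pactl yields `None` rather than an exception.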
_get_macos_loopback_device_index
Returns the avfoundation audio device index of a virtual loopback.
Runs ffmpeg -f avfoundation -list_devices true -i "" and parses
the audio-device list out of stderr (avfoundation prints to stderr,
not stdout). Matches device names against
_MACOS_LOOPBACK_KEYWORDS so the user's specific virtual device
(BlackHole / Loopback / Soundflower / iShowU) is auto-detected.
Returns the zero-based audio index suitable for
ffmpeg -f avfoundation -i ":<index>". Returns None when ffmpeg
is missing or no virtual loopback is installed.
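The stderr parsing step can be sketched as below. The keyword tuple is a hypothetical stand-in for _MACOS_LOOPBACK_KEYWORDS, and the line layout matched here (a section header followed by `[index] name` entries) is an assumption based on typical ffmpeg avfoundation output.

```python
import re

# Hypothetical keyword list; the module's _MACOS_LOOPBACK_KEYWORDS may differ.
LOOPBACK_KEYWORDS = ("blackhole", "loopback", "soundflower", "ishowu")

# Matches the "[<index>] <device name>" tail of a device line.
_DEVICE_LINE = re.compile(r"\[(\d+)\]\s+(.+)$")


def find_loopback_index(stderr_text):
    """Return the audio index of the first loopback-like device, or None."""
    in_audio_section = False
    for line in stderr_text.splitlines():
        if "AVFoundation audio devices" in line:
            in_audio_section = True
            continue
        if "AVFoundation video devices" in line:
            in_audio_section = False
            continue
        if not in_audio_section:
            continue  # ignore video devices and unrelated log lines
        match = _DEVICE_LINE.search(line)
        if match and any(k in match.group(2).lower() for k in LOOPBACK_KEYWORDS):
            return int(match.group(1))
    return None
```

Tracking the section matters because video and audio devices are numbered independently, so index 0 appears in both lists.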
_get_windows_loopback_device_name
Returns the dshow audio device name of an installed virtual loopback.
Runs ffmpeg -f dshow -list_devices true -i dummy and parses the
device list from stderr. Returns the first name in
_WINDOWS_LOOPBACK_DEVICES that's actually present. None when
ffmpeg is missing or no compatible device is installed.
_check_windows_soundcard_loopback
Returns True if the soundcard package can grab WASAPI loopback.
soundcard is the preferred Windows backend: it calls WASAPI's
native loopback flag directly — no extra software install — so
most Windows machines are good to go without virtual-audio-capturer
/ VB-Audio. We only need to confirm the import works AND a
default speaker exists; the actual recorder is opened later in
LiveTranscriber._start_system_audio_windows_soundcard.
check_system_audio_available
Returns True if system audio capture is possible on this OS.
Dispatches to a per-platform check:
- Linux: parec is on PATH AND a default PulseAudio sink exists.
- macOS: ffmpeg is on PATH AND a virtual loopback device (BlackHole / Loopback / Soundflower) is installed.
- Windows: the soundcard package can reach WASAPI loopback (no extra software needed) OR ffmpeg plus a known DirectShow loopback device (virtual-audio-capturer / VB-Audio / Stereo Mix) is available as a fallback.
Result is cached in _system_audio_available_cache to
avoid re-shelling-out to pactl / ffmpeg on every
showEvent and audio-source combo refresh. Invalidate via
invalidate_audio_caches when the user explicitly
signals a re-probe (Start click / source combo change).