subtitle_utils
Subtitle file parsing and serialization utilities.
Supports SRT, VTT (WebVTT), ASS, and SSA formats. Each format has a
parse/serialize pair. The unified parse_subtitle / serialize_subtitle
dispatchers select the correct pair based on file extension.
SubtitleEntry (dataclass)
A single subtitle cue with timing metadata and translatable text.
| ATTRIBUTE | DESCRIPTION |
|---|---|
| index | Sequential position (0-based). |
| start | Start timestamp as the original raw string. |
| end | End timestamp as the original raw string. |
| text | Translatable text (override tags stripped for ASS/SSA). |
| raw_text | Original text before tag stripping (ASS/SSA only). |
| metadata | Format-specific extra data (cue id, cue settings, etc.). |
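For orientation, the attribute list above could correspond to a dataclass along these lines. Only the field names and their meanings come from the table; the field types and defaults are assumptions, since the type column is not shown on this page.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SubtitleEntry:
    """Illustrative stand-in; types and defaults are assumed, not confirmed."""

    index: int                      # sequential position, 0-based
    start: str                      # raw start timestamp, e.g. "00:00:01,000"
    end: str                        # raw end timestamp
    text: str                       # translatable text
    raw_text: str = ""              # original text before tag stripping (ASS/SSA)
    metadata: dict[str, Any] = field(default_factory=dict)  # cue id, settings, ...


cue = SubtitleEntry(index=0, start="00:00:01,000", end="00:00:02,500", text="Hello")
cue.text = "Bonjour"  # translation only touches `text`; timing stays a raw string
```

Keeping timestamps as raw strings means a serializer can write them back unchanged instead of re-formatting parsed values.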
mirror_ass_alignment_for_rtl
Mirrors ASS/SSA alignment codes left↔right for an RTL target.
Flips both the per-line override tags (\an1 ↔ \an3 etc.)
and the V4+ Style table's Alignment column. Centre alignments
(\an2/5/8, legacy \a2/6/10) are untouched.
The function is a string-level rewrite and does not validate ASS
structure. An unrelated Style: row outside [V4+ Styles] is left
untouched (its column count would not match), and a malformed file
does not cause a crash.
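The override-tag half of this mirroring can be sketched with a single regular expression. This is a minimal illustration, not the module's code: `mirror_an_tags` is a hypothetical name, and the sketch handles only `\an` tags, leaving out the legacy `\a` codes and the Style table's Alignment column.

```python
import re

# Numpad-style \an codes: 1/4/7 are left, 3/6/9 are right, 2/5/8 are centre.
_AN_SWAP = {"1": "3", "3": "1", "4": "6", "6": "4", "7": "9", "9": "7"}

# \an followed by a single digit that is not part of a longer number.
_AN_TAG = re.compile(r"\\an([1-9])(?!\d)")


def mirror_an_tags(line: str) -> str:
    """Swap left and right \\an codes; centre codes are returned unchanged."""
    return _AN_TAG.sub(lambda m: "\\an" + _AN_SWAP.get(m.group(1), m.group(1)), line)


mirrored = mirror_an_tags(r"{\an1}Bottom left becomes bottom right")
```

Like the function described above, this is a plain string rewrite: it does not check that the tag sits inside an override block.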
Source code in src/utils/subtitle_utils.py
is_subtitle_format
parse_srt
Parses an SRT file into subtitle entries.
| PARAMETER | DESCRIPTION |
|---|---|
| content | Raw SRT file content. |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[list[SubtitleEntry], None] | Tuple of (entries, None). The second element is always None because SRT needs no extra data for serialization. |
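A rough sketch of how such a parser might work is shown here. `Cue` and `parse_srt_sketch` are hypothetical names, and the block-splitting strategy is an assumption rather than the module's actual logic.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:  # hypothetical stand-in for SubtitleEntry
    index: int
    start: str
    end: str
    text: str


_TIMING = re.compile(r"(\S+)\s+-->\s+(\S+)")


def parse_srt_sketch(content: str) -> tuple[list[Cue], None]:
    """Split on blank lines, then read the timing line and the text below it."""
    entries: list[Cue] = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # Find the timing line; the numeric counter above it is optional here.
        for i, line in enumerate(lines):
            match = _TIMING.match(line.strip())
            if match:
                text = "\n".join(lines[i + 1:])
                entries.append(Cue(len(entries), match.group(1), match.group(2), text))
                break
    return entries, None


sample = (
    "1\n00:00:01,000 --> 00:00:02,500\nHello\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nTwo\nlines\n"
)
entries, extra = parse_srt_sketch(sample)
```

Returning a tuple even though the second element is always None keeps the call shape identical to the other parsers.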
serialize_srt
Reconstructs an SRT file from subtitle entries.
| PARAMETER | DESCRIPTION |
|---|---|
| entries | Subtitle entries with (possibly translated) text. |
| _format_data | Unused; present for dispatcher signature consistency. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Complete SRT file content. |
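The reverse direction might look like the sketch below. The names are hypothetical, and writing 1-based counters with a trailing newline is an assumption based on common SRT practice, not something this page states.

```python
from dataclasses import dataclass


@dataclass
class Cue:  # hypothetical stand-in for SubtitleEntry
    index: int
    start: str
    end: str
    text: str


def serialize_srt_sketch(entries: list[Cue], _format_data: object = None) -> str:
    """Emit a 1-based counter, the raw timing line, then the text."""
    blocks = [
        f"{position}\n{cue.start} --> {cue.end}\n{cue.text}"
        for position, cue in enumerate(entries, start=1)
    ]
    # Blocks are separated by one blank line; the file ends with a newline.
    return "\n\n".join(blocks) + "\n"


cues = [
    Cue(0, "00:00:01,000", "00:00:02,500", "Bonjour"),
    Cue(1, "00:00:03,000", "00:00:04,000", "Deux\nlignes"),
]
output = serialize_srt_sketch(cues)
```

Because the timestamps are stored as raw strings, they are written back exactly as they were read.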
_is_vtt_header_block
Returns True if text is a VTT header/meta block (WEBVTT, NOTE, STYLE).
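One plausible way to make this check is to look at the first word of the block. The function name and the exact matching rule below are assumptions for illustration.

```python
def is_vtt_header_block_sketch(text: str) -> bool:
    """True when the block's first word is WEBVTT, NOTE, or STYLE."""
    stripped = text.lstrip("\ufeff").strip()  # tolerate a leading byte-order mark
    first_line = stripped.split("\n", 1)[0]
    words = first_line.split()
    return bool(words) and words[0] in {"WEBVTT", "NOTE", "STYLE"}


is_header = is_vtt_header_block_sketch("NOTE this is a comment")
is_cue = is_vtt_header_block_sketch("00:00:01.000 --> 00:00:02.000\nHello")
```

Comparing the whole first word, rather than a prefix, avoids misclassifying a cue identifier such as "NOTES" as a comment block.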
parse_vtt
Parses a WebVTT file into subtitle entries.
Preserves the WEBVTT header, NOTE comments, and STYLE blocks in header so they can be restored during serialization.
| PARAMETER | DESCRIPTION |
|---|---|
| content | Raw VTT file content. |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[list[SubtitleEntry], str] | Tuple of (entries, header). header includes everything before the first cue (WEBVTT line, NOTEs, STYLEs). |
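The header-preserving behaviour can be illustrated with the following sketch. `Cue`, `parse_vtt_sketch`, and the `cue_id` / `settings` metadata keys are all hypothetical, and the sketch simply drops NOTE blocks that appear between cues, which the real parser may handle differently.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Cue:  # hypothetical stand-in for SubtitleEntry
    index: int
    start: str
    end: str
    text: str
    metadata: dict = field(default_factory=dict)


_TIMING = re.compile(r"(\S+)\s+-->\s+(\S+)(.*)")


def parse_vtt_sketch(content: str) -> tuple[list[Cue], str]:
    """Collect blocks before the first cue as the header, then parse cues."""
    blocks = re.split(r"\n\s*\n", content.replace("\r\n", "\n").strip())
    header_blocks: list[str] = []
    entries: list[Cue] = []
    for block in blocks:
        lines = block.split("\n")
        timing_at = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_at is None:
            # No timing line: WEBVTT / NOTE / STYLE. Kept only while no cue
            # has been seen yet.
            if not entries:
                header_blocks.append(block)
            continue
        match = _TIMING.match(lines[timing_at].strip())
        if match is None:
            continue
        meta = {"settings": match.group(3).strip()}
        if timing_at > 0:
            meta["cue_id"] = lines[timing_at - 1].strip()
        text = "\n".join(lines[timing_at + 1:])
        entries.append(Cue(len(entries), match.group(1), match.group(2), text, meta))
    return entries, "\n\n".join(header_blocks)


sample = (
    "WEBVTT\n\nNOTE demo file\n\n"
    "intro\n00:00:01.000 --> 00:00:02.000 align:start\nHello\n"
)
entries, header = parse_vtt_sketch(sample)
```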
serialize_vtt
Reconstructs a WebVTT file from entries and the original header.
| PARAMETER | DESCRIPTION |
|---|---|
| entries | Subtitle entries with (possibly translated) text. |
| header | Original WEBVTT header block (with NOTEs/STYLEs). |

| RETURNS | DESCRIPTION |
|---|---|
| str | Complete VTT file content. |
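A matching serializer sketch follows. As before, the names and the `cue_id` / `settings` metadata keys are assumptions made for the example, as is the fallback to a bare WEBVTT line when the header is empty.

```python
from dataclasses import dataclass, field


@dataclass
class Cue:  # hypothetical stand-in for SubtitleEntry
    index: int
    start: str
    end: str
    text: str
    metadata: dict = field(default_factory=dict)


def serialize_vtt_sketch(entries: list[Cue], header: str) -> str:
    """Write the preserved header first, then one block per cue."""
    blocks = [header.strip() or "WEBVTT"]  # a VTT file must open with WEBVTT
    for cue in entries:
        lines = []
        if cue.metadata.get("cue_id"):
            lines.append(cue.metadata["cue_id"])
        timing = f"{cue.start} --> {cue.end}"
        if cue.metadata.get("settings"):
            timing += " " + cue.metadata["settings"]
        lines += [timing, cue.text]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


cues = [
    Cue(0, "00:00:01.000", "00:00:02.000", "Bonjour",
        {"cue_id": "intro", "settings": "align:start"}),
]
result = serialize_vtt_sketch(cues, "WEBVTT\n\nNOTE demo file")
```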
_strip_ass_tags
Strips ASS/SSA override tags, preserving visible text.
Tags like {\b1}, {\i1}, {\pos(320,240)} are removed.
| PARAMETER | DESCRIPTION |
|---|---|
| text | Raw ASS dialogue text. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Text with override tags removed. |
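Stripping override blocks is commonly done with one regular expression, as in this sketch. The function name is hypothetical and the pattern is an assumption about the approach, not a copy of the module's code.

```python
import re

# An override block is everything from "{" to the next "}".
_OVERRIDE_BLOCK = re.compile(r"\{[^}]*\}")


def strip_ass_tags_sketch(text: str) -> str:
    """Remove every {...} override block and keep the visible text."""
    return _OVERRIDE_BLOCK.sub("", text)


visible = strip_ass_tags_sketch(r"{\pos(320,240)}{\i1}Hello {\b1}world{\b0}")
```

Escapes that live outside braces, such as the `\N` line break, are not override blocks and pass through this pattern unchanged.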
_restore_ass_tags
Restores leading ASS override tags from original onto translated.
Mid-text tags cannot be reliably repositioned after translation, so only contiguous leading tags are restored.
| PARAMETER | DESCRIPTION |
|---|---|
| original | Original text with override tags. |
| translated | Translated text without tags. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Translated text prefixed with the original's leading tags. |
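The "leading tags only" rule can be expressed with an anchored pattern. This is a sketch under the assumption that "contiguous leading tags" means one or more `{...}` blocks at the very start of the string; the function name is hypothetical.

```python
import re

# One or more {...} blocks anchored at the very start of the string.
_LEADING_BLOCKS = re.compile(r"^(?:\{[^}]*\})+")


def restore_ass_tags_sketch(original: str, translated: str) -> str:
    """Prefix `translated` with the leading override blocks of `original`."""
    match = _LEADING_BLOCKS.match(original)
    return (match.group(0) if match else "") + translated


restored = restore_ass_tags_sketch(r"{\an8}{\i1}Hello {\b1}world", "Bonjour le monde")
```

In this example the mid-text `{\b1}` is lost, which is the trade-off described above: after translation there is no reliable way to tell which word it should attach to.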
parse_ass
Parses an ASS/SSA file into subtitle entries.
Only Dialogue: lines in the [Events] section are treated as
translatable. All other content (sections, comments, styles) is
preserved verbatim in preserved_lines for later serialization.
| PARAMETER | DESCRIPTION |
|---|---|
| content | Raw ASS/SSA file content. |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[list[SubtitleEntry], list[str]] | Tuple of (entries, preserved_lines). Dialogue text positions in preserved_lines are replaced with __SUB_N__ placeholders, where N is the entry index. |
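The placeholder scheme can be sketched as follows. `Cue` and `parse_ass_sketch` are hypothetical names, and the sketch assumes the standard Events field order in which Text is the tenth field; a real parser would more likely read the Format: line.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:  # hypothetical stand-in for SubtitleEntry
    index: int
    start: str
    end: str
    text: str
    raw_text: str


def parse_ass_sketch(content: str) -> tuple[list[Cue], list[str]]:
    """Extract Dialogue text and leave a __SUB_N__ placeholder in its place."""
    entries: list[Cue] = []
    preserved: list[str] = []
    in_events = False
    for line in content.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            in_events = stripped.lower() == "[events]"
        if in_events and line.startswith("Dialogue:"):
            # Assumes the standard field order, where Text is the 10th field
            # and may itself contain commas.
            fields = line.split(",", 9)
            if len(fields) == 10:
                raw = fields[9]
                n = len(entries)
                text = re.sub(r"\{[^}]*\}", "", raw)
                entries.append(Cue(n, fields[1], fields[2], text, raw))
                fields[9] = f"__SUB_{n}__"
                line = ",".join(fields)
        preserved.append(line)
    return entries, preserved


sample = "\n".join([
    "[Script Info]",
    "Title: Demo",
    "[Events]",
    "Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text",
    r"Dialogue: 0,0:00:01.00,0:00:02.50,Default,,0,0,0,,{\i1}Hello, world",
])
entries, preserved = parse_ass_sketch(sample)
```

Limiting the split to nine commas keeps commas inside the dialogue text intact.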
serialize_ass
Reconstructs an ASS/SSA file by injecting translated text.
Replaces __SUB_N__ placeholders in preserved_lines with the
translated text for each entry, restoring any leading override tags
from the original.
| PARAMETER | DESCRIPTION |
|---|---|
| entries | Subtitle entries with translated text. |
| preserved_lines | Lines with placeholders, as produced when the ASS/SSA file was parsed. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Complete ASS/SSA file content. |
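Placeholder injection might be sketched like this. The names are hypothetical, and joining lines with "\n" is an assumption about the output format.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:  # hypothetical stand-in for SubtitleEntry
    index: int
    text: str       # translated, tag-free text
    raw_text: str   # original text, possibly with override tags


_PLACEHOLDER = re.compile(r"__SUB_(\d+)__")
_LEADING_BLOCKS = re.compile(r"^(?:\{[^}]*\})+")


def serialize_ass_sketch(entries: list[Cue], preserved_lines: list[str]) -> str:
    """Swap each __SUB_N__ for entry N's text, prefixed by its leading tags."""
    by_index = {cue.index: cue for cue in entries}

    def fill(match: re.Match) -> str:
        cue = by_index.get(int(match.group(1)))
        if cue is None:
            return match.group(0)  # unknown index: leave the placeholder alone
        lead = _LEADING_BLOCKS.match(cue.raw_text)
        return (lead.group(0) if lead else "") + cue.text

    return "\n".join(_PLACEHOLDER.sub(fill, line) for line in preserved_lines)


lines = ["[Events]", "Dialogue: 0,0:00:01.00,0:00:02.50,Default,,0,0,0,,__SUB_0__"]
cues = [Cue(0, "Bonjour, le monde", r"{\i1}Hello, world")]
rebuilt = serialize_ass_sketch(cues, lines)
```

Passing a function to `re.sub` matters here: a replacement string would have its backslashes interpreted, which would corrupt tags such as `\i1`.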
parse_subtitle
Dispatches to the format-specific subtitle parser.
| PARAMETER | DESCRIPTION |
|---|---|
| content | Raw file content. |
| suffix | Lowercase file extension (e.g. …). |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[list[SubtitleEntry], object] | Tuple of (entries, format_data) where format_data is whatever the format-specific serializer needs. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the extension is not a supported subtitle format. |
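Extension-based dispatch of this kind is often written as a lookup table. The sketch below uses stub parsers and hypothetical names throughout. It also assumes suffixes carry a leading dot, as `pathlib.Path.suffix` returns them, which this page does not confirm.

```python
from pathlib import Path
from typing import Callable


# Hypothetical stand-ins for the real format-specific parsers.
def _parse_srt(content: str) -> tuple[list, object]:
    return [], None


def _parse_vtt(content: str) -> tuple[list, object]:
    return [], "WEBVTT"


def _parse_ass(content: str) -> tuple[list, object]:
    return [], content.splitlines()


# Assumption: suffixes carry a leading dot, as pathlib.Path.suffix returns them.
_PARSERS: dict[str, Callable[[str], tuple[list, object]]] = {
    ".srt": _parse_srt,
    ".vtt": _parse_vtt,
    ".ass": _parse_ass,
    ".ssa": _parse_ass,
}


def parse_subtitle_sketch(content: str, suffix: str) -> tuple[list, object]:
    """Look up the parser for `suffix`; unknown extensions raise ValueError."""
    try:
        parser = _PARSERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported subtitle format: {suffix!r}") from None
    return parser(content)


suffix = Path("Episode01.SRT").suffix.lower()  # the caller lowercases the extension
entries, format_data = parse_subtitle_sketch("", suffix)
```

A table keeps the supported formats in one place, so a membership test such as `suffix in _PARSERS` could serve the same purpose as a format check.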
serialize_subtitle
Dispatches to the format-specific subtitle serializer.
| PARAMETER | DESCRIPTION |
|---|---|
| entries | Subtitle entries with translated text. |
| format_data | Format-specific data returned by the matching parser. |
| suffix | Lowercase file extension. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Complete file content. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the extension is not a supported subtitle format. |
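Taken together, the two dispatchers suggest a parse, translate, serialize round trip. The sketch below takes the dispatchers as parameters so it can run without the module; `translate_subtitles` and the toy `fake_parse` / `fake_serialize` helpers are hypothetical, and the argument order is inferred from the parameter tables above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Cue:  # hypothetical stand-in for SubtitleEntry
    text: str


def translate_subtitles(
    content: str,
    suffix: str,
    translate: Callable[[str], str],
    parse: Callable[[str, str], tuple[list, object]],
    serialize: Callable[[list, object, str], str],
) -> str:
    """Parse, translate each entry's text in place, then serialize."""
    entries, format_data = parse(content, suffix)
    for entry in entries:
        entry.text = translate(entry.text)
    # format_data goes back unchanged, with the same suffix used for parsing.
    return serialize(entries, format_data, suffix)


# Toy one-line-per-cue format, only to exercise the flow.
def fake_parse(content: str, suffix: str) -> tuple[list, object]:
    return [Cue(line) for line in content.splitlines()], None


def fake_serialize(entries: list, format_data: object, suffix: str) -> str:
    return "\n".join(entry.text for entry in entries)


result = translate_subtitles("hello\nworld", ".srt", str.upper, fake_parse, fake_serialize)
```

The point of the pattern is that the caller never inspects format_data: it is opaque state handed from the parser to the serializer of the same format.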