font_utils
Unified font family handling across all file types (image, office, PDF).
Implements a hybrid font selection strategy:

1. Determine the generic family (serif / sans-serif / monospace) from the source font name or PDF font flags.
2. Select a concrete font that supports the target language and belongs to the same generic family.
3. Fall back to the generic CSS family name when no concrete match is found.
This module does not depend on PySide6, so it works headlessly for CLI, MCP, and REST usage.
detect_script
Detects the dominant non-Latin script family from text.
Scans characters until a non-Latin script is identified. Returns
"latin" for ASCII / Latin-only text (including extended Latin
for Vietnamese, Turkish, etc.).
| PARAMETER | DESCRIPTION |
|---|---|
| text | The text to analyse. |

| RETURNS | DESCRIPTION |
|---|---|
| str | A script family identifier. |
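The scanning behaviour described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the function name `detect_script_sketch`, the lookup table, and every identifier other than `"latin"` are assumptions made for the example.

```python
import unicodedata

# Hypothetical mapping from the first word of a Unicode character name
# to a script identifier. Only "latin" is confirmed by the documentation;
# the other identifiers are invented for illustration.
_NAME_PREFIX_TO_SCRIPT = {
    "CJK": "cjk",
    "HIRAGANA": "japanese",
    "KATAKANA": "japanese",
    "HANGUL": "korean",
    "CYRILLIC": "cyrillic",
    "ARABIC": "arabic",
    "THAI": "thai",
}


def detect_script_sketch(text: str) -> str:
    """Return the first non-Latin script found in text, else "latin"."""
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace, combining marks
        # unicodedata.name() yields e.g. "HIRAGANA LETTER KO"; "" if unnamed.
        prefix = unicodedata.name(ch, "").split(" ", 1)[0]
        script = _NAME_PREFIX_TO_SCRIPT.get(prefix)
        if script is not None:
            return script  # stop at the first non-Latin hit
    # Extended Latin letters (Vietnamese, Turkish, ...) have names that
    # start with "LATIN", so they never match the table above.
    return "latin"
```

Under these assumptions, `detect_script_sketch("Tiếng Việt")` yields `"latin"`, while mixed text such as `"Hello 日本語"` is classified by its first non-Latin letter.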
Source code in src/utils/font_utils.py
classify_generic_family
Determines the generic CSS family from a source font.
Uses two inputs (either or both may be provided):
- font_name: The font's family name (e.g. "Times New Roman").
- font_flags: PyMuPDF font flags (bit 3 = mono, bit 2 = serif).
When both are provided, font_name takes precedence since it's
more specific than PyMuPDF's coarse 2-bit classification.
| PARAMETER | DESCRIPTION |
|---|---|
| font_name | The source font family name. |
| font_flags | PyMuPDF span font flags. |

| RETURNS | DESCRIPTION |
|---|---|
| str | One of the generic CSS family names. |
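The precedence rule (name first, flags second) might look like the sketch below. The bit positions come from the description above; the keyword lists, the returned strings `"serif"`, `"sans-serif"`, and `"monospace"`, the sans-serif default, and the name `classify_sketch` are assumptions for illustration only.

```python
from typing import Optional

# PyMuPDF span flag bits as documented above: bit 2 = serif, bit 3 = mono.
FLAG_SERIF = 1 << 2
FLAG_MONO = 1 << 3

# Hypothetical keyword hints; the real module's heuristics may differ.
_MONO_HINTS = ("mono", "courier", "consolas")
_SANS_HINTS = ("sans", "arial", "helvetica", "verdana")
_SERIF_HINTS = ("times", "georgia", "garamond", "serif")


def classify_sketch(font_name: Optional[str] = None,
                    font_flags: Optional[int] = None) -> str:
    """Classify a source font as serif, sans-serif, or monospace."""
    if font_name:
        lowered = font_name.lower()
        if any(hint in lowered for hint in _MONO_HINTS):
            return "monospace"
        # Test sans before serif because "sans-serif" contains "serif".
        if any(hint in lowered for hint in _SANS_HINTS):
            return "sans-serif"
        if any(hint in lowered for hint in _SERIF_HINTS):
            return "serif"
    # The name gave no answer (or was absent): consult the coarse flags.
    if font_flags is not None:
        if font_flags & FLAG_MONO:
            return "monospace"
        if font_flags & FLAG_SERIF:
            return "serif"
    return "sans-serif"  # assumed default when nothing is known
```

In this sketch, `classify_sketch("Arial", FLAG_SERIF)` returns `"sans-serif"` because the name is consulted before the flags, whereas an unrecognised name with the serif bit set falls through to `"serif"`.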
_resolve_font_key
Resolve the target language to a _FONT_DB key.
Tries exact match, then _LANG_TO_SCRIPT mapping, then substring
match against _FONT_DB keys, and finally "default".
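The four-stage lookup order can be illustrated with a small sketch. The table contents below are invented stand-ins for the private `_FONT_DB` and `_LANG_TO_SCRIPT` tables, and two details are assumptions: that names are lowercased before lookup, and that the substring test checks whether a key occurs inside the language name.

```python
# Hypothetical stand-ins for the module's private tables.
_FONT_DB = {
    "japanese": ["Noto Sans JP"],
    "chinese": ["Noto Sans SC"],
    "default": ["Noto Sans"],
}
_LANG_TO_SCRIPT = {"mandarin": "chinese", "cantonese": "chinese"}


def resolve_font_key_sketch(target_lang: str) -> str:
    """Map a target language name to a key of _FONT_DB."""
    lang = target_lang.strip().lower()  # normalisation is an assumption
    # 1. Exact match against the database keys.
    if lang in _FONT_DB:
        return lang
    # 2. Language-to-script mapping.
    script = _LANG_TO_SCRIPT.get(lang)
    if script in _FONT_DB:
        return script
    # 3. Substring match, e.g. "chinese (simplified)" contains "chinese".
    for key in _FONT_DB:
        if key != "default" and key in lang:
            return key
    # 4. Nothing matched.
    return "default"
```

With these sample tables, `"Mandarin"` resolves through the mapping stage, `"Chinese (Simplified)"` through the substring stage, and an unknown language falls back to `"default"`.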
get_font_for_language
Selects the best concrete font for a target language and generic family.
Returns the first candidate from _FONT_DB for the resolved
language/script key. Falls back to the generic CSS family name
when no candidates exist.
| PARAMETER | DESCRIPTION |
|---|---|
| target_lang | Target language name (e.g. "Japanese", "Vietnamese"). |
| generic_family | One of the generic CSS family names. |

| RETURNS | DESCRIPTION |
|---|---|
| str | A concrete font family name or a generic CSS family name. |
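The "first candidate, else generic family" behaviour can be sketched as below. The layout of `_FONT_DB` (script key, then generic family, then an ordered candidate list) and the font names in it are assumptions for illustration; the simplified key lookup stands in for `_resolve_font_key`.

```python
# Hypothetical database layout: script key -> generic family -> candidates.
_FONT_DB = {
    "japanese": {
        "serif": ["Noto Serif JP"],
        "sans-serif": ["Noto Sans JP"],
        "monospace": [],  # deliberately empty to show the fallback
    },
    "default": {
        "serif": ["Noto Serif"],
        "sans-serif": ["Noto Sans"],
        "monospace": ["Noto Sans Mono"],
    },
}


def get_font_sketch(target_lang: str, generic_family: str) -> str:
    """Pick a concrete font, or fall back to the generic CSS family name."""
    key = target_lang.strip().lower()
    if key not in _FONT_DB:
        key = "default"  # simplified stand-in for _resolve_font_key
    candidates = _FONT_DB[key].get(generic_family, [])
    # First candidate wins; with no candidates, return the generic name
    # so the renderer can still choose a font of the right family.
    return candidates[0] if candidates else generic_family
```

Returning the generic name rather than raising keeps the caller's output usable: a value such as `"monospace"` is still meaningful wherever a CSS font family is accepted.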