font_utils
Unified font family handling across all file types (image, office, PDF).
Implements a hybrid font selection strategy:
1. Determine the generic family (serif / sans-serif / monospace) from the source font name or PDF font flags.
2. Select a concrete font that supports the target language and belongs to the same generic family.
3. Fall back to the generic CSS family name when no concrete match is found.
This module does not depend on PySide6, so it works headlessly for CLI / MCP / REST usage.
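The three-step strategy can be sketched end to end as follows. This is a minimal illustration, not the module's implementation: `CANDIDATES`, `generic_family_of`, and `pick_font` are hypothetical names, and the keyword checks and font entries are assumptions chosen only to make the example runnable.

```python
# Hypothetical candidate table keyed by (language, generic family).
# The real module's font table is not shown on this page.
CANDIDATES = {
    ("japanese", "serif"): ["Noto Serif JP"],
    ("japanese", "sans-serif"): ["Noto Sans JP"],
}


def generic_family_of(font_name: str) -> str:
    """Step 1 (simplified): guess the generic family from the font name."""
    name = font_name.lower()
    if "mono" in name or "courier" in name:
        return "monospace"
    if "times" in name or ("serif" in name and "sans" not in name):
        return "serif"
    return "sans-serif"  # assumed default for unrecognised names


def pick_font(source_font: str, target_lang: str) -> str:
    """Steps 2 and 3: choose a concrete font, else the generic family."""
    family = generic_family_of(source_font)
    candidates = CANDIDATES.get((target_lang.lower(), family), [])
    return candidates[0] if candidates else family


print(pick_font("Times New Roman", "Japanese"))
print(pick_font("Courier New", "Japanese"))
```

The second call has no monospace candidate in the illustrative table, so it returns the generic name `monospace`, which a CSS-aware renderer can still resolve.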
detect_script
Detects the dominant non-Latin script family from text.
Scans characters until a non-Latin script is identified. Returns
"latin" for ASCII / Latin-only text (including extended Latin
for Vietnamese, Turkish, etc.).
| Parameter | Description |
|---|---|
| text | The text to analyse. |
| Returns | Description |
|---|---|
| str | A script family identifier (e.g. …). |
Source code in src/utils/font_utils.py
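One plausible way to implement the scan described for detect_script is a lookup against Unicode code point ranges that returns at the first non-Latin hit. The sketch below is an assumption about the approach, not the module's code: `detect_script_sketch`, the range table, and every label other than "latin" are hypothetical.

```python
# Assumed code point ranges and labels; the real module may differ.
_SCRIPT_RANGES = [
    (0x3040, 0x30FF, "japanese"),  # Hiragana and Katakana
    (0xAC00, 0xD7A3, "korean"),    # Hangul syllables
    (0x4E00, 0x9FFF, "cjk"),       # CJK Unified Ideographs
    (0x0400, 0x04FF, "cyrillic"),
    (0x0600, 0x06FF, "arabic"),
    (0x0E00, 0x0E7F, "thai"),
]


def detect_script_sketch(text: str) -> str:
    """Return the first non-Latin script found, or "latin" if none."""
    for ch in text:
        code_point = ord(ch)
        for low, high, label in _SCRIPT_RANGES:
            if low <= code_point <= high:
                return label  # stop scanning at the first match
    # Extended Latin (e.g. Vietnamese, Turkish) falls through to here.
    return "latin"


print(detect_script_sketch("Tiếng Việt"))
print(detect_script_sketch("Привет"))
```

Because accented Latin letters are outside every listed range, Vietnamese and Turkish text is reported as "latin", which matches the behaviour described above.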
classify_generic_family
Determines the generic CSS family from a source font.
Uses two inputs (either or both may be provided):
- font_name: The font's family name (e.g. "Times New Roman").
- font_flags: PyMuPDF font flags (bit 3 = mono, bit 2 = serif).
When both are provided, font_name takes precedence since it's
more specific than PyMuPDF's coarse 2-bit classification.
| Parameter | Description |
|---|---|
| font_name | The source font family name. |
| font_flags | PyMuPDF span font flags. |
| Returns | Description |
|---|---|
| str | One of … |
Source code in src/utils/font_utils.py
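The precedence rule (name first, flags second) might look like the sketch below. The bit positions come from the description above; the keyword lists, the `classify_sketch` name, the optional-argument defaults, and the "sans-serif" fallback are assumptions made for illustration.

```python
from typing import Optional

FLAG_SERIF = 1 << 2  # bit 2 of the PyMuPDF span flags
FLAG_MONO = 1 << 3   # bit 3 of the PyMuPDF span flags

# Hypothetical keyword hints; the real module's lists are not shown here.
_MONO_HINTS = ("mono", "courier", "consolas")
_SANS_HINTS = ("sans", "arial", "helvetica", "gothic")
_SERIF_HINTS = ("serif", "times", "georgia", "garamond", "mincho")


def classify_sketch(font_name: Optional[str] = None,
                    font_flags: Optional[int] = None) -> str:
    """Classify by name when it is recognised, otherwise by flags."""
    if font_name:
        name = font_name.lower()
        if any(hint in name for hint in _MONO_HINTS):
            return "monospace"
        # Check sans before serif so "sans-serif" names are not misread.
        if any(hint in name for hint in _SANS_HINTS):
            return "sans-serif"
        if any(hint in name for hint in _SERIF_HINTS):
            return "serif"
    if font_flags is not None:
        if font_flags & FLAG_MONO:
            return "monospace"
        if font_flags & FLAG_SERIF:
            return "serif"
    return "sans-serif"  # assumed default when nothing matches


print(classify_sketch("Arial", font_flags=FLAG_SERIF))
print(classify_sketch(font_flags=FLAG_MONO))
```

In the first call the name wins over the serif flag, so the result is "sans-serif"; the flags are consulted only when the name is missing or unrecognised.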
_resolve_font_key
Resolve the target language to a _FONT_DB key.
Tries exact match, then _LANG_TO_SCRIPT mapping, then substring
match against _FONT_DB keys, and finally "default".
Source code in src/utils/font_utils.py
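The four-stage lookup order can be illustrated as below. `FONT_DB` and `LANG_TO_SCRIPT` are hypothetical stand-ins for the private `_FONT_DB` and `_LANG_TO_SCRIPT` tables, whose contents are not shown on this page; lowercasing the input and the direction of the substring test (key contained in language name) are also assumptions.

```python
# Hypothetical stand-ins for the module's private tables.
FONT_DB = {
    "japanese": ["Noto Sans JP"],
    "chinese": ["Noto Sans SC"],
    "cyrillic": ["Noto Sans"],
    "default": ["Noto Sans"],
}
LANG_TO_SCRIPT = {"russian": "cyrillic", "ukrainian": "cyrillic"}


def resolve_font_key_sketch(target_lang: str) -> str:
    """Resolve a language name to a FONT_DB key, ending at "default"."""
    lang = target_lang.strip().lower()
    # 1. Exact match.
    if lang in FONT_DB:
        return lang
    # 2. Language-to-script mapping.
    script = LANG_TO_SCRIPT.get(lang)
    if script in FONT_DB:
        return script
    # 3. Substring match against the table keys.
    for key in FONT_DB:
        if key != "default" and key in lang:
            return key
    # 4. Final fallback.
    return "default"


print(resolve_font_key_sketch("Russian"))
print(resolve_font_key_sketch("Simplified Chinese"))
```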
get_font_for_language
Selects the best concrete font for a target language and generic family.
Returns the first candidate from _FONT_DB for the resolved
language/script key. Falls back to the generic CSS family name
when no candidates exist.
| Parameter | Description |
|---|---|
| target_lang | Target language name (e.g. "Japanese", "Vietnamese"). |
| generic_family | One of … |
| Returns | Description |
|---|---|
| str | A concrete font family name or a generic CSS family name. |
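As a rough sketch of the "first candidate, else generic family" behaviour, assume the font table maps a language/script key to per-family candidate lists. That layout, the `get_font_sketch` name, the simplified key resolution, and the font entries are all assumptions for illustration; the real `_FONT_DB` structure is not documented here.

```python
# Hypothetical table: key -> generic family -> ordered candidates.
FONT_DB = {
    "japanese": {
        "serif": ["Noto Serif JP", "Yu Mincho"],
        "sans-serif": ["Noto Sans JP"],
    },
    "default": {
        "serif": ["Noto Serif"],
        "sans-serif": ["Noto Sans"],
        "monospace": [],
    },
}


def get_font_sketch(target_lang: str, generic_family: str = "sans-serif") -> str:
    """Return the first candidate, or the generic family if there is none."""
    lang = target_lang.lower()
    key = lang if lang in FONT_DB else "default"  # simplified key resolution
    candidates = FONT_DB[key].get(generic_family, [])
    return candidates[0] if candidates else generic_family


print(get_font_sketch("Japanese", "serif"))
print(get_font_sketch("Japanese", "monospace"))
```

Returning the generic family name as the last resort means callers always receive a string that a CSS-aware renderer can resolve, even for languages with no table entry.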