office_processor
Office document processing for DOCX, XLSX, PPTX, ODT, ODS, ODP and legacy formats.
Uses a 3-tier backend system:
1. win32com (Windows + MS Office)
2. LibreOffice UNO API (cross-platform)
3. python-docx / openpyxl / python-pptx / odfpy (modern + ODF formats)
Legacy formats (.doc, .xls, .ppt) require backend 1 or 2.
_detect_backend
Detects the best available backend for the given file extension.
Priority order depends on format family:
- OOXML (.docx/.xlsx/.pptx): python_lib immediately (lightweight).
- ODF (.odt/.ods/.odp): UNO → win32com → python_lib (odfpy).
- Legacy (.doc/.xls/.ppt): win32com → UNO → error.

| Parameter | Description |
|---|---|
| `suffix` | Lowercase file extension (e.g. ".docx", ".doc"). |
| `libreoffice_path` | User-configured LibreOffice path; forwarded to |

| Returns | Description |
|---|---|
| `str` | One of the backend identifiers. |

| Raises | Description |
|---|---|
| `ValueError` | If no backend is available for the format. |

Source code in src/core/office_processor.py
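
The priority order above can be sketched as a small lookup. This is a minimal illustration, not the module's code: the function name, the `has_win32com` / `has_uno` flags, and the `"uno"` / `"win32com"` identifier strings are assumptions (only `python_lib` appears in the documentation), and odfpy is assumed to be installed.

```python
OOXML = {".docx", ".xlsx", ".pptx"}
ODF = {".odt", ".ods", ".odp"}
LEGACY = {".doc", ".xls", ".ppt"}


def detect_backend_sketch(suffix: str, *, has_win32com: bool, has_uno: bool) -> str:
    """Pick a backend name following the documented priority order."""
    if suffix in OOXML:
        return "python_lib"  # lightweight path, no office suite needed
    if suffix in ODF:
        # Assumption: odfpy is always importable in this sketch.
        candidates = [("uno", has_uno), ("win32com", has_win32com), ("python_lib", True)]
    elif suffix in LEGACY:
        candidates = [("win32com", has_win32com), ("uno", has_uno)]
    else:
        raise ValueError(f"Unsupported extension: {suffix}")
    for name, available in candidates:
        if available:
            return name
    raise ValueError(f"No backend available for {suffix}")
```

With neither office suite available, a legacy suffix such as ".doc" raises `ValueError`, which matches the Raises entry above.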
_substitute_font
Determines the font name to use after translation.
When the original and translated texts share the same script family, the original font name is returned unchanged. When scripts differ (e.g. Latin → CJK), a compatible font from the same generic family (serif / sans-serif / monospace) is selected for the target language.

| Parameter | Description |
|---|---|
| `original_font` | The source document's font name. |
| `original_text` | Text before translation. |
| `translated_text` | Text after translation. |
| `target_lang` | Target language name (used for font selection). |

| Returns | Description |
|---|---|
| `str \| None` | The font name to apply, or [...] application pick a default). |

Source code in src/core/office_processor.py
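
A rough sketch of the substitution rule follows. Everything here is illustrative: the script classifier, the `GENERIC_FAMILY` and `FALLBACK_FONTS` tables, and the function names are hypothetical and are not taken from the module.

```python
import unicodedata

# Hypothetical lookup tables, for illustration only.
GENERIC_FAMILY = {"Times New Roman": "serif", "Arial": "sans-serif", "Courier New": "monospace"}
FALLBACK_FONTS = {
    ("Japanese", "serif"): "MS Mincho",
    ("Japanese", "sans-serif"): "Meiryo",
    ("Japanese", "monospace"): "MS Gothic",
}


def script_family(text: str) -> str:
    """Return 'cjk' if any character has a CJK-style Unicode name, else 'latin'."""
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith(("CJK", "HIRAGANA", "KATAKANA", "HANGUL")):
            return "cjk"
    return "latin"


def substitute_font_sketch(original_font, original_text, translated_text, target_lang):
    """Keep the font when scripts match; otherwise look up a same-family fallback."""
    if script_family(original_text) == script_family(translated_text):
        return original_font
    family = GENERIC_FAMILY.get(original_font, "sans-serif")
    return FALLBACK_FONTS.get((target_lang, family))  # None when no fallback is known
```

In this sketch an unknown target language yields `None`, leaving the choice of font to the caller.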
_save_win32com_font
Saves font properties from a win32com Font object.
Reads each property in WIN32COM_FONT_PROPERTIES and stores non-undefined values. Properties that raise (e.g. on merged cells) are silently skipped.

| Parameter | Description |
|---|---|
| `font_obj` | A win32com Range.Font COM object. |

| Returns | Description |
|---|---|
| `dict` | Mapping of property name to saved value. |

Source code in src/core/office_processor.py
_restore_win32com_font
Restores previously saved font properties to a win32com Font object.
Sets each property independently so a single failure does not prevent other properties from being restored.
When target_lang is provided and "Name" is present in saved,
the font name is substituted via _substitute_font when the
source and target scripts differ.

| Parameter | Description |
|---|---|
| `font_obj` | A win32com Range.Font COM object. |
| `saved` | Mapping of property name to value (from _save_win32com_font). |
| `original_text` | The text before translation (for script detection). |
| `translated_text` | The text after translation (for script detection). |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
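
The save/restore pair can be illustrated without COM by using a stand-in object. In this sketch the property subset, the `FakeFont` class, and the helper names are hypothetical; the sentinel value is assumed to mirror Word's `wdUndefined` (9999999).

```python
WIN32COM_UNDEFINED = 9999999  # assumption: mirrors Word's wdUndefined sentinel
FONT_PROPERTIES = ("Name", "Size", "Bold", "Italic")  # hypothetical subset


def save_font(font_obj) -> dict:
    """Collect readable, defined properties; skip ones that raise."""
    saved = {}
    for prop in FONT_PROPERTIES:
        try:
            value = getattr(font_obj, prop)
        except Exception:
            continue  # e.g. a COM error on a merged cell
        if value != WIN32COM_UNDEFINED:
            saved[prop] = value
    return saved


def restore_font(font_obj, saved: dict) -> None:
    """Set each property on its own so one failure does not stop the rest."""
    for prop, value in saved.items():
        try:
            setattr(font_obj, prop, value)
        except Exception:
            continue


class FakeFont:
    """Stand-in for a COM Font object whose Size cannot be read or written."""

    def __init__(self, name="Arial"):
        self.Name = name
        self.Bold = WIN32COM_UNDEFINED  # simulates a mixed bold state
        self.Italic = True

    @property
    def Size(self):
        raise RuntimeError("simulated COM failure")


source = FakeFont()
saved = save_font(source)                    # Size raises, Bold is undefined
target = FakeFont(name="Calibri")
restore_font(target, {"Size": 12, **saved})  # the Size failure is ignored
```

The failing `Size` write does not prevent `Name` and `Italic` from being restored on `target`.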
_read_win32com_char_formatting
Reads inline formatting from a single win32com Word character range.

| Parameter | Description |
|---|---|
| `char_range` | A win32com |

| Returns | Description |
|---|---|
| `tuple` | Tuple of (bold, italic, underline, strike, superscript, subscript, font_size_pt, color_hex, bg_color_hex). Properties equal to |

Source code in src/core/office_processor.py
_has_win32com_range_mixed_formatting
Checks whether a win32com Range has mixed per-character formatting.
Uses a quick-exit via rng.Font.Bold == WIN32COM_UNDEFINED before
falling back to full character-level iteration. Returns False on
any COM exception (conservative: assume uniform formatting).

| Parameter | Description |
|---|---|
| `rng` | A win32com |

| Returns | Description |
|---|---|
| `bool` | True if at least two characters have different formatting. |

Source code in src/core/office_processor.py
_has_win32com_range_hyperlinks
Checks whether a win32com Range contains hyperlinks.

| Parameter | Description |
|---|---|
| `rng` | A win32com |

| Returns | Description |
|---|---|
| `bool` | True if the range has at least one hyperlink. |

Source code in src/core/office_processor.py
_win32com_range_runs_to_html
Converts a win32com Range's characters to inline HTML.
Groups consecutive characters with identical formatting and hyperlink
URL into runs, skipping paragraph marks (\r), then emits HTML via
_wrap_with_tags. Characters inside a hyperlink are tagged with
<a href="..."> so the LLM can preserve links during translation.

| Parameter | Description |
|---|---|
| `rng` | A win32com |

| Returns | Description |
|---|---|
| `str` | HTML string representing the range's formatted text. |

Source code in src/core/office_processor.py
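
The run-grouping idea can be sketched with `itertools.groupby` over plain tuples instead of COM character ranges. The input shape `(character, bold, italic, url)` and the inline tag wrapping are assumptions made for this example; the real function delegates wrapping to _wrap_with_tags, whose behaviour is not shown here.

```python
from html import escape
from itertools import groupby


def chars_to_html(chars):
    """chars: iterable of (character, bold, italic, url_or_None) tuples."""
    visible = [c for c in chars if c[0] != "\r"]  # skip paragraph marks
    parts = []
    # Consecutive characters sharing (bold, italic, url) form one run.
    for (bold, italic, url), group in groupby(visible, key=lambda c: c[1:]):
        text = escape("".join(c[0] for c in group))
        if bold:
            text = f"<b>{text}</b>"
        if italic:
            text = f"<i>{text}</i>"
        if url:
            text = f'<a href="{escape(url)}">{text}</a>'
        parts.append(text)
    return "".join(parts)


sample = [
    ("H", True, False, None),
    ("i", True, False, None),
    (" ", False, False, None),
    ("x", False, False, "https://example.com"),
    ("\r", False, False, None),
]
html_text = chars_to_html(sample)
```

Grouping before emitting tags keeps the output compact: two bold characters produce one `<b>` element rather than two.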
_has_win32com_word_mixed_formatting
Checks whether a win32com Word paragraph has mixed per-char formatting.
Delegates to _has_win32com_range_mixed_formatting on the
paragraph's Range.

| Parameter | Description |
|---|---|
| `para` | A win32com |

| Returns | Description |
|---|---|
| `bool` | True if at least two characters have different formatting. |

Source code in src/core/office_processor.py
_has_win32com_word_hyperlinks
Checks whether a win32com Word paragraph contains hyperlinks.
Delegates to _has_win32com_range_hyperlinks on the paragraph's
Range.

| Parameter | Description |
|---|---|
| `para` | A win32com |

| Returns | Description |
|---|---|
| `bool` | True if the paragraph has at least one hyperlink. |

Source code in src/core/office_processor.py
_win32com_word_runs_to_html
Converts a win32com Word paragraph's characters to inline HTML.
Delegates to _win32com_range_runs_to_html on the paragraph's
Range.

| Parameter | Description |
|---|---|
| `para` | A win32com |

| Returns | Description |
|---|---|
| `str` | HTML string representing the paragraph's formatted text. |

Source code in src/core/office_processor.py
_extract_win32com_word
Extracts text from a Word document via win32com.
For paragraphs with mixed per-run formatting, inline HTML is emitted
via _win32com_word_runs_to_html so the LLM can preserve it.

| Parameter | Description |
|---|---|
| `file_path` | Path to the .doc or .docx file. |

| Returns | Description |
|---|---|
| `list` | (location_key, text) pairs. |

Source code in src/core/office_processor.py
_inject_win32com_word_html_runs
_inject_win32com_word_html_runs(
doc, rng, html_text, original_text="", *, is_cell=False, target_lang=""
)
Replaces a win32com Word range's text with HTML-formatted segments.
Parses html_text via _parse_html_formatting, sets the full
plain text on the range, then applies per-segment formatting by
creating sub-ranges via doc.Range(start, end).
The original font Name is preserved on the whole range (unless source and target script families differ).

| Parameter | Description |
|---|---|
| `doc` | The win32com Word |
| `rng` | The target |
| `html_text` | Translated text with inline |
| `original_text` | The text before translation (for script detection). |
| `is_cell` | True when injecting into a table cell (no trailing |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
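
To make the "plain text plus per-segment sub-ranges" step concrete, here is a small sketch that turns inline HTML into plain text and offset segments using the standard library. It is not _parse_html_formatting; the `SegmentParser` class and the `(start, end, tags)` segment shape are assumptions for illustration.

```python
from html.parser import HTMLParser


class SegmentParser(HTMLParser):
    """Collects plain text and (start, end, active_tags) segments."""

    def __init__(self):
        super().__init__()
        self.active = []    # tags currently open
        self.segments = []  # (start, end, frozenset of tag names)
        self.plain = ""

    def handle_starttag(self, tag, attrs):
        self.active.append(tag)

    def handle_endtag(self, tag):
        if tag in self.active:
            self.active.remove(tag)

    def handle_data(self, data):
        start = len(self.plain)
        self.plain += data
        self.segments.append((start, len(self.plain), frozenset(self.active)))


def html_to_segments(html_text: str):
    parser = SegmentParser()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return parser.plain, parser.segments


plain, segments = html_to_segments("Hello <b>world</b>!")
```

A caller could then set `plain` on the target range and, for each segment, format the sub-range at `range_start + start` to `range_start + end`, where `range_start` is a hypothetical name for the range's starting offset in the document.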
_inject_win32com_word
Injects translations into a Word document via win32com.
For translations containing inline HTML formatting tags, uses
_inject_win32com_word_html_runs to preserve per-run formatting.
Otherwise falls back to uniform font save/restore.

| Parameter | Description |
|---|---|
| `file_path` | Source file path. |
| `output_path` | Output file path. |
| `translations` | Mapping of location_key to translated text. |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_extract_win32com_excel
Extracts text from an Excel workbook via win32com.

| Parameter | Description |
|---|---|
| `file_path` | Path to the .xls or .xlsx file. |

| Returns | Description |
|---|---|
| `list` | (location_key, text) pairs. |

Source code in src/core/office_processor.py
_inject_win32com_excel
Injects translations into an Excel workbook via win32com.

| Parameter | Description |
|---|---|
| `file_path` | Source file path. |
| `output_path` | Output file path. |
| `translations` | Mapping of location_key to translated text. |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_read_win32com_ppt_run_formatting
Reads inline formatting from a win32com PPT run TextRange.
PPT Font.Color is a ColorFormat object; the BGR integer
is accessed via .RGB. PPT Font.Strikethrough is lowercase 't'.
Superscript/subscript is detected via Font.BaselineOffset:
positive values indicate superscript, negative values indicate subscript.
Background colour is read via Font.Highlight.ForeColor.RGB
(Office 365 / 2019+). Older versions silently return None.

| Parameter | Description |
|---|---|
| `run_range` | A win32com PPT |

| Returns | Description |
|---|---|
| `tuple` | Tuple of (bold, italic, underline, strike, superscript, subscript, font_size_pt, color_hex, bg_color_hex). |

Source code in src/core/office_processor.py
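
Two of the conversions described above are easy to get wrong, so a short sketch may help. The helper names are hypothetical, and the `#rrggbb` output format is an assumption, since the exact hex format used by the module is not shown.

```python
def bgr_int_to_hex(bgr: int) -> str:
    """Convert an Office-style BGR integer (red in the low byte) to '#rrggbb'."""
    red = bgr & 0xFF
    green = (bgr >> 8) & 0xFF
    blue = (bgr >> 16) & 0xFF
    return f"#{red:02x}{green:02x}{blue:02x}"


def baseline_flags(offset: float) -> tuple[bool, bool]:
    """Map a baseline offset to (superscript, subscript)."""
    return offset > 0, offset < 0
```

Pure red is stored as `0x0000FF` in this byte order, so reading the integer as ordinary RGB would swap the red and blue channels.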
_has_win32com_ppt_mixed_formatting
Checks whether a win32com PPT paragraph has mixed per-run formatting.
Iterates para_range.Runs(i) (1-based) and compares formatting tuples.

| Parameter | Description |
|---|---|
| `para_range` | A win32com PPT |

| Returns | Description |
|---|---|
| `bool` | True if at least two runs have different formatting. |

Source code in src/core/office_processor.py
_has_win32com_ppt_hyperlinks
Checks whether a win32com PPT paragraph has hyperlinked runs.
Iterates para_range.Runs(i) and checks each run's
ActionSettings(ppMouseClick).Hyperlink.Address.

| Parameter | Description |
|---|---|
| `para_range` | A win32com PPT |

| Returns | Description |
|---|---|
| `bool` | True if at least one run has a non-empty hyperlink address. |

Source code in src/core/office_processor.py
_win32com_ppt_runs_to_html
Converts a win32com PPT paragraph's runs to inline HTML.
Two-pass: first collects run data, then emits HTML with <span>
only when size/colour actually vary.

| Parameter | Description |
|---|---|
| `para_range` | A win32com PPT |

| Returns | Description |
|---|---|
| `str` | HTML string representing the paragraph's formatted text. |

Source code in src/core/office_processor.py
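
The two-pass approach can be sketched on plain tuples. The run shape `(text, bold, size_pt)` and the `style="font-size:…pt"` attribute are assumptions for this example; the markup actually emitted by the module is not shown in this reference.

```python
from html import escape


def runs_to_html(runs):
    """runs: iterable of (text, bold, size_pt) tuples."""
    runs = list(runs)
    # Pass 1: decide whether the font size varies across runs at all.
    size_varies = len({size for _, _, size in runs}) > 1
    # Pass 2: emit HTML, adding <span> only when it carries information.
    parts = []
    for text, bold, size in runs:
        html = escape(text)
        if bold:
            html = f"<b>{html}</b>"
        if size_varies:
            html = f'<span style="font-size:{size}pt">{html}</span>'
        parts.append(html)
    return "".join(parts)
```

Skipping the `<span>` when every run has the same size keeps the text sent for translation free of tags that carry no information.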
_extract_win32com_ppt
Extracts text from a PowerPoint presentation via win32com.
For paragraphs with mixed per-run formatting or hyperlinks, inline
HTML is emitted via _win32com_ppt_runs_to_html so the LLM can
preserve them.

| Parameter | Description |
|---|---|
| `file_path` | Path to the .ppt or .pptx file. |

| Returns | Description |
|---|---|
| `list` | (location_key, text) pairs. |

Source code in src/core/office_processor.py
_inject_win32com_ppt_html_runs
Replaces a win32com PPT paragraph's text with HTML-formatted segments.
Parses html_text via _parse_html_formatting, sets the full
plain text on the paragraph, then applies per-segment formatting
using para_rng.Characters(offset + 1, length) (1-based).
The original font Name is preserved on the whole paragraph (unless source and target script families differ).

| Parameter | Description |
|---|---|
| `tf` | The win32com PPT |
| `p_idx` | 1-based paragraph index within the text frame. |
| `html_text` | Translated text with inline |
| `original_text` | The text before translation (for script detection). |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_inject_win32com_ppt
Injects translations into a PowerPoint presentation via win32com.
For translations containing inline HTML formatting tags, uses
_inject_win32com_ppt_html_runs to preserve per-run formatting.
Otherwise falls back to uniform font save/restore.

| Parameter | Description |
|---|---|
| `file_path` | Source file path. |
| `output_path` | Output file path. |
| `translations` | Mapping of location_key to translated text. |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_extract_win32com_word_comments
Extracts comments from a Word document via win32com.
Only top-level comments (where Ancestor is None) are extracted.

| Parameter | Description |
|---|---|
| `file_path` | Path to the .doc file. |

| Returns | Description |
|---|---|
| `list` | (location_key, text) pairs with keys like 'comment:{index}'. |

Source code in src/core/office_processor.py
_inject_win32com_word_comments
Injects translated comments into a Word document via win32com.

| Parameter | Description |
|---|---|
| `output_path` | Path to the .doc file to modify in place. |
| `translations` | Mapping of 'comment:{index}' to translated text. |

Source code in src/core/office_processor.py
_extract_win32com_excel_comments
Extracts cell comments from an Excel workbook via win32com.

| Parameter | Description |
|---|---|
| `file_path` | Path to the .xls file. |

| Returns | Description |
|---|---|
| `list` | (location_key, text) pairs with keys like 'comment:{sheet}:{row}:{col}'. |

Source code in src/core/office_processor.py
_inject_win32com_excel_comments
Injects translated comments into an Excel workbook via win32com.

| Parameter | Description |
|---|---|
| `output_path` | Path to the .xls file to modify in place. |
| `translations` | Mapping of 'comment:{sheet}:{row}:{col}' to translated text. |

Source code in src/core/office_processor.py
_extract_win32com_ppt_comments
Extracts comments from a PowerPoint presentation via win32com.

| Parameter | Description |
|---|---|
| `file_path` | Path to the .ppt file. |

| Returns | Description |
|---|---|
| `list` | (location_key, text) pairs with keys like 'comment:{slide_idx}:{comment_idx}'. |

Source code in src/core/office_processor.py
_inject_win32com_ppt_comments
Injects translated comments into a PowerPoint presentation via win32com.
Comment.Text in PowerPoint COM may be read-only. In that case, the function falls back to deleting the comment and re-adding it with the same author and metadata.

| Parameter | Description |
|---|---|
| `output_path` | Path to the .ppt file to modify in place. |
| `translations` | Mapping of 'comment:{slide_idx}:{comment_idx}' to translated text. |

Source code in src/core/office_processor.py
_uno_file_url
Converts a file path to a file:/// URL for UNO.

| Parameter | Description |
|---|---|
| `path` | File path to convert. |

| Returns | Description |
|---|---|
| `str` | The file URL. |
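
For readers without a LibreOffice installation, the same kind of URL can be produced with the standard library. This is a sketch of the idea only; the function name is hypothetical and the module may build the URL differently.

```python
from pathlib import Path


def file_url_sketch(path: str) -> str:
    """Return an absolute file:/// URL for a local path."""
    # resolve() makes the path absolute; as_uri() percent-encodes it.
    return Path(path).resolve().as_uri()


url = file_url_sketch("my report.docx")
```

Spaces and other reserved characters are percent-encoded, so "my report.docx" ends up as "my%20report.docx" in the URL.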
_uno_open
Opens a document via LibreOffice UNO in hidden mode.

| Parameter | Description |
|---|---|
| `file_path` | Path to the document. |

| Returns | Description |
|---|---|
| `object` | The UNO document object. Caller MUST call [...] in a [...] |

Source code in src/core/office_processor.py
_uno_save
Saves a UNO document preserving its original format.
Reads the FilterName from the document's own MediaDescriptor
(set during import) and passes it to storeToURL so UNO writes in
the same format as the source file rather than defaulting to ODF.
Falls back to a hardcoded lookup if the descriptor is unavailable.

| Parameter | Description |
|---|---|
| `doc` | The UNO document object. |
| `output_path` | Destination file path. |

Source code in src/core/office_processor.py
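
The "descriptor first, lookup second" rule can be sketched with a plain dict standing in for the MediaDescriptor. The `FALLBACK_FILTERS` table below is an assumption about what such a lookup might contain (the entries are LibreOffice filter names as commonly documented), not a copy of the module's table, and `pick_filter` is a hypothetical helper.

```python
from pathlib import Path

# Assumed fallback table keyed by output extension.
FALLBACK_FILTERS = {
    ".doc": "MS Word 97",
    ".docx": "MS Word 2007 XML",
    ".odt": "writer8",
    ".xls": "MS Excel 97",
    ".xlsx": "Calc MS Excel 2007 XML",
    ".ods": "calc8",
    ".ppt": "MS PowerPoint 97",
    ".pptx": "Impress MS PowerPoint 2007 XML",
    ".odp": "impress8",
}


def pick_filter(media_descriptor: dict, output_path: str):
    """Prefer the import-time FilterName; otherwise fall back on the extension."""
    name = media_descriptor.get("FilterName")
    if name:
        return name
    return FALLBACK_FILTERS.get(Path(output_path).suffix.lower())
```

In real UNO code the descriptor is a sequence of PropertyValue structs rather than a dict, so it would need to be converted before a lookup like this.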
_save_uno_char_props
Saves character formatting properties from a UNO text object.
Reads each property in UNO_CHAR_PROPERTIES via getPropertyValue(). Properties that raise are silently skipped.

| Parameter | Description |
|---|---|
| `text_obj` | A UNO object supporting XPropertySet (paragraph, cell, etc.). |

| Returns | Description |
|---|---|
| `dict` | Mapping of property name to saved value. |

Source code in src/core/office_processor.py
_restore_uno_char_props
Restores previously saved character properties to a UNO text object.
Sets each property independently so a single failure does not prevent other properties from being restored.
When target_lang is provided and "CharFontName" is present in
saved, the font name is substituted via _substitute_font
when the source and target scripts differ.

| Parameter | Description |
|---|---|
| `text_obj` | A UNO object supporting XPropertySet. |
| `saved` | Mapping of property name to value (from _save_uno_char_props). |
| `original_text` | The text before translation (for script detection). |
| `translated_text` | The text after translation (for script detection). |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_read_uno_effective_formatting
Reads the effective (resolved) formatting from a UNO text object.
Returns the effective values, which include formatting inherited from paragraph/character styles.
Note: UNO's CharPosture returns a uno.Enum (FontSlant) object,
not a plain integer. Comparing enum != 0 always evaluates to
True, so the enum is detected via its .value string attribute
(e.g. "NONE", "ITALIC").

| Parameter | Description |
|---|---|
| `obj` | A UNO object supporting getPropertyValue (paragraph, portion, or text cursor). |

| Returns | Description |
|---|---|
| `tuple[bool, bool, bool, bool, bool, bool]` | (bold, italic, underline, strike, superscript, subscript) booleans. |

Source code in src/core/office_processor.py
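
The enum pitfall in the note can be reproduced without LibreOffice. `FakeUnoEnum` below is a hypothetical stand-in that only mimics the `typeName` and `value` attributes of uno.Enum, and `is_italic` is an illustrative helper that treats any slant other than "NONE" as italic.

```python
class FakeUnoEnum:
    """Stand-in for uno.Enum: carries a type name and a string value."""

    def __init__(self, type_name: str, value: str):
        self.typeName = type_name
        self.value = value


def is_italic(posture) -> bool:
    """Treat any FontSlant other than 'NONE' as italic."""
    value = getattr(posture, "value", None)
    if isinstance(value, str):
        return value != "NONE"
    return bool(posture)  # plain integers still work


upright = FakeUnoEnum("com.sun.star.awt.FontSlant", "NONE")
slanted = FakeUnoEnum("com.sun.star.awt.FontSlant", "ITALIC")
naive_check = upright != 0  # an object never equals 0, so this is True
```

The naive comparison reports the upright text as italic, while checking `.value` gives the expected result for both objects.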
_read_uno_portion_formatting
Reads effective inline formatting flags from a UNO text portion.
Delegates to _read_uno_effective_formatting which handles the
uno.Enum comparison for CharPosture.

| Parameter | Description |
|---|---|
| `portion` | A UNO TextPortion object (XPropertySet). |

| Returns | Description |
|---|---|
| `tuple[bool, bool, bool, bool, bool, bool]` | (bold, italic, underline, strike, superscript, subscript) booleans. |

Source code in src/core/office_processor.py
_read_uno_portion_bg_hex
Reads background/highlight colour from a UNO text portion.
Checks CharHighlight first, then CharBackColor.
Both are integer RGB values; -1 / 0xFFFFFFFF means no colour.

| Parameter | Description |
|---|---|
| `portion` | A UNO TextPortion object (XPropertySet). |

| Returns | Description |
|---|---|
| `str \| None` | Lowercase hex colour string like |

Source code in src/core/office_processor.py
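
A sketch of the lookup order and the "no colour" sentinel follows, using a dict in place of a UNO portion. The helper names are hypothetical, and the `#rrggbb` format is an assumption because the example in the Returns entry is cut off.

```python
NO_COLOUR = (-1, 0xFFFFFFFF)  # both spellings of "no colour set"


def colour_int_to_hex(value):
    """Convert an integer RGB value to '#rrggbb', or None when unset."""
    if value is None or value in NO_COLOUR:
        return None
    return f"#{value & 0xFFFFFF:06x}"


def portion_bg_hex(props: dict):
    """Check CharHighlight first, then CharBackColor."""
    for name in ("CharHighlight", "CharBackColor"):
        hex_value = colour_int_to_hex(props.get(name))
        if hex_value is not None:
            return hex_value
    return None
```

Checking both sentinel spellings matters because the same "unset" value can surface as a signed or an unsigned 32-bit integer.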
_read_uno_portion_full_formatting
Reads formatting flags plus font size, colour and bg from a UNO portion.
Extends _read_uno_portion_formatting with CharHeight (float pt),
CharColor (int → hex), and background colour via
_read_uno_portion_bg_hex.

| Parameter | Description |
|---|---|
| `portion` | A UNO TextPortion object (XPropertySet). |

| Returns | Description |
|---|---|
| `tuple` | (bold, italic, underline, strike, superscript, subscript, font_size_pt, color_hex, bg_color_hex). |

Source code in src/core/office_processor.py
_has_uno_mixed_formatting
Checks whether a UNO paragraph has text portions with differing formatting.
Compares each portion's full formatting (bold, italic, underline, strike, superscript, subscript, font size, colour, background colour). Only considers portions with TextPortionType == "Text" and non-empty text. Returns False if 0 or 1 text portions remain.

| Parameter | Description |
|---|---|
| `para` | A UNO paragraph supporting createEnumeration(). |

| Returns | Description |
|---|---|
| `bool` | True if at least two text portions have different formatting. |

Source code in src/core/office_processor.py
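
The comparison rule can be sketched on plain tuples. The `(portion_type, text, formatting)` shape and the function name are assumptions for this example; in the module the values come from enumerating UNO portions.

```python
def has_mixed_formatting(portions) -> bool:
    """portions: iterable of (portion_type, text, formatting_tuple)."""
    # Keep only real text portions, then count the distinct formatting tuples.
    seen = {fmt for kind, text, fmt in portions if kind == "Text" and text}
    return len(seen) > 1


plain_fmt = (False, False, False, False, False, False, 12.0, "#000000", None)
bold_fmt = (True, False, False, False, False, False, 12.0, "#000000", None)
```

Non-text portions such as fields or bookmarks and empty portions are ignored, so a paragraph with a single formatted text portion is reported as uniform.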
_has_uno_hyperlinks
Checks whether a UNO paragraph has any portions with hyperlinks.

| Parameter | Description |
|---|---|
| `para` | A UNO paragraph supporting createEnumeration(). |

| Returns | Description |
|---|---|
| `bool` | True if at least one text portion has a non-empty HyperLinkURL. |

Source code in src/core/office_processor.py
_uno_runs_to_html
Converts a UNO paragraph's text portions to inline HTML.
Two-pass approach: first collects all portion data to detect
size/colour/bg variation, then emits HTML with <span> only when
needed. Portions with hyperlinks are wrapped in <a href="..."> tags.

| Parameter | Description |
|---|---|
| `para` | A UNO paragraph supporting createEnumeration(). |

| Returns | Description |
|---|---|
| `str` | HTML string representing the paragraph's formatted text. |

Source code in src/core/office_processor.py
_save_uno_first_portion_props
Reads UNO_CHAR_PROPERTIES from the first text portion of a paragraph.
This captures the actual font properties (name, size, colour) from the first run rather than from the paragraph level, which may differ.

| Parameter | Description |
|---|---|
| `para` | A UNO paragraph supporting createEnumeration(). |

| Returns | Description |
|---|---|
| `dict[str, object]` | dict mapping property names to values. Empty if no text portion found. |

Source code in src/core/office_processor.py
_inject_uno_html_runs
Replaces a UNO paragraph's text with HTML-formatted segments.
Parses html_text via _parse_html_formatting, sets the full
plain text on the paragraph, then applies per-segment formatting via
a text cursor.
Base properties (font name, size, colour) from base_props are restored
on the whole paragraph first, excluding the four formatting properties
that are applied per-segment. CharFontName is substituted with a
compatible font when original and translated script families differ.

| Parameter | Description |
|---|---|
| `para` | A UNO paragraph object. |
| `html_text` | Translated text with inline `<b>`/`<i>`/`<u>`/`<s>` |
| `base_props` | Saved properties from |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_inject_uno_impress_html_runs
Impress-specific variant of _inject_uno_html_runs.
Impress text cursors do not implement XParagraphCursor
(no gotoStartOfParagraph/gotoEndOfParagraph). This
function uses pure offset-based positioning via goRight
from the paragraph start range instead.

| Parameter | Description |
|---|---|
| `para` | A UNO Impress paragraph object. |
| `html_text` | Translated text with inline HTML tags. |
| `base_props` | Saved properties from |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_inject_uno_impress_para_text
Injects translated text into a single UNO Impress paragraph.
Uses _inject_uno_impress_html_runs for HTML-tagged text,
plain setString with property save/restore otherwise.

| Parameter | Description |
|---|---|
| `para` | A UNO Impress paragraph object. |
| `text` | Translated text (plain or HTML-tagged). |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_extract_uno_writer
Extracts text from a Writer document via UNO.
When a paragraph has mixed per-run formatting (e.g. bold + italic portions), the text is encoded as inline HTML so the LLM can preserve formatting tags.

| Parameter | Description |
|---|---|
| `file_path` | Path to the .doc or .docx file. |

| Returns | Description |
|---|---|
| `list` | (location_key, text) pairs: plain text or inline HTML. |

Source code in src/core/office_processor.py
_inject_uno_para_text
Injects translated text into a single UNO paragraph.
Dispatches to _inject_uno_html_runs when text contains inline
HTML formatting tags, otherwise uses plain setString with
paragraph-level property save/restore.

| Parameter | Description |
|---|---|
| `para` | A UNO paragraph object. |
| `text` | Translated text (plain or HTML-tagged). |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_inject_uno_cell_text
Injects translated text into a UNO table cell.
For single-paragraph cells with HTML tags, dispatches to
_inject_uno_html_runs. Otherwise uses plain setString
with cell-level property save/restore.

| Parameter | Description |
|---|---|
| `cell` | A UNO table cell object. |
| `text` | Translated text (plain or HTML-tagged). |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_inject_uno_writer
Injects translations into a Writer document via UNO.
When the translated text contains inline HTML formatting tags
(<b>, <i>, <u>, <s>), per-segment formatting is
applied via _inject_uno_html_runs. Otherwise, plain text is
set with paragraph-level property restore.

| Parameter | Description |
|---|---|
| `file_path` | Source file path. |
| `output_path` | Output file path. |
| `translations` | Mapping of location_key to translated text. |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_extract_uno_calc
Extracts text from a Calc spreadsheet via UNO.

| Parameter | Description |
|---|---|
| `file_path` | Path to the .xls or .xlsx file. |

| Returns | Description |
|---|---|
| `list` | (location_key, text) pairs. |

Source code in src/core/office_processor.py
_inject_uno_calc
Injects translations into a Calc spreadsheet via UNO.

| Parameter | Description |
|---|---|
| `file_path` | Source file path. |
| `output_path` | Output file path. |
| `translations` | Mapping of location_key to translated text. |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_extract_uno_impress
Extracts text from an Impress presentation via UNO.
When any paragraph within a shape has mixed per-run formatting, the
entire shape is extracted as inline HTML via _uno_runs_to_html
(paragraphs joined by newlines). Otherwise, plain text is returned.

| Parameter | Description |
|---|---|
| `file_path` | Path to the .ppt or .pptx file. |

| Returns | Description |
|---|---|
| `list` | (location_key, text) pairs. |

Source code in src/core/office_processor.py
_inject_uno_impress
Injects translations into an Impress presentation via UNO.
When the translated text contains inline HTML formatting tags,
dispatches to _inject_uno_impress_para_text for per-run
formatting on each paragraph (lines separated by newlines),
which uses offset-based cursor positioning instead of XParagraphCursor
methods. Otherwise, uses plain setString with shape-level
property save/restore.

| Parameter | Description |
|---|---|
| `file_path` | Source file path. |
| `output_path` | Output file path. |
| `translations` | Mapping of location_key to translated text. |
| `target_lang` | Target language name for font substitution. |

Source code in src/core/office_processor.py
_extract_uno_writer_comments
Extracts annotation comments from a Writer document via UNO.
Enumerates text fields and filters by Annotation service.

| Parameter | Description |
|---|---|
| `file_path` | Path to the .doc file. |

| Returns | Description |
|---|---|
| `list` | (location_key, text) pairs with keys like 'comment:{index}'. |

Source code in src/core/office_processor.py
_inject_uno_writer_comments
Injects translated comments into a Writer document via UNO.

| Parameter | Description |
|---|---|
| `output_path` | Path to the .doc file to modify in place. |
| `translations` | Mapping of 'comment:{index}' to translated text. |

Source code in src/core/office_processor.py
_extract_uno_calc_comments
Extracts cell annotations from a Calc spreadsheet via UNO.

| Parameter | Description |
|---|---|
| `file_path` | Path to the .xls file. |

| Returns | Description |
|---|---|
| `list` | (location_key, text) pairs with keys like 'comment:{sheet}:{row}:{col}' (1-based for XLSX compatibility). |

Source code in src/core/office_processor.py
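
Since UNO cell addresses are zero-based while the keys are documented as 1-based, the conversion can be sketched as follows. The helper names are hypothetical, and `{sheet}` is treated as an opaque string because the reference does not say whether it is a sheet name or an index.

```python
def calc_comment_key(sheet: str, row0: int, col0: int) -> str:
    """Build a 1-based comment key from zero-based cell coordinates."""
    return f"comment:{sheet}:{row0 + 1}:{col0 + 1}"


def parse_calc_comment_key(key: str):
    """Invert calc_comment_key, returning zero-based coordinates."""
    prefix, sheet, row, col = key.split(":")
    if prefix != "comment":
        raise ValueError(f"Not a comment key: {key}")
    return sheet, int(row) - 1, int(col) - 1
```

Keeping the keys 1-based means a key produced by this backend refers to the same cell as one produced by a 1-based library such as openpyxl.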
_inject_uno_calc_comments
Injects translated comments into a Calc spreadsheet via UNO.

| Parameter | Description |
|---|---|
| `output_path` | Path to the .xls file to modify in place. |
| `translations` | Mapping of 'comment:{sheet}:{row}:{col}' to translated text. |

Source code in src/core/office_processor.py
_extract_uno_impress_comments
¶
Extracts annotations from an Impress presentation via UNO.
| Argument | Description |
|---|---|
| file_path | Path to the .ppt file. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs with keys like 'comment:{page_idx}:{anno_idx}'. |
Source code in src/core/office_processor.py
_inject_uno_impress_comments
¶
Injects translated comments into an Impress presentation via UNO.
| Argument | Description |
|---|---|
| output_path | Path to the .ppt file to modify in place. |
| translations | Mapping of 'comment:{page_idx}:{anno_idx}' to translated text. |
Source code in src/core/office_processor.py
_convert_with_win32com
¶
Converts an office file to another format using win32com SaveAs.
Uses the output extension to determine the application and format code.
| Argument | Description |
|---|---|
| input_path | Path to the source file. |
| output_path | Path for the converted file. |
Source code in src/core/office_processor.py
_convert_with_uno
¶
Converts an office file to another format using LibreOffice UNO.
Uses the output extension to select the export filter name.
| Argument | Description |
|---|---|
| input_path | Path to the source file. |
| output_path | Path for the converted file. |
Source code in src/core/office_processor.py
convert_to_modern_format
¶
Converts a legacy/ODF office file to modern format (.docx/.xlsx/.pptx).
Detects the available backend (win32com or UNO) and delegates to the appropriate conversion helper. Returns True on success, False on failure (logs a warning instead of raising).
| Argument | Description |
|---|---|
| input_path | Path to the translated file in legacy/ODF format. |
| output_path | Path for the converted modern format file. |
| Returns | Description |
|---|---|
| bool | True if conversion succeeded, False otherwise. |
Source code in src/core/office_processor.py
_odf_qnames
¶
Returns cached (tab_qname, linebreak_qname, span_qname, a_qname).
Source code in src/core/office_processor.py
_odf_element_text
¶
Recursively extracts all text content from an ODF element.
Walks the element's childNodes tree. Text nodes (nodeType == 3) have their data collected. Element nodes (nodeType == 1) are recursed into. Tab elements produce a tab character; line-break elements produce a newline.
When preserve_links is True, <text:a> hyperlinks are emitted as
<a href="url">text</a> HTML tags instead of plain text. This is
used during extraction so the LLM sees (and preserves) hyperlink
structure.
| Argument | Description |
|---|---|
| element | An odfpy element node. |
| preserve_links | If True, emit |
| Returns | Description |
|---|---|
| str | The concatenated text content (may contain |
Source code in src/core/office_processor.py
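The recursive walk described above can be sketched with the standard library's `xml.dom.minidom`, which exposes the same `childNodes` / `nodeType` interface. This is only an illustration under that assumption: the function name `element_text` and the sample paragraph are hypothetical, and the real helper operates on odfpy nodes.

```python
from xml.dom import minidom

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"


def element_text(node, preserve_links=False):
    """Collect text under `node`, mapping tabs, line breaks and links."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:  # nodeType == 3
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:  # nodeType == 1
            is_text_ns = child.namespaceURI == TEXT_NS
            if is_text_ns and child.localName == "tab":
                parts.append("\t")
            elif is_text_ns and child.localName == "line-break":
                parts.append("\n")
            elif preserve_links and is_text_ns and child.localName == "a":
                # Emit the hyperlink as an HTML tag around its inner text.
                href = child.getAttributeNS(XLINK_NS, "href")
                inner = element_text(child, preserve_links)
                parts.append(f'<a href="{href}">{inner}</a>')
            else:
                # Any other element (e.g. a span) is simply recursed into.
                parts.append(element_text(child, preserve_links))
    return "".join(parts)


doc = minidom.parseString(
    f'<text:p xmlns:text="{TEXT_NS}" xmlns:xlink="{XLINK_NS}">'
    'See<text:tab/><text:a xlink:href="https://example.com/">'
    "<text:span>the site</text:span></text:a><text:line-break/>now</text:p>"
)
para = doc.documentElement
print(repr(element_text(para)))
print(repr(element_text(para, preserve_links=True)))
```

With `preserve_links` left off, the link collapses to its visible text; with it on, the URL survives as an `<a>` tag around that text.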
_odf_replace_text
¶
Replaces all text content in an ODF element with new text.
Preserves the first <text:span>'s stylename attribute so that
character formatting (bold, italic, font, etc.) is retained. If no
span is found, falls back to plain addText().
When new_text contains <a href="..."> HTML tags (from hyperlink
preservation during extraction), parses them via
_parse_html_formatting and creates <text:a> elements with the
correct xlink:href attribute.
Note
odfpy's removeChild() cannot handle text nodes (nodeType == 3) because its internal cache assertion requires Element instances. We manually clear childNodes and only update caches for Elements.
| Argument | Description |
|---|---|
| element | An odfpy element node (typically a P or H element). |
| new_text | The replacement text (may contain |
| target_lang | Target language name for font substitution. |
Source code in src/core/office_processor.py
_is_inside_table_cell
¶
Checks if an ODF element is nested inside a table cell.
| Argument | Description |
|---|---|
| element | An odfpy element node. |
| Returns | Description |
|---|---|
| bool | True if a TableCell ancestor is found. |
Source code in src/core/office_processor.py
_resolve_para_hyperlink_rels
¶
Resolves hyperlink r:id values to URLs for a paragraph.
Scans para._element for <w:hyperlink> children, looks up
each r:id in the document's relationship collection, and
returns a mapping of r:id → target URL.
| Argument | Description |
|---|---|
| para | A python-docx Paragraph object. |
| Returns | Description |
|---|---|
| dict[str, str] | dict mapping |
| dict[str, str] | external hyperlinks exist. |
Source code in src/core/office_processor.py
_extract_para_with_links
¶
Extracts text from a paragraph, preserving hyperlinks as <a> tags.
Uses the HTML path (_runs_to_html) when the paragraph has mixed
formatting or <w:hyperlink> children. Falls back to para.text
for simple uniform-formatting paragraphs without hyperlinks.
| Argument | Description |
|---|---|
| para | A python-docx Paragraph object. |
| Returns | Description |
|---|---|
| str | Plain text or inline HTML string. |
Source code in src/core/office_processor.py
_extract_python_docx
¶
Extracts text from a DOCX file via python-docx.
Extracts paragraph text and table cell text. Each paragraph or cell with non-empty text gets a unique location key. When a paragraph has mixed formatting or hyperlinks, the text is encoded as inline HTML so the LLM can preserve formatting and link tags.
| Argument | Description |
|---|---|
| file_path | Path to the .docx file. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs, each either plain text or inline HTML. |
Source code in src/core/office_processor.py
_set_odf_default_rtl
¶
Rewrites file_path (an ODF zip) so paragraphs default to RTL.
Adds — or extends — the <style:default-style style:family="paragraph">
block in styles.xml to set style:writing-mode="rl-tb" and
fo:text-align="end". Idempotent: running on an already-RTL
document is a no-op.
Source code in src/core/office_processor.py
_set_docx_paragraph_rtl
¶
Adds <w:bidi/> to the paragraph and <w:rtl/> to every run.
Word and LibreOffice Writer use these flags to flip paragraph direction and shape mirrored punctuation (parens, quotes) at run boundaries. Without them an Arabic / Hebrew paragraph renders flush-left with broken punctuation.
Source code in src/core/office_processor.py
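A rough sketch of the two flags, written against the standard library's `ElementTree` rather than python-docx so it runs without extra dependencies. The helper names `w` and `set_paragraph_rtl` are hypothetical, and the sketch appends the flags at the end of `<w:pPr>` / `<w:rPr>`, whereas a production implementation would likely need to respect the schema's child ordering.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Return the Clark-notation name for a WordprocessingML tag."""
    return f"{{{W_NS}}}{tag}"


def set_paragraph_rtl(p):
    """Add <w:bidi/> to the paragraph and <w:rtl/> to every run."""
    ppr = p.find(w("pPr"))
    if ppr is None:
        ppr = ET.Element(w("pPr"))
        p.insert(0, ppr)  # paragraph properties come first in <w:p>
    if ppr.find(w("bidi")) is None:  # skip if the flag is already there
        ET.SubElement(ppr, w("bidi"))
    # Snapshot the runs first so the tree is not mutated while iterating.
    for run in list(p.iter(w("r"))):
        rpr = run.find(w("rPr"))
        if rpr is None:
            rpr = ET.Element(w("rPr"))
            run.insert(0, rpr)  # run properties come first in <w:r>
        if rpr.find(w("rtl")) is None:
            ET.SubElement(rpr, w("rtl"))


para = ET.fromstring(
    f'<w:p xmlns:w="{W_NS}"><w:r><w:t>\u05e9\u05dc\u05d5\u05dd</w:t></w:r></w:p>'
)
set_paragraph_rtl(para)
set_paragraph_rtl(para)  # a second call adds nothing new
print(ET.tostring(para, encoding="unicode"))
```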
_inject_python_docx
¶
Injects translations into a DOCX file via python-docx.
When the translated text contains inline HTML formatting tags
(<b>, <i>, <u>, <s>, <a>), _inject_html_runs
creates per-run formatting and hyperlink wrappers. Otherwise falls
back to _replace_paragraph_text.
| Argument | Description |
|---|---|
| file_path | Source file path. |
| output_path | Output file path. |
| translations | Mapping of location_key to translated text. |
| target_lang | Target language name; when RTL, every paragraph in the document is marked with |
Source code in src/core/office_processor.py
_extract_python_xlsx
¶
Extracts text from an XLSX file via openpyxl.
Iterates all sheets and collects cells with string values.
| Argument | Description |
|---|---|
| file_path | Path to the .xlsx file. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs. |
Source code in src/core/office_processor.py
_inject_python_xlsx
¶
Injects translations into an XLSX file via openpyxl.
| Argument | Description |
|---|---|
| file_path | Source file path. |
| output_path | Output file path. |
| translations | Mapping of location_key to translated text. |
| target_lang | Target language name for font substitution. |
Source code in src/core/office_processor.py
_walk_pptx_text_shapes
¶
Yield (shape_path, leaf_shape) for every text-bearing shape.
Recurses into shape groups via duck-typing on .shapes: a
GroupShape exposes child shapes there, a regular text box
doesn't. The returned shape_path is a dotted index chain
("0", "0.1", "2.0.3", …) so leaf positions stay stable
across runs and survive the extract → inject round trip.
Source code in src/core/office_processor.py
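The duck-typed recursion and the dotted path can be illustrated without python-pptx. In the sketch below, `FakeTextBox`, `FakeGroup`, and `walk_text_shapes` are hypothetical stand-ins, and using a `has_text_frame` flag to detect text-bearing leaves is an assumption rather than something the source states.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTextBox:
    """Stand-in for a text-bearing shape (illustration only)."""
    text: str
    has_text_frame: bool = True


@dataclass
class FakeGroup:
    """Stand-in for a group shape: exposes its children via .shapes."""
    shapes: list = field(default_factory=list)


def walk_text_shapes(shapes, prefix=""):
    """Yield (dotted_path, shape) for each text-bearing leaf shape."""
    for index, shape in enumerate(shapes):
        path = f"{prefix}.{index}" if prefix else str(index)
        children = getattr(shape, "shapes", None)  # duck-typed group check
        if children is not None:
            yield from walk_text_shapes(children, path)
        elif getattr(shape, "has_text_frame", False):
            yield path, shape


slide_shapes = [
    FakeTextBox("Title"),
    FakeGroup([FakeTextBox("A"), FakeGroup([FakeTextBox("B")])]),
]
for path, shape in walk_text_shapes(slide_shapes):
    print(path, shape.text)
```

Because each path is built from positional indices only, running the same walk during injection reaches the same leaves, provided the shape tree has not been reordered in between.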
_extract_python_pptx
¶
Extracts text from a PPTX file via python-pptx.
Iterates slides and recurses through shape groups, then walks
paragraphs and runs of every text frame. Each non-empty paragraph
gets a location key encoding the slide + dotted shape path + para
index so grouped text round-trips through inject. Paragraphs with
mixed formatting or hyperlinks are encoded as inline HTML so the
LLM can preserve formatting and <a> tags.
| Argument | Description |
|---|---|
| file_path | Path to the .pptx file. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs, each either plain text or inline HTML. |
Source code in src/core/office_processor.py
_set_pptx_paragraph_rtl
¶
Adds rtl="1" to a python-pptx paragraph's <a:pPr>.
PowerPoint and Keynote use this attribute to flip text-frame paragraph direction. Idempotent.
Source code in src/core/office_processor.py
_inject_python_pptx
¶
Injects translations into a PPTX file via python-pptx.
For each translated paragraph: puts all text in the first run
and clears other runs (preserves first run's formatting).
When the translated text contains <a> tags, hyperlink
relationships are created via the slide part.
| Argument | Description |
|---|---|
| file_path | Source file path. |
| output_path | Output file path. |
| translations | Mapping of location_key to translated text. |
| target_lang | Target language; when RTL, every paragraph in every text frame is marked with |
Source code in src/core/office_processor.py
_extract_python_odt
¶
Extracts text from an ODT file via odfpy.
Extracts body paragraphs, headings, and table cell text. Paragraphs inside table cells are excluded from body paragraph counting (they are handled via the table iteration).
| Argument | Description |
|---|---|
| file_path | Path to the .odt file. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs. |
Source code in src/core/office_processor.py
_inject_python_odt
¶
Injects translations into an ODT file via odfpy.
For paragraphs and headings: replaces all child text with the translated text (inline formatting is not preserved, matching UNO backend behavior). For table cells: replaces text in the first paragraph element.
| Argument | Description |
|---|---|
| file_path | Source file path. |
| output_path | Output file path. |
| translations | Mapping of location_key to translated text. |
| target_lang | Target language name for font substitution. |
Source code in src/core/office_processor.py
_extract_python_ods
¶
Extracts text from an ODS file via odfpy.
Iterates all sheets and collects cells with string text content.
Uses the same key format as _extract_python_xlsx:
sheet:{name}:{row}:{col} with 1-based indices.
| Argument | Description |
|---|---|
| file_path | Path to the .ods file. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs. |
Source code in src/core/office_processor.py
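To make the `sheet:{name}:{row}:{col}` format concrete, here is a small sketch of building and parsing such keys. Both helper names are hypothetical, and splitting from the right so that a sheet name may itself contain a colon is an assumption about how the key could be parsed, not a description of the actual code.

```python
def make_cell_key(sheet_name: str, row_index: int, col_index: int) -> str:
    """Build a location key from 0-based indices; the key is 1-based."""
    return f"sheet:{sheet_name}:{row_index + 1}:{col_index + 1}"


def parse_cell_key(key: str):
    """Split a key back into (sheet_name, row, col) with 1-based indices."""
    prefix, rest = key.split(":", 1)
    if prefix != "sheet":
        raise ValueError(f"Not a cell key: {key!r}")
    # Split from the right so colons inside the sheet name are preserved.
    name, row, col = rest.rsplit(":", 2)
    return name, int(row), int(col)


key = make_cell_key("Q1:Sales", 0, 2)
print(key)
print(parse_cell_key(key))
```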
_inject_python_ods
¶
Injects translations into an ODS file via odfpy.
| Argument | Description |
|---|---|
| file_path | Source file path. |
| output_path | Output file path. |
| translations | Mapping of location_key to translated text. |
| target_lang | Target language name for font substitution. |
Source code in src/core/office_processor.py
_extract_python_odp
¶
Extracts text from an ODP file via odfpy.
Iterates presentation pages, draw frames, and paragraphs within.
Each non-empty paragraph gets a location key using the same format
as _extract_python_pptx: slide:{s}:{sh}:{p}.
| Argument | Description |
|---|---|
| file_path | Path to the .odp file. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs. |
Source code in src/core/office_processor.py
_inject_python_odp
¶
Injects translations into an ODP file via odfpy.
For each translated paragraph: replaces all text content, matching UNO backend behavior.
| Argument | Description |
|---|---|
| file_path | Source file path. |
| output_path | Output file path. |
| translations | Mapping of location_key to translated text. |
| target_lang | Target language name for font substitution. |
Source code in src/core/office_processor.py
_extract_python_word
¶
Routes word-category extraction based on file extension.
| Argument | Description |
|---|---|
| file_path | Path to the document (.docx or .odt). |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs. |
Source code in src/core/office_processor.py
_inject_python_word
¶
Routes word-category injection based on file extension.
| Argument | Description |
|---|---|
| file_path | Source file path. |
| output_path | Output file path. |
| translations | Mapping of location_key to translated text. |
| target_lang | Target language name for font substitution. |
Source code in src/core/office_processor.py
_extract_python_excel
¶
Routes excel-category extraction based on file extension.
| Argument | Description |
|---|---|
| file_path | Path to the spreadsheet (.xlsx or .ods). |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs. |
Source code in src/core/office_processor.py
_inject_python_excel
¶
Routes excel-category injection based on file extension.
| Argument | Description |
|---|---|
| file_path | Source file path. |
| output_path | Output file path. |
| translations | Mapping of location_key to translated text. |
| target_lang | Target language name for font substitution. |
Source code in src/core/office_processor.py
_extract_python_ppt
¶
Routes ppt-category extraction based on file extension.
| Argument | Description |
|---|---|
| file_path | Path to the presentation (.pptx or .odp). |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs. |
Source code in src/core/office_processor.py
_inject_python_ppt
¶
Routes ppt-category injection based on file extension.
| Argument | Description |
|---|---|
| file_path | Source file path. |
| output_path | Output file path. |
| translations | Mapping of location_key to translated text. |
| target_lang | Target language name for font substitution. |
Source code in src/core/office_processor.py
_get_file_category
¶
Returns the file category for dispatch.
| Argument | Description |
|---|---|
| suffix | Lowercase file extension. |
| Returns | Description |
|---|---|
| str | "word", "excel", or "ppt". |
| Raises | Description |
|---|---|
| ValueError | If the extension is not an office format. |
Source code in src/core/office_processor.py
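A dispatch of this kind is usually a plain lookup table. The sketch below assumes the nine extensions named in the module overview; the table name `CATEGORY_BY_SUFFIX`, the function name, and the error message are hypothetical.

```python
# Assumed mapping, derived from the formats listed in the module overview.
CATEGORY_BY_SUFFIX = {
    ".docx": "word", ".doc": "word", ".odt": "word",
    ".xlsx": "excel", ".xls": "excel", ".ods": "excel",
    ".pptx": "ppt", ".ppt": "ppt", ".odp": "ppt",
}


def get_file_category(suffix: str) -> str:
    """Return "word", "excel" or "ppt" for a lowercase extension."""
    try:
        return CATEGORY_BY_SUFFIX[suffix]
    except KeyError:
        # Surface an unknown extension as ValueError, as documented above.
        raise ValueError(f"Not an office format: {suffix!r}") from None


print(get_file_category(".ods"))
```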
_is_fatal_llm_error
¶
Returns True when error_tag is in _FATAL_LLM_ERRORS.
Delegates to :func:src.constants.errors.base_error_tag to strip
the optional :Service suffix the engine appends to AUTH_ERROR
so "AUTH_ERROR:Gemini" matches as fatal alongside the bare
"AUTH_ERROR".
Source code in src/core/office_processor.py
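The suffix stripping can be pictured as follows. This is a sketch only: the set contents are assumed (the text above confirms only that AUTH_ERROR is fatal), the local `base_error_tag` is a guess at what the helper in `src.constants.errors` does, and "RATE_LIMIT" is a made-up non-fatal tag.

```python
# Assumed contents; the source only confirms that AUTH_ERROR is fatal.
FATAL_LLM_ERRORS = frozenset({"AUTH_ERROR"})


def base_error_tag(error_tag: str) -> str:
    """Drop an optional ':Service' suffix, e.g. 'AUTH_ERROR:Gemini'."""
    return error_tag.split(":", 1)[0]


def is_fatal_llm_error(error_tag: str) -> bool:
    """True when the tag, minus any service suffix, is a fatal error."""
    return base_error_tag(error_tag) in FATAL_LLM_ERRORS


for tag in ("AUTH_ERROR", "AUTH_ERROR:Gemini", "RATE_LIMIT"):
    print(tag, is_fatal_llm_error(tag))
```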
_should_translate_images
¶
Checks whether image translation should be attempted for this file.
Returns True when the setting is enabled, OCR is configured, and the
format supports embedded image translation. Modern/ODF formats use
zipfile directly; legacy formats (.doc, .xls, .ppt) use round-trip
conversion to a modern format first.
| Argument | Description |
|---|---|
| suffix | Lowercase file extension (e.g. ".docx"). |
| backend | The detected backend identifier (unused, kept for API consistency with |
| config | Optional TranslationConfig snapshot; falls back to load_setting(). |
| Returns | Description |
|---|---|
| bool | True if image translation should proceed. |
Source code in src/core/office_processor.py
_should_translate_comments
¶
Checks whether comment translation should be attempted for this file.
Returns True when the setting is enabled and the format supports comment extraction. Comment handling uses its own libraries (python-docx, openpyxl, python-pptx, zipfile+lxml) independently of the text-extraction backend, so no backend restriction is needed.
| Argument | Description |
|---|---|
| suffix | Lowercase file extension (e.g. ".docx"). |
| backend | The detected backend identifier (unused, kept for API consistency with |
| config | Optional TranslationConfig snapshot; falls back to load_setting(). |
| Returns | Description |
|---|---|
| bool | True if comment translation should proceed. |
Source code in src/core/office_processor.py
_should_translate_shapes
¶
Checks whether shape/text-box translation should be attempted.
Returns True when the setting is enabled and the format supports shape extraction. PPT formats are excluded because their primary extractors already handle shapes.
| Argument | Description |
|---|---|
| suffix | Lowercase file extension (e.g. ".docx"). |
| backend | The detected backend identifier (unused, kept for API consistency with |
| config | Optional TranslationConfig snapshot; falls back to load_setting(). |
| Returns | Description |
|---|---|
| bool | True if shape translation should proceed. |
Source code in src/core/office_processor.py
_extract_comments
¶
Extracts comments from an office file.
| Argument | Description |
|---|---|
| file_path | Path to the office file. |
| suffix | Lowercase file extension. |
| backend | Backend identifier for legacy format dispatch. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs for comments. |
Source code in src/core/office_processor.py
_extract_docx_comments
¶
Extracts comments from a DOCX file via low-level XML access.
Detects <w:hyperlink> elements within comment paragraphs and emits
<a href="..."> HTML tags so that hyperlinks are preserved through
the LLM translation round-trip. Hyperlink URLs are resolved from the
comments part's .rels file.
| Argument | Description |
|---|---|
| file_path | Path to the .docx file. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs with keys like 'comment:{id}'. Text may contain |
Source code in src/core/office_processor.py
_extract_xlsx_comments
¶
Extracts cell comments from an XLSX file via openpyxl.
| Argument | Description |
|---|---|
| file_path | Path to the .xlsx file. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs with keys like 'comment:{sheet}:{row}:{col}'. |
Source code in src/core/office_processor.py
_inject_comments
¶
Injects translated comments back into the output document.
| Argument | Description |
|---|---|
| output_path | Path to the output file (already written by inject_fn). |
| translations | Mapping of location_key to translated text. |
| suffix | Lowercase file extension. |
| backend | Backend identifier for legacy format dispatch. |
Source code in src/core/office_processor.py
_inject_docx_comments
¶
Injects translated comments into a DOCX file via low-level XML.
When a translation contains <a href="..."> tags, the comment's
paragraphs are rebuilt with <w:hyperlink> elements and the
corresponding relationships are added to
word/_rels/comments.xml.rels. Plain-text translations use the
simpler <w:t> replacement path.
| Argument | Description |
|---|---|
| output_path | Path to the .docx file to modify in place. |
| translations | Mapping of 'comment:{id}' to translated text. |
Source code in src/core/office_processor.py
_inject_docx_comment_html
¶
Rebuilds a single comment element's paragraphs from HTML.
Parses html_text via _parse_html_formatting to obtain
_FormattedSegment objects. Segments with hyperlink_url are
wrapped in <w:hyperlink> elements with relationship IDs created
via comments_part.relate_to().
| Argument | Description |
|---|---|
| comment_el | The |
| html_text | Translated HTML string (may contain |
| comments_part | The python-docx comments |
| qn | The python-docx |
Source code in src/core/office_processor.py
_patch_docx_comment_rels
¶
Ensures word/_rels/comments.xml.rels is persisted in the DOCX ZIP.
python-docx may not serialize .rels for the comments part
when saved via doc.save(). This function verifies and patches
the ZIP directly if the rels data is missing or stale.
| Argument | Description |
|---|---|
| output_path | Path to the saved .docx file. |
| comments_part | The python-docx comments Part (with .rels data). |
Source code in src/core/office_processor.py
_inject_xlsx_comments
¶
Injects translated comments into an XLSX file via openpyxl.
| Argument | Description |
|---|---|
| output_path | Path to the .xlsx file to modify in place. |
| translations | Mapping of 'comment:{sheet}:{row}:{col}' to translated text. |
Source code in src/core/office_processor.py
_get_rels_path
¶
Returns the .rels path for a given XML part path inside a ZIP.
E.g. 'word/document.xml' → 'word/_rels/document.xml.rels'.
| Argument | Description |
|---|---|
| part_path | Path of the XML part inside the ZIP. |
| Returns | Description |
|---|---|
| str | Path of the corresponding |
Source code in src/core/office_processor.py
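As a minimal sketch of this mapping (the helper name `rels_path_for` is hypothetical), the relationships file sits in a `_rels` folder next to the part and carries the part's file name plus `.rels`:

```python
import posixpath


def rels_path_for(part_path: str) -> str:
    """Map a part path inside the ZIP to its relationships file path."""
    # ZIP entry names always use forward slashes, hence posixpath.
    directory, filename = posixpath.split(part_path)
    return posixpath.join(directory, "_rels", filename + ".rels")


print(rels_path_for("word/document.xml"))
print(rels_path_for("word/comments.xml"))
```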
_parse_hyperlink_rels
¶
Parses a .rels XML file into {r_id: url} for hyperlinks.
Only external hyperlink relationships (TargetMode="External") are
included.
| Argument | Description |
|---|---|
| rels_xml | Raw bytes of the |
| Returns | Description |
|---|---|
| dict[str, str] | dict mapping relationship IDs to target URLs. |
Source code in src/core/office_processor.py
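One way to picture the filtering, using only the standard library: the namespace and relationship-type URIs below are the standard OOXML ones, while the function name and the sample XML are hypothetical. Checking the relationship `Type` as well as `TargetMode` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def parse_hyperlink_rels(rels_xml: bytes) -> dict:
    """Return {relationship id: URL} for external hyperlink relationships."""
    root = ET.fromstring(rels_xml)
    links = {}
    for rel in root.iter(f"{{{RELS_NS}}}Relationship"):
        if rel.get("Type") != HYPERLINK_TYPE:
            continue  # e.g. image or comments relationships
        if rel.get("TargetMode") != "External":
            continue  # internal targets such as bookmarks are skipped
        links[rel.get("Id")] = rel.get("Target")
    return links


sample = (
    f'<Relationships xmlns="{RELS_NS}">'
    f'<Relationship Id="rId1" Type="{HYPERLINK_TYPE}" '
    'Target="https://example.com/" TargetMode="External"/>'
    f'<Relationship Id="rId2" Type="{HYPERLINK_TYPE}" Target="#bookmark"/>'
    "</Relationships>"
).encode("utf-8")
print(parse_hyperlink_rels(sample))
```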
_add_hyperlink_to_rels
¶
Adds a hyperlink relationship to a .rels XML file.
If rels_xml is None, creates a new Relationships document.
| Argument | Description |
|---|---|
| rels_xml | Existing |
| url | The target URL for the hyperlink. |
| Returns | Description |
|---|---|
| tuple[bytes, str] | Tuple of |
Source code in src/core/office_processor.py
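The following sketch shows one plausible shape for such a helper. It assumes the returned tuple is (updated XML bytes, new relationship ID), since the return description above is cut off, and it picks the lowest unused `rIdN`, which may differ from the real ID scheme. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HYPERLINK_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"
)


def add_hyperlink_rel(rels_xml, url):
    """Append an external hyperlink relationship; return (xml_bytes, new_id)."""
    ET.register_namespace("", RELS_NS)  # serialize without an ns0: prefix
    if rels_xml is None:
        root = ET.Element(f"{{{RELS_NS}}}Relationships")
    else:
        root = ET.fromstring(rels_xml)
    # Pick the lowest rIdN that is not already taken (assumed ID scheme).
    used = {rel.get("Id") for rel in root}
    n = 1
    while f"rId{n}" in used:
        n += 1
    new_id = f"rId{n}"
    ET.SubElement(
        root,
        f"{{{RELS_NS}}}Relationship",
        {"Id": new_id, "Type": HYPERLINK_TYPE, "Target": url,
         "TargetMode": "External"},
    )
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True), new_id


xml_bytes, rid1 = add_hyperlink_rel(None, "https://example.com/a")
xml_bytes, rid2 = add_hyperlink_rel(xml_bytes, "https://example.com/b")
print(rid1, rid2)
```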
_extract_drawingml_text
¶
Extracts plain text from a DrawingML <txBody> element.
Iterates <a:p> paragraphs and joins <a:t> runs within each.
Paragraphs are separated by newlines, and <a:br/> tags are preserved
as newlines.
| Argument | Description |
|---|---|
| tx_body_el | An lxml element representing |
| Returns | Description |
|---|---|
| str | The concatenated plain text. |
Source code in src/core/office_processor.py
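The paragraph/run walk can be sketched with the standard library's `ElementTree` in place of lxml. The function name and sample markup are hypothetical, and this version looks only at `<a:r>` and `<a:br/>` children, ignoring other content such as fields.

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def drawingml_text(tx_body):
    """Join <a:t> runs per <a:p>; breaks and paragraph ends become newlines."""
    lines = []
    for para in tx_body.findall(f"{{{A_NS}}}p"):
        parts = []
        for node in para:
            if node.tag == f"{{{A_NS}}}r":
                # findtext returns "" for an empty <a:t> and for a missing one.
                parts.append(node.findtext(f"{{{A_NS}}}t", default=""))
            elif node.tag == f"{{{A_NS}}}br":
                parts.append("\n")
        lines.append("".join(parts))
    return "\n".join(lines)


tx_body = ET.fromstring(
    f'<a:txBody xmlns:a="{A_NS}">'
    "<a:p><a:r><a:t>Hello </a:t></a:r><a:r><a:t>world</a:t></a:r></a:p>"
    "<a:p><a:r><a:t>line 1</a:t></a:r><a:br/><a:r><a:t>line 2</a:t></a:r></a:p>"
    "</a:txBody>"
)
print(repr(drawingml_text(tx_body)))
```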
_inject_drawingml_text
¶
Replaces text in a DrawingML <txBody> element.
Puts all translated text in the first <a:t> of the first
<a:r> in the first <a:p>, and clears remaining <a:t>
elements. Handles newlines by inserting <a:br/> and new <a:r>
elements.
| Argument | Description |
|---|---|
| tx_body_el | An lxml element representing |
| new_text | The replacement text. |
Source code in src/core/office_processor.py
_inject_drawingml_html_runs
¶
Replaces DrawingML <a:txBody> runs with HTML-formatted segments.
Parses html_text via _parse_html_formatting, clears existing
<a:r> elements, and rebuilds runs with per-segment <a:rPr>
formatting. Falls back to _inject_drawingml_text if no HTML
formatting tags are detected.
When rels_adder is provided, segments with hyperlink_url get an
<a:hlinkClick> element created inside <a:rPr> with a
relationship ID returned by the callback.
| Argument | Description |
|---|---|
| tx_body_el | An lxml element representing |
| html_text | Translated text with inline |
| rels_adder | Callback that accepts a URL string and returns a relationship ID ( |
Source code in src/core/office_processor.py
_extract_pptx_legacy_comments
¶
Extracts legacy comments from an already-opened Presentation.
Legacy comments use <p:cm> elements with <p:text> children
(PowerPoint 2007–2019).
| Argument | Description |
|---|---|
| prs | A python-pptx Presentation object. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs with keys like |
Source code in src/core/office_processor.py
_extract_pptx_modern_comments
¶
Extracts modern threaded comments from an already-opened Presentation.
Modern comments use <p188:cm> elements with <txBody> rich
text and an optional <replyLst> (PowerPoint 365, 2021+).
| Argument | Description |
|---|---|
| prs | A python-pptx Presentation object. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs. Main comments use keys like |
Source code in src/core/office_processor.py
_extract_pptx_comments
¶
Extracts comments from a PPTX file via low-level XML on slide parts.
Handles both legacy comments (<p:cm>) and modern threaded
comments (<p188:cm>). A single file typically uses one format,
but both are checked.
| Argument | Description |
|---|---|
| file_path | Path to the .pptx file. |
| Returns | Description |
|---|---|
| list | (location_key, text) pairs. |
Source code in src/core/office_processor.py
_inject_pptx_legacy_comments
¶
Injects translated text into legacy PPTX comments.
| Argument | Description |
|---|---|
| prs | A python-pptx Presentation object. |
| translations | Mapping of location keys to translated text. |
| Returns | Description |
|---|---|
| bool | True if any comment was modified. |
Source code in src/core/office_processor.py
_inject_pptx_modern_comments
¶
Injects translated text into modern threaded PPTX comments.
| Argument | Description |
|---|---|
| prs | A python-pptx Presentation object. |
| translations | Mapping of location keys to translated text. |
| Returns | Description |
|---|---|
| bool | True if any comment was modified. |
Source code in src/core/office_processor.py
_inject_pptx_comments
¶
Injects translated comments into a PPTX file via low-level XML.
Handles both legacy and modern threaded comment formats.
| Argument | Description |
|---|---|
| output_path | Path to the .pptx file to modify in place. |
| translations | Mapping of location keys to translated text. |
Source code in src/core/office_processor.py
_rewrite_zip_content
¶
Atomically rewrites a ZIP archive with modified file data.
Writes to a temporary file then replaces the original. Used by all zip-based inject functions (DOCX/XLSX/ODF shapes and comments).
| Argument | Description |
|---|---|
| output_path | Path to the ZIP file to overwrite. |
| file_data | Mapping of archive entry names to their (possibly modified) content bytes. |
| all_items | Original |
Source code in src/core/office_processor.py
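The write-then-replace pattern can be sketched with the standard library alone. The function name is hypothetical, and treating `all_items` as the original archive's list of `zipfile.ZipInfo` objects is an assumption, because its description above is cut off.

```python
import os
import tempfile
import zipfile


def rewrite_zip(output_path, file_data, all_items):
    """Rewrite a ZIP via a temporary file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(output_path))
    # Same directory as the target, so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    os.close(fd)
    try:
        with zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as zout:
            for info in all_items:
                # Reusing the ZipInfo keeps entry name, order and compression.
                zout.writestr(info, file_data[info.filename])
        os.replace(tmp_path, output_path)  # atomic swap
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(path, "w") as z:
        z.writestr("a.txt", "old")
        z.writestr("b.txt", "keep")
    with zipfile.ZipFile(path) as z:
        items = z.infolist()
        data = {i.filename: z.read(i.filename) for i in items}
    data["a.txt"] = b"new"
    rewrite_zip(path, data, items)
    with zipfile.ZipFile(path) as z:
        result = {name: z.read(name) for name in z.namelist()}
print(result)
```

If anything fails while the temporary archive is being written, the original file is left untouched and the partial temporary file is removed.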
_patch_rels_for_embeddings
¶
Restores embedding relationship entries into output rels files.
| Argument | Description |
|---|---|
| file_data | Mutable mapping of output ZIP entries (modified in place). |
| src_rels | Source |
| new_items | Accumulator for new ZIP entries (appended if a rels file is entirely missing from file_data). |
Source code in src/core/office_processor.py
_restore_xlsx_embeddings
¶
Restores embedded objects that openpyxl drops during save.
openpyxl does not preserve OLE/package embedded objects stored
under xl/embeddings/ or their relationship and content-type
entries. This function reads those artefacts from source_path
and patches them back into output_path after openpyxl's save.
| Argument | Description |
|---|---|
| source_path | Original XLSX before openpyxl processing. |
| output_path | XLSX written by openpyxl (modified in place). |
Source code in src/core/office_processor.py
_extract_odf_paragraph_text
¶
Extracts concatenated paragraph text from an ODF element.
Works with any element that contains <text:p> children, such as
<draw:text-box> and <office:annotation>. Handles mixed
content: direct text, child element text, and tail text.
ODF hyperlinks (<text:a xlink:href="URL">) are emitted as
<a href="URL">text</a> so that downstream HTML-aware injection
can reconstruct them.
| Argument | Description |
|---|---|
| parent | An lxml element containing |
| text_p_tag | Fully-qualified |
| Returns | Description |
|---|---|
| str | Paragraphs joined by newlines, stripped. May contain |
Source code in src/core/office_processor.py
_inject_odf_paragraph_text
¶
Replaces text in an ODF element that contains <text:p> children.
Puts the translated text in the first <text:p>, clears its
children, and removes any extra <text:p> elements. Handles newlines
by creating additional <text:p> elements. Works for
both <draw:text-box> and <office:annotation>.
Preserves the first <text:span>'s attributes so that character
formatting (font name, size, bold, etc.) is retained.
| Argument | Description |
|---|---|
| parent | An lxml element containing |
| new_text | The replacement text. |
| text_p_tag | Fully-qualified |
| target_lang | Target language name for font substitution. |
| Returns | Description |
|---|---|
| bool | True if the element was modified. |
Source code in src/core/office_processor.py
_inject_odf_paragraph_text_html
¶
Injects HTML-formatted text (with hyperlinks) into ODF paragraphs.
Parses new_text via _parse_html_formatting and reconstructs
<text:p> children. Segments with hyperlink_url become
<text:a xlink:href="..."> elements; plain segments become direct
text or <text:span> elements (preserving character style).
Falls back to plain-text injection when parsing yields no segments.
| 引数 | デスクリプション |
|---|---|
parent
|
An lxml element containing
タイプ:
|
new_text
|
Translated text with inline
タイプ:
|
text_p_tag
|
Fully-qualified
タイプ:
|
paras
|
Pre-found
タイプ:
|
target_lang
|
Target language name for font substitution.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
bool
|
True if the element was modified.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_odf_comments
¶
Extracts annotation text from an ODF file (.odt, .ods, .odp).
Opens the ZIP archive, parses content.xml, and collects text
from all <office:annotation> elements.
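A rough sketch of this extraction, assuming the annotation name lives in the `office:name` attribute and that the positional index is the fallback key. The function name `extract_odf_comments` is hypothetical, and the standard library's `ElementTree` stands in for lxml.

```python
import zipfile
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def extract_odf_comments(source):
    """Return (location_key, text) pairs for each annotation in content.xml."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    pairs = []
    for index, ann in enumerate(root.iter(f"{{{OFFICE_NS}}}annotation")):
        # Only direct <text:p> children carry the comment body;
        # <dc:creator> and <dc:date> are metadata and are ignored.
        paragraphs = ann.findall(f"{{{TEXT_NS}}}p")
        text = "\n".join("".join(p.itertext()) for p in paragraphs).strip()
        if not text:
            continue
        name = ann.get(f"{{{OFFICE_NS}}}name")
        key = f"comment:{name}" if name else f"comment:{index}"
        pairs.append((key, text))
    return pairs
```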
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the ODF file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs with keys like 'comment:{annotation_name}' or 'comment:{index}'.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_odf_comments
¶
Injects translated comments into an ODF file (.odt, .ods, .odp).
Reads the ZIP archive, modifies <office:annotation> text in
content.xml, and writes the archive back.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the ODF file to modify in place.
タイプ:
|
translations
|
Mapping of
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_shapes
¶
Extracts text from shapes and text boxes in an office file.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the office file.
タイプ:
|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
Backend identifier for legacy format dispatch.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs for shape text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_shapes
¶
Injects translated shape text back into the output document.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the output file (already written by inject_fn).
タイプ:
|
translations
|
Mapping of location_key to translated text.
タイプ:
|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
Backend identifier for legacy format dispatch.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_sanitize_sheet_name
¶
Sanitises a translated sheet name for Excel/Calc compatibility.
Removes invalid characters and truncates to 31 characters.
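As an illustration only, the rule could look like the sketch below. It assumes the invalid set is the characters Excel rejects in worksheet names (`\ / ? * [ ] :`) and that they are dropped rather than replaced; the real function may differ on both points, and `sanitize_sheet_name` is a hypothetical stand-in.

```python
import re

# Characters Excel rejects in worksheet names (assumed to be the set removed).
_INVALID_SHEET_CHARS = re.compile(r"[\\/?*\[\]:]")
_MAX_SHEET_NAME_LEN = 31


def sanitize_sheet_name(name: str) -> str:
    """Drop invalid characters, trim, cap at 31 characters, never return ''."""
    cleaned = _INVALID_SHEET_CHARS.sub("", name).strip()
    cleaned = cleaned[:_MAX_SHEET_NAME_LEN]
    return cleaned or "Sheet"
```

Translated names are often longer than the source, so the 31-character cap matters more here than it would for hand-written sheet names.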
| 引数 | デスクリプション |
|---|---|
name
|
Raw translated sheet name.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
str
|
Sanitised name, or "Sheet" if the result is empty.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_should_translate_sheet_names
¶
Checks whether sheet-name translation should be attempted.
| 引数 | デスクリプション |
|---|---|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
The detected backend identifier (unused).
タイプ:
|
config
|
Optional TranslationConfig; falls back to load_setting().
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
bool
|
True if sheet-name translation should proceed.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_sheet_names
¶
Extracts sheet names from a spreadsheet file.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the spreadsheet.
タイプ:
|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
Backend identifier for legacy format dispatch.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs with keys like 'sheetname:{name}'.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_sheet_names
¶
Injects translated sheet names back into the output spreadsheet.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the output file.
タイプ:
|
translations
|
Mapping of location_key to translated text.
タイプ:
|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
Backend identifier for legacy format dispatch.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_xlsx_sheet_names
¶
Extracts sheet names from an XLSX file via ZIP+lxml.
Reads only xl/workbook.xml (a few KB) instead of loading the
full workbook through openpyxl, which would parse all cell data.
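A minimal sketch of the lightweight read, using the standard library's `zipfile` and `ElementTree` instead of lxml. The function name is hypothetical; the `sheetname:{name}` key format follows the one documented for the sheet-name helpers.

```python
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def extract_xlsx_sheet_names(source):
    """Return ('sheetname:{name}', name) pairs, reading only xl/workbook.xml."""
    with zipfile.ZipFile(source) as zf:
        # Only this one small member is decompressed; cell data is never touched.
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    pairs = []
    for sheet in root.iter(f"{{{MAIN_NS}}}sheet"):
        name = sheet.get("name")
        if name:
            pairs.append((f"sheetname:{name}", name))
    return pairs
```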
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .xlsx file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, sheet_name) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_xlsx_sheet_names
¶
Injects translated sheet names into XLSX via ZIP+lxml.
Uses direct XML manipulation to avoid openpyxl's lossy round-trip (which would drop restored embedded objects).
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .xlsx file to modify in place.
タイプ:
|
translations
|
Mapping of 'sheetname:{name}' to translated name.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_ods_sheet_names
¶
Extracts sheet names from an ODS file via odfpy.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .ods file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, sheet_name) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_ods_sheet_names
¶
Injects translated sheet names into ODS via ZIP+lxml.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .ods file to modify in place.
タイプ:
|
translations
|
Mapping of 'sheetname:{name}' to translated name.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_win32com_excel_sheet_names
¶
Extracts sheet names from an XLS file via win32com.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .xls file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, sheet_name) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_win32com_excel_sheet_names
¶
Injects translated sheet names into an XLS file via win32com.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .xls file to modify in place.
タイプ:
|
translations
|
Mapping of 'sheetname:{name}' to translated name.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_uno_calc_sheet_names
¶
Extracts sheet names from an XLS/ODS file via UNO.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the spreadsheet.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, sheet_name) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_uno_calc_sheet_names
¶
Injects translated sheet names into a spreadsheet via UNO.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the spreadsheet to modify in place.
タイプ:
|
translations
|
Mapping of 'sheetname:{name}' to translated name.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_should_translate_notes
¶
Checks whether speaker-notes translation should be attempted.
| 引数 | デスクリプション |
|---|---|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
The detected backend identifier (unused).
タイプ:
|
config
|
Optional TranslationConfig; falls back to load_setting().
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
bool
|
True if speaker-notes translation should proceed.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_notes
¶
Extracts speaker notes from a presentation file.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the presentation.
タイプ:
|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
Backend identifier for legacy format dispatch.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs with keys like 'note:{slide}:{para}'.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_notes
¶
Injects translated speaker notes back into the output presentation.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the output file.
タイプ:
|
translations
|
Mapping of location_key to translated text.
タイプ:
|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
Backend identifier for legacy format dispatch.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_pptx_notes
¶
Extracts speaker notes from a PPTX file via python-pptx.
Paragraphs with mixed formatting or hyperlinks are encoded as inline HTML so the LLM can preserve them.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .pptx file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_pptx_notes
¶
Injects translated speaker notes into a PPTX file via python-pptx.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .pptx file to modify in place.
タイプ:
|
translations
|
Mapping of 'note:{slide}:{para}' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_odp_notes
¶
Extracts speaker notes from an ODP file via odfpy.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .odp file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_odp_notes
¶
Injects translated speaker notes into an ODP file via odfpy.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .odp file to modify in place.
タイプ:
|
translations
|
Mapping of 'note:{slide}:{para}' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_win32com_ppt_notes
¶
Extracts speaker notes from a PPT file via win32com.
Iterates the notes page of each slide and extracts text from shapes that have text frames.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .ppt file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_win32com_ppt_notes
¶
Injects translated speaker notes into a PPT file via win32com.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .ppt file to modify in place.
タイプ:
|
translations
|
Mapping of 'note:{slide}:{para}' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_uno_impress_notes
¶
Extracts speaker notes from a PPT/ODP file via UNO.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the presentation.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_uno_impress_notes
¶
Injects translated speaker notes into a presentation via UNO.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the presentation to modify in place.
タイプ:
|
translations
|
Mapping of 'note:{slide}:{para}' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_headers_footers
¶
Extracts headers and footers from a word-processing document.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the document.
タイプ:
|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
Backend identifier for legacy format dispatch.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs with keys like 'header:{section}:{type}:{para}' or 'footer:...'.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_headers_footers
¶
Injects translated headers/footers back into the output document.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the output file.
タイプ:
|
translations
|
Mapping of location_key to translated text.
タイプ:
|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
Backend identifier for legacy format dispatch.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_docx_hf_part
¶
Extracts text from a DOCX header/footer part's paragraphs.
| 引数 | デスクリプション |
|---|---|
paragraphs
|
List of python-docx Paragraph objects.
タイプ:
|
section_idx
|
Section index.
タイプ:
|
hf_type
|
Type identifier ('default', 'first', 'even').
タイプ:
|
prefix
|
'header' or 'footer'.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_docx_headers_footers
¶
Extracts headers and footers from a DOCX file via python-docx.
Extracts default, first-page, and even-page headers/footers from each section. Paragraphs with mixed formatting or hyperlinks are encoded as inline HTML.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .docx file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_docx_headers_footers
¶
Injects translated headers/footers into a DOCX file via python-docx.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .docx file to modify in place.
タイプ:
|
translations
|
Mapping of 'header:...' / 'footer:...' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_build_odf_hf_map
¶
Builds an ODF header/footer element-tag → (prefix, type) lookup.
Used by both _extract_odt_headers_footers and
_inject_odt_headers_footers to avoid duplicating the mapping.
ソースコード位置: src/core/office_processor.py
_extract_odt_headers_footers
¶
Extracts headers and footers from an ODT file via ZIP+lxml.
ODT stores headers/footers in styles.xml under
<style:master-page> elements.
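The sketch below shows one way the lookup and extraction might fit together. Several details are assumptions rather than documented behaviour: the `-first` and `-left` element variants map to the `first` and `even` types, the master-page index plays the role of the section in the `{prefix}:{section}:{type}:{para}` key, and both function names are hypothetical. It uses the standard library's `ElementTree` rather than lxml.

```python
import zipfile
import xml.etree.ElementTree as ET

STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def build_hf_map():
    """Map header/footer element tags to (prefix, type); the types are assumed."""
    mapping = {}
    for prefix in ("header", "footer"):
        mapping[f"{{{STYLE_NS}}}{prefix}"] = (prefix, "default")
        mapping[f"{{{STYLE_NS}}}{prefix}-first"] = (prefix, "first")
        mapping[f"{{{STYLE_NS}}}{prefix}-left"] = (prefix, "even")
    return mapping


def extract_odt_headers_footers(source):
    """Return (location_key, text) pairs found under <style:master-page>."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("styles.xml"))
    hf_map = build_hf_map()
    pairs = []
    masters = root.iter(f"{{{STYLE_NS}}}master-page")
    for section_idx, master in enumerate(masters):
        for child in master:
            if child.tag not in hf_map:
                continue
            prefix, hf_type = hf_map[child.tag]
            paragraphs = child.iter(f"{{{TEXT_NS}}}p")
            for para_idx, para in enumerate(paragraphs):
                text = "".join(para.itertext()).strip()
                if text:
                    key = f"{prefix}:{section_idx}:{hf_type}:{para_idx}"
                    pairs.append((key, text))
    return pairs
```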
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .odt file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_odt_headers_footers
¶
Injects translated headers/footers into an ODT file via ZIP+lxml.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .odt file to modify in place.
タイプ:
|
translations
|
Mapping of 'header:...' / 'footer:...' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_win32com_word_headers_footers
¶
Extracts headers/footers from a DOC file via win32com.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .doc file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_win32com_word_headers_footers
¶
Injects translated headers/footers into a DOC file via win32com.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .doc file to modify in place.
タイプ:
|
translations
|
Mapping of 'header:...' / 'footer:...' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_uno_writer_headers_footers
¶
Extracts headers/footers from a DOC/ODT file via UNO.
UNO stores headers/footers on page styles. Each unique page style is treated as a "section" for key purposes.
Note
Only default headers/footers are extracted. UNO exposes
first-page (HeaderTextFirst) and even-page
(HeaderTextLeft) properties, but they require additional
page-style flags (HeaderIsShared / FirstIsShared) that
vary across LibreOffice versions. Default-only is sufficient
for the vast majority of DOC files.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the document.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_uno_writer_headers_footers
¶
Injects translated headers/footers into a document via UNO.
Only default headers/footers are handled. See the note under
_extract_uno_writer_headers_footers for the first/even-page
limitation.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the document to modify in place.
タイプ:
|
translations
|
Mapping of 'header:...' / 'footer:...' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_footnotes
¶
Extracts footnotes and endnotes from a word-processing document.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the document.
タイプ:
|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
Backend identifier for legacy format dispatch.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs with keys like 'footnote:{id}' or 'endnote:{id}'.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_footnotes
¶
Injects translated footnotes/endnotes into the output document.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the output file.
タイプ:
|
translations
|
Mapping of location_key to translated text.
タイプ:
|
suffix
|
Lowercase file extension.
タイプ:
|
backend
|
Backend identifier for legacy format dispatch.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_docx_fn_xml
¶
Extracts text from DOCX footnote or endnote XML.
| 引数 | デスクリプション |
|---|---|
xml_data
|
Raw XML bytes of footnotes.xml or endnotes.xml.
タイプ:
|
element_tag
|
Fully-qualified tag (e.g.
タイプ:
|
key_prefix
|
Key prefix ('footnote' or 'endnote').
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_docx_footnotes
¶
Extracts footnotes and endnotes from a DOCX file via ZIP+lxml.
Reads word/footnotes.xml and word/endnotes.xml. IDs 0, 1,
and -1 are internal separators and are skipped.
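A hedged sketch of the ID filtering, written against the parameter list documented for `_extract_docx_fn_xml` (`xml_data`, `element_tag`, `key_prefix`). The body is an assumption built on the standard library's `ElementTree`, not the module's lxml code, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SKIPPED_IDS = {"0", "1", "-1"}  # separator entries, as stated in the description


def extract_fn_xml(xml_data, element_tag, key_prefix):
    """Return ('{key_prefix}:{id}', text) pairs, skipping separator notes."""
    root = ET.fromstring(xml_data)
    pairs = []
    for note in root.iter(element_tag):
        note_id = note.get(f"{{{W_NS}}}id")
        if note_id is None or note_id in SKIPPED_IDS:
            continue
        paragraphs = []
        for para in note.iter(f"{{{W_NS}}}p"):
            runs = para.iter(f"{{{W_NS}}}t")
            paragraphs.append("".join(t.text or "" for t in runs))
        text = "\n".join(paragraphs).strip()
        if text:
            pairs.append((f"{key_prefix}:{note_id}", text))
    return pairs
```

The same function serves both parts: pass the footnote tag with `'footnote'` for `word/footnotes.xml` and the endnote tag with `'endnote'` for `word/endnotes.xml`.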
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .docx file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_docx_footnotes
¶
Injects translated footnotes/endnotes into a DOCX file via ZIP+lxml.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .docx file to modify in place.
タイプ:
|
translations
|
Mapping of 'footnote:{id}' / 'endnote:{id}' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_docx_fn_text
¶
Replaces text in DOCX footnote/endnote paragraphs.
Preserves the footnote-reference run (<w:footnoteRef/>) in the
first paragraph and replaces text in subsequent runs.
| 引数 | デスクリプション |
|---|---|
paras
|
List of
タイプ:
|
new_text
|
Translated text (paragraphs separated by newlines).
タイプ:
|
w_ns
|
WordprocessingML namespace URI.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_odt_footnotes
¶
Extracts footnotes and endnotes from an ODT file via ZIP+lxml.
ODT stores footnotes as <text:note> elements inline in
content.xml. The text:note-class attribute distinguishes
footnotes from endnotes.
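One possible shape for this, assuming the `text:id` attribute supplies the `{id}` part of the key and that `text:note-class` can be used directly as the key prefix. The function name is hypothetical and the standard library's `ElementTree` replaces lxml.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def extract_odt_notes(content_xml):
    """Return ('footnote:{id}' or 'endnote:{id}', text) pairs from content.xml."""
    root = ET.fromstring(content_xml)
    pairs = []
    for note in root.iter(f"{{{TEXT_NS}}}note"):
        note_class = note.get(f"{{{TEXT_NS}}}note-class", "footnote")
        note_id = note.get(f"{{{TEXT_NS}}}id")
        # The citation mark sits in <text:note-citation>; only the body is text.
        body = note.find(f"{{{TEXT_NS}}}note-body")
        if note_id is None or body is None:
            continue
        paragraphs = body.iter(f"{{{TEXT_NS}}}p")
        text = "\n".join("".join(p.itertext()) for p in paragraphs).strip()
        if text:
            pairs.append((f"{note_class}:{note_id}", text))
    return pairs
```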
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .odt file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_odt_footnotes
¶
Injects translated footnotes/endnotes into an ODT file via ZIP+lxml.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .odt file to modify in place.
タイプ:
|
translations
|
Mapping of 'footnote:{id}' / 'endnote:{id}' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_win32com_word_footnotes
¶
Extracts footnotes and endnotes from a DOC file via win32com.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .doc file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_win32com_word_footnotes
¶
Injects translated footnotes/endnotes into a DOC file via win32com.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .doc file to modify in place.
タイプ:
|
translations
|
Mapping of 'footnote:{id}' / 'endnote:{id}' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_uno_writer_footnotes
¶
Extracts footnotes and endnotes from a DOC/ODT file via UNO.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the document.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_uno_writer_footnotes
¶
Injects translated footnotes/endnotes into a document via UNO.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the document to modify in place.
タイプ:
|
translations
|
Mapping of 'footnote:{id}' / 'endnote:{id}' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_win32com_word_shapes
¶
Extracts text from shapes/text boxes in a Word document via win32com.
When a shape's text range has mixed per-run formatting, inline HTML is
emitted via _win32com_range_runs_to_html so the LLM can preserve it.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .doc file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs with keys like 'shape:{index}'.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_win32com_word_shapes
¶
Injects translated text into Word shapes via win32com.
When the translated text contains inline HTML formatting tags,
per-segment formatting is applied via _inject_win32com_word_html_runs.
Otherwise, plain text is set with uniform font save/restore.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .doc file to modify in place.
タイプ:
|
translations
|
Mapping of 'shape:{index}' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_win32com_excel_shapes
¶
Extracts text from shapes in an Excel workbook via win32com.
When a shape's text range has mixed per-run formatting, inline HTML is
emitted via _win32com_range_runs_to_html so the LLM can preserve it.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .xls file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs with keys like 'shape:{sheet_name}:{index}'.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_win32com_excel_html_runs
¶
Replaces an Excel shape's text with HTML-formatted segments via win32com.
Parses html_text via _parse_html_formatting, sets the full
plain text on the range, then applies per-segment formatting using
Characters(start, length) sub-ranges (1-based indexing).
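The 1-based offset arithmetic is easy to get wrong, so here is a small sketch of it in isolation. The `Segment` type is a hypothetical stand-in for whatever `_parse_html_formatting` returns, and no COM calls are made.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical parsed segment: a text slice plus its formatting flags."""
    text: str
    bold: bool = False
    italic: bool = False


def character_ranges(segments):
    """Return (plain_text, [(start, length, segment), ...]) with 1-based starts."""
    ranges = []
    position = 1  # Characters(start, length) counts from 1, not 0
    for seg in segments:
        if seg.text:  # empty segments have no sub-range to format
            ranges.append((position, len(seg.text), seg))
            position += len(seg.text)
    plain_text = "".join(seg.text for seg in segments)
    return plain_text, ranges
```

On Windows, each tuple could then be applied along the lines of `rng.Characters(start, length).Font.Bold = seg.bold`, after the full plain text has been set. Note that `len()` counts code points, so text containing characters outside the Basic Multilingual Plane may need offsets computed in UTF-16 units instead.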
| 引数 | デスクリプション |
|---|---|
text_rng
|
A win32com
タイプ:
|
html_text
|
Translated text with inline
タイプ:
|
original_text
|
The text before translation (for script detection).
タイプ:
|
target_lang
|
Target language name for font substitution.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_win32com_excel_shapes
¶
Injects translated text into Excel shapes via win32com.
When the translated text contains inline HTML formatting tags,
per-segment formatting is applied via
_inject_win32com_excel_html_runs. Otherwise, plain text is set
with uniform font save/restore.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .xls file to modify in place.
タイプ:
|
translations
|
Mapping of 'shape:{sheet_name}:{index}' to translated text.
タイプ:
|
target_lang
|
Target language name for font substitution.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_uno_writer_shapes
¶
Extracts text from shapes/text boxes in a Writer document via UNO.
When any paragraph within a shape has mixed per-run formatting, the
entire shape is extracted as inline HTML via _uno_runs_to_html
(paragraphs joined by newlines). Otherwise, plain text is returned.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .doc or .docx file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs with keys like 'shape:{index}'.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_uno_writer_shapes
¶
Injects translated text into Writer shapes via UNO.
When the translated text contains inline HTML formatting tags,
dispatches to _inject_uno_para_text for per-run formatting on
each paragraph (lines separated by newlines). Otherwise, uses plain
setString with shape-level property save/restore.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .doc or .docx file to modify in place.
タイプ:
|
translations
|
Mapping of 'shape:{index}' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_uno_calc_shapes
¶
Extracts text from shapes in a Calc spreadsheet via UNO.
When any paragraph within a shape has mixed per-run formatting, the
entire shape is extracted as inline HTML via _uno_runs_to_html
(paragraphs joined by newlines). Otherwise, plain text is returned.
| 引数 | デスクリプション |
|---|---|
file_path
|
Path to the .xls file.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
(location_key, text) pairs with keys like 'shape:{sheet_name}:{index}'.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_uno_calc_shapes
¶
Injects translated text into Calc shapes via UNO.
When the translated text contains inline HTML formatting tags,
dispatches to _inject_uno_para_text for per-run formatting on
each paragraph (lines separated by newlines). Otherwise, uses plain
setString with shape-level property save/restore.
| 引数 | デスクリプション |
|---|---|
output_path
|
Path to the .xls file to modify in place.
タイプ:
|
translations
|
Mapping of 'shape:{sheet_name}:{index}' to translated text.
タイプ:
|
ソースコード位置: src/core/office_processor.py
_read_txbx_data
¶
Reads plain text and <w:t> elements from a single <wps:txbx>.
Iterates paragraph-by-paragraph to preserve structural newlines between paragraphs.
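A minimal sketch of the paragraph-wise read, using the standard library's `ElementTree` in place of lxml. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_P = f"{{{W_NS}}}p"
W_T = f"{{{W_NS}}}t"


def read_txbx_data(txbx_el):
    """Return (plain_text, t_elements) for one text box element."""
    lines = []
    t_elements = []
    for para in txbx_el.iter(W_P):
        t_nodes = list(para.iter(W_T))
        t_elements.extend(t_nodes)
        # One line per paragraph keeps the paragraph breaks as newlines.
        lines.append("".join(t.text or "" for t in t_nodes))
    return "\n".join(lines).strip(), t_elements
```

Returning the element list alongside the text lets a later injection step write into the same nodes without searching the tree again.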
| 引数 | デスクリプション |
|---|---|
txbx_el
|
An lxml element for a
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
tuple[str, list[object]]
|
Tuple of (plain_text, t_elements), where plain_text is the stripped, concatenated text of all paragraphs joined by newlines and t_elements is the flat list of all <w:t> elements. |
ソースコード位置: src/core/office_processor.py
_wps_txbx_to_text_or_html
¶
Extracts text from a <wps:txbx> element, using HTML when formatting varies.
Iterates direct children of each <w:p> paragraph — both <w:r>
runs and <w:hyperlink> wrappers. If run formatting varies or any
hyperlinks are present, wraps the text in inline HTML tags via
_wrap_with_tags and <a href="..."> tags. Otherwise returns
plain text identical to _read_txbx_data. Paragraphs are joined
with '\n'.
Character-style references (<w:rStyle>) are resolved when
char_styles is provided: the style supplies base formatting and
direct <w:rPr> attributes override.
All <w:t> elements within a single run are concatenated so that
split runs (e.g. from spell-checking) do not silently drop text.
| 引数 | デスクリプション |
|---|---|
txbx_el
|
An lxml element for a
タイプ:
|
char_styles
|
Mapping of style IDs to formatting tuples, as returned
by
タイプ:
|
hyperlink_rels
|
Mapping of relationship IDs to target URLs,
parsed from the part's
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
str
|
Plain text or inline-HTML string representing the text box content. |
ソースコード位置: src/core/office_processor.py
_inject_wps_txbx_plain
¶
Injects plain text into a <wps:txbx> element in-place.
Sets the first <w:t> element's text to the first line and appends
<w:br/> and new <w:t> elements for subsequent lines. Remaining
original <w:t> elements are cleared.
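The sketch below follows the three steps just described, with two assumptions: the new `<w:br/>` and `<w:t>` elements are inserted into the same run as the first `<w:t>`, and `xml:space="preserve"` is set on every written element. It uses the standard library's `ElementTree`, which has no `getparent()`, so a parent lookup is built first; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_T = f"{{{W_NS}}}t"
W_BR = f"{{{W_NS}}}br"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def inject_wps_txbx_plain(txbx_el, plain_text, t_elements):
    """Write plain_text into the first <w:t>; later lines follow a <w:br/>."""
    if not t_elements:
        return
    # Child -> parent lookup, needed to insert siblings next to the first <w:t>.
    parent_of = {child: parent for parent in txbx_el.iter() for child in parent}
    first, rest = t_elements[0], t_elements[1:]
    for leftover in rest:
        leftover.text = ""  # clear the remaining original <w:t> elements
    lines = plain_text.split("\n")
    first.text = lines[0]
    first.set(XML_SPACE, "preserve")
    run = parent_of[first]
    position = list(run).index(first) + 1
    for line in lines[1:]:
        run.insert(position, ET.Element(W_BR))
        extra = ET.Element(W_T)
        extra.text = line
        extra.set(XML_SPACE, "preserve")
        run.insert(position + 1, extra)
        position += 2
```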
| 引数 | デスクリプション |
|---|---|
txbx_el
|
An lxml element for the
タイプ:
|
plain_text
|
The translated plain text (lines separated by
タイプ:
|
t_elements
|
Flat list of all
タイプ:
|
ソースコード位置: src/core/office_processor.py
_inject_wps_txbx_html_runs
¶
Injects HTML-formatted text into a <wps:txbx> element in-place.
Parses html_text via _parse_html_formatting to obtain
_FormattedSegment objects. Segments containing '\n' are split
across multiple <w:p> elements. Existing run children are cleared
and replaced with new <w:r>/<w:rPr>/<w:t> elements. Excess
paragraphs are removed; new ones are cloned from the last existing
paragraph when more are needed.
When rels_adder is provided, segments with hyperlink_url are
wrapped in <w:hyperlink> elements with the relationship ID
returned by the callback.
| 引数 | デスクリプション |
|---|---|
txbx_el
|
An lxml element for the
タイプ:
|
html_text
|
Translated HTML string with inline formatting tags.
タイプ:
|
rels_adder
|
Callback that accepts a URL string and returns a
relationship ID (
タイプ:
|
ソースコード位置: src/core/office_processor.py
_collect_wps_texts
¶
Finds all <wps:txbx> text boxes and their <w:t> elements.
Delegates per-element data reading to _read_txbx_data.
| 引数 | デスクリプション |
|---|---|
root
|
lxml root element of an XML part.
タイプ:
|
| 戻り値 | デスクリプション |
|---|---|
list
|
Pairs of (concatenated_text, list_of_wt_elements).
タイプ:
|
ソースコード位置: src/core/office_processor.py
_extract_docx_shapes
¶
Extracts text from shapes/text boxes in a DOCX file via ZIP + lxml.
Parses word/document.xml and word/header*.xml / word/footer*.xml
looking for <wps:txbx> elements that contain <w:t> runs.
When run formatting varies or hyperlinks are present within a text box,
inline HTML is emitted so the LLM can preserve it.
| Argument | Description |
|---|---|
| file_path | Path to the .docx file. |

| Returns | Description |
|---|---|
| list | (location_key, text) pairs with keys like 'shape:{index}'. |

Source code in src/core/office_processor.py
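The ZIP + XML walk described for _extract_docx_shapes can be sketched as follows. This is a minimal illustration, not the module's implementation: it uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra dependencies, the names extract_shape_texts, make_part and PART_RE are hypothetical, and numbering shapes with one running index across all parts is an assumption. It also omits the inline-HTML path for mixed formatting and hyperlinks.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
WPS_NS = "http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
# Parts that may hold text boxes: the body plus any header/footer parts.
PART_RE = re.compile(r"word/(?:document|header\d*|footer\d*)\.xml")


def extract_shape_texts(docx_bytes):
    """Return (location_key, text) pairs for every non-empty <wps:txbx>."""
    results = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        for name in zf.namelist():
            if not PART_RE.fullmatch(name):
                continue
            root = ET.fromstring(zf.read(name))
            for txbx in root.iter(f"{{{WPS_NS}}}txbx"):
                # Concatenate all <w:t> runs inside this text box.
                text = "".join(t.text or "" for t in txbx.iter(f"{{{W_NS}}}t"))
                if text.strip():
                    results.append((f"shape:{len(results)}", text))
    return results


def make_part(*runs):
    """Build a tiny XML part with one text box (demo data only)."""
    body = "".join(f"<w:r><w:t>{run}</w:t></w:r>" for run in runs)
    return (
        f'<w:document xmlns:w="{W_NS}" xmlns:wps="{WPS_NS}">'
        f"<wps:txbx><w:txbxContent><w:p>{body}</w:p></w:txbxContent></wps:txbx>"
        "</w:document>"
    )


# Build an in-memory archive that mimics the relevant DOCX layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("word/document.xml", make_part("Hello ", "world"))
    zf.writestr("word/footer1.xml", make_part("Footer note"))
    zf.writestr("word/styles.xml", "<ignored/>")

shapes = extract_shape_texts(buffer.getvalue())
print(shapes)
```

Splitting a text box into several `<w:t>` runs is common in real documents, which is why the runs are joined before the text is handed to translation.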
_inject_docx_shapes
Injects translated text into DOCX shapes/text boxes via ZIP + lxml.
When the translated text contains inline HTML formatting tags,
_inject_wps_txbx_html_runs is used to rebuild <w:r> elements
with per-segment <w:rPr> formatting. When <a href="...">
tags are present, hyperlink relationships are added to the part's
.rels file. Otherwise, plain text is injected via
_inject_wps_txbx_plain.
| Argument | Description |
|---|---|
| output_path | Path to the .docx file to modify in place. |
| translations | Mapping of |

Source code in src/core/office_processor.py
_resolve_xlsx_sheet_drawings
Resolves sheet-name → drawing-path mappings from an XLSX ZIP.
Reads xl/workbook.xml to get sheet names and
xl/worksheets/_rels/sheet{N}.xml.rels to find associated drawings.
| Argument | Description |
|---|---|
| zf | An open |

| Returns | Description |
|---|---|
| list | Pairs of (sheet_name, drawing_xml_path) like |

Source code in src/core/office_processor.py
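A rough sketch of the sheet-name to drawing-path resolution described for _resolve_xlsx_sheet_drawings is shown below. It is illustrative only: it uses xml.etree.ElementTree rather than lxml, the function name resolve_sheet_drawings is hypothetical, and it assumes that the N-th `<sheet>` entry in xl/workbook.xml corresponds to sheet{N}.xml, which follows the description above but may not hold for every workbook.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
DRAWING_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/drawing"
)


def resolve_sheet_drawings(zf):
    """Return (sheet_name, drawing_xml_path) pairs for sheets with drawings."""
    workbook = ET.fromstring(zf.read("xl/workbook.xml"))
    names = [s.get("name") for s in workbook.iter(f"{{{MAIN_NS}}}sheet")]
    present = set(zf.namelist())
    pairs = []
    for n, sheet_name in enumerate(names, start=1):
        # Assumption: sheet order in workbook.xml matches the sheet{N} file number.
        rels_path = f"xl/worksheets/_rels/sheet{n}.xml.rels"
        if rels_path not in present:
            continue  # this sheet has no relationships, hence no drawing
        rels = ET.fromstring(zf.read(rels_path))
        for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship"):
            if rel.get("Type") == DRAWING_TYPE:
                # Targets are relative to xl/worksheets/, e.g. ../drawings/drawing1.xml
                target = posixpath.normpath(
                    posixpath.join("xl/worksheets", rel.get("Target"))
                )
                pairs.append((sheet_name, target))
    return pairs


# Demo archive: two sheets, only the second one references a drawing.
WORKBOOK = (
    f'<workbook xmlns="{MAIN_NS}"><sheets>'
    '<sheet name="Data" sheetId="1"/><sheet name="Summary" sheetId="2"/>'
    "</sheets></workbook>"
)
SHEET2_RELS = (
    f'<Relationships xmlns="{PKG_REL_NS}">'
    f'<Relationship Id="rId1" Type="{DRAWING_TYPE}" '
    'Target="../drawings/drawing1.xml"/>'
    "</Relationships>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("xl/workbook.xml", WORKBOOK)
    zf.writestr("xl/worksheets/_rels/sheet2.xml.rels", SHEET2_RELS)

with zipfile.ZipFile(buffer) as zf:
    drawings = resolve_sheet_drawings(zf)
print(drawings)
```

Normalising the relative Target with posixpath keeps the result usable as a ZIP member name on any operating system, since ZIP paths always use forward slashes.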
_extract_xlsx_shapes
Extracts text from shapes in an XLSX file via ZIP + lxml.
Uses DrawingML <a:txBody> elements within each sheet's drawing XML.
When run formatting varies or hyperlinks are present within a shape,
inline HTML is emitted via _drawingml_to_html so the LLM can
preserve it.
| Argument | Description |
|---|---|
| file_path | Path to the .xlsx file. |

| Returns | Description |
|---|---|
| list | (location_key, text) pairs with keys like 'shape:{sheet_name}:{index}'. |

Source code in src/core/office_processor.py
_inject_xlsx_shapes
Injects translated text into XLSX shapes via ZIP + lxml.
When translated text contains <a href="..."> tags, hyperlink
relationships are added to the drawing's .rels file.
| Argument | Description |
|---|---|
| output_path | Path to the .xlsx file to modify in place. |
| translations | Mapping of |

Source code in src/core/office_processor.py
_build_odf_style_map
Builds a mapping of style names to <style:style> elements.
Scans <office:automatic-styles> for <style:style> entries
with style:family="text" and returns a dict keyed by
style:name.
| Argument | Description |
|---|---|
| root | The lxml root element of an ODF |

| Returns | Description |
|---|---|
| dict | Mapping of style name to the |

Source code in src/core/office_processor.py
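The scan described for _build_odf_style_map might look roughly like this. The sketch uses xml.etree.ElementTree in place of lxml, and build_style_map is a hypothetical name; only the element and attribute names come from the description above and the ODF namespaces.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def build_style_map(root):
    """Map style:name to its <style:style> element for text-family styles."""
    style_map = {}
    auto_styles = root.find(f"{{{OFFICE_NS}}}automatic-styles")
    if auto_styles is None:
        return style_map
    for style in auto_styles.findall(f"{{{STYLE_NS}}}style"):
        # Only character-level ("text") styles are relevant for spans.
        if style.get(f"{{{STYLE_NS}}}family") != "text":
            continue
        name = style.get(f"{{{STYLE_NS}}}name")
        if name:
            style_map[name] = style
    return style_map


SAMPLE = f"""
<office:document-content xmlns:office="{OFFICE_NS}" xmlns:style="{STYLE_NS}">
  <office:automatic-styles>
    <style:style style:name="T1" style:family="text"/>
    <style:style style:name="P1" style:family="paragraph"/>
    <style:style style:name="T2" style:family="text"/>
  </office:automatic-styles>
</office:document-content>
"""

style_map = build_style_map(ET.fromstring(SAMPLE))
print(sorted(style_map))
```

Paragraph-family styles such as P1 are skipped, so a later lookup by a span's text:style-name only ever sees character styles.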
_inject_odf_text_box_html_runs
Injects HTML-formatted text into an ODF <draw:text-box> element.
Parses html_text via _parse_html_formatting. For each unique
formatting signature, generates a <style:style> entry in
auto_styles_el and wraps the text in <text:span> with the
corresponding text:style-name. Handles '\n' by creating
multiple <text:p> elements.
Falls back to _inject_odf_paragraph_text when no HTML tags are
detected.
| Argument | Description |
|---|---|
| text_box_el | An lxml element for |
| html_text | Translated text with inline formatting tags. |
| text_p_tag | The fully-qualified |
| auto_styles_el | The |
| style_counter | Mutable |

| Returns | Description |
|---|---|
| bool | True if the element was modified. |

Source code in src/core/office_processor.py
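The span-and-style generation described for _inject_odf_text_box_html_runs can be illustrated with a simplified sketch. Several things here are assumptions: the input is already split into (text, flags) segments rather than parsed from HTML by _parse_html_formatting, the flag names and the "TrT" style-name prefix are invented for the example, the mutable counter is modelled as a one-element list, and xml.etree.ElementTree stands in for lxml.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"

# Hypothetical formatting flags mapped to ODF text properties.
FLAG_PROPS = {
    "bold": (f"{{{FO_NS}}}font-weight", "bold"),
    "italic": (f"{{{FO_NS}}}font-style", "italic"),
}


def style_name_for(flags, auto_styles, known, counter):
    """Return the style name for a formatting signature, creating it once."""
    signature = frozenset(flags)
    if signature not in known:
        counter[0] += 1
        name = f"TrT{counter[0]}"  # assumed naming scheme
        style = ET.SubElement(auto_styles, f"{{{STYLE_NS}}}style")
        style.set(f"{{{STYLE_NS}}}name", name)
        style.set(f"{{{STYLE_NS}}}family", "text")
        props = ET.SubElement(style, f"{{{STYLE_NS}}}text-properties")
        for flag in sorted(signature):
            attr, value = FLAG_PROPS[flag]
            props.set(attr, value)
        known[signature] = name
    return known[signature]


def build_paragraphs(segments, auto_styles, counter):
    """Turn (text, flags) segments into <text:p> elements with styled spans."""
    known = {}
    paragraphs = [ET.Element(f"{{{TEXT_NS}}}p")]
    for text, flags in segments:
        for i, piece in enumerate(text.split("\n")):
            if i > 0:
                # Each newline starts a new paragraph.
                paragraphs.append(ET.Element(f"{{{TEXT_NS}}}p"))
            if not piece:
                continue
            span = ET.SubElement(paragraphs[-1], f"{{{TEXT_NS}}}span")
            span.text = piece
            if flags:
                name = style_name_for(flags, auto_styles, known, counter)
                span.set(f"{{{TEXT_NS}}}style-name", name)
    return paragraphs


auto_styles = ET.Element(f"{{{STYLE_NS}}}automatic-styles-demo")
counter = [0]
segments = [("Total: ", ()), ("42", ("bold",)), ("\nsee ", ()), ("notes", ("bold",))]
paragraphs = build_paragraphs(segments, auto_styles, counter)
print(len(paragraphs), len(auto_styles), counter)
```

Both bold segments share one generated style because the style is keyed by the formatting signature, not by the segment, which keeps the automatic-styles section from growing with every span.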
_extract_odt_shapes
Extracts text from <draw:text-box> elements in an ODT file.
When span formatting varies within a text box, inline HTML is emitted
via _odf_text_box_to_html so the LLM can preserve it.
| Argument | Description |
|---|---|
| file_path | Path to the .odt file. |

| Returns | Description |
|---|---|
| list | (location_key, text) pairs with keys like |

Source code in src/core/office_processor.py
_inject_odt_shapes
Injects translated text into <draw:text-box> elements in an ODT.
When the translated text contains inline HTML formatting tags,
_inject_odf_text_box_html_runs is used to create styled spans.
Otherwise, falls back to _inject_odf_paragraph_text.
| Argument | Description |
|---|---|
| output_path | Path to the .odt file to modify in place. |
| translations | Mapping of |

Source code in src/core/office_processor.py
_extract_ods_shapes
Extracts text from <draw:text-box> elements in an ODS file.
Iterates per <table:table> to produce sheet-qualified keys.
When span formatting varies within a text box, inline HTML is emitted
via _odf_text_box_to_html so the LLM can preserve it.
| Argument | Description |
|---|---|
| file_path | Path to the .ods file. |

| Returns | Description |
|---|---|
| list | (location_key, text) pairs with keys like |

Source code in src/core/office_processor.py
_inject_ods_shapes
Injects translated text into <draw:text-box> elements in an ODS.
When the translated text contains inline HTML formatting tags,
_inject_odf_text_box_html_runs is used to create styled spans.
Otherwise, falls back to _inject_odf_paragraph_text.
| Argument | Description |
|---|---|
| output_path | Path to the .ods file to modify in place. |
| translations | Mapping of |

Source code in src/core/office_processor.py
_translate_single_image
_translate_single_image(
    image_bytes,
    content_type,
    target_lang,
    src_lang,
    glossary_entries,
    ocr_method,
    *,
    provider=None,
    model=None,
)
Translates a single image using the OCR → LLM → render pipeline.
Writes the image to a temp file, processes it, and returns the translated image bytes. Returns None if the image has no translatable text or rendering fails. Does not catch ValueError so that fatal LLM errors can propagate to the caller.
| Argument | Description |
|---|---|
| image_bytes | Raw image data. |
| content_type | MIME type (e.g. "image/png"). |
| target_lang | Target language name. |
| src_lang | Source language name. |
| glossary_entries | Optional glossary entries. |
| ocr_method | OCR method name (e.g. "TesseractOCR"). |
| provider | Optional LLM provider override. |
| model | Optional LLM model override. |

| Returns | Description |
|---|---|
| bytes \| None | Translated image bytes, or None. |

Source code in src/core/office_processor.py
_translate_zip_images
_translate_zip_images(
    output_path,
    suffix,
    target_lang,
    src_lang,
    glossary_entries,
    ocr_method,
    progress_callback,
    cancel_check,
    *,
    provider=None,
    model=None,
    checkpoint_dir=None,
)
Translates images embedded in an Office document using zipfile.
Opens the document as a ZIP archive, identifies raster images in the
known media directory, translates each via the OCR → LLM → render
pipeline, replaces the originals in memory, and rewrites the archive
atomically (write to .tmp, then shutil.move).
Supports .docx, .xlsx, .pptx, .odt, .ods, .odp,
and .epub.
Non-fatal per-image errors follow a skip-with-warning policy: a
bad image (e.g. IMAGE_TOO_LARGE, an unreadable JPEG header,
a vision model returning empty text) leaves the original image in
place and the loop continues. The user gets a document with
most images translated and the broken ones in their source form,
rather than one stubborn image blocking the whole document.
Fatal LLM errors (AUTH_ERROR, QUOTA_ERROR, VISION_NOT_SUPPORTED)
still abort the loop immediately, because they indicate that the
entire pipeline cannot continue, not merely that one image failed to translate.
When checkpoint_dir is provided, each image's translated bytes
are persisted under <checkpoint_dir>/office_images/<sha256>.bin
and consulted on re-runs. This means an interrupted batch (50/100
images done, then a quota error or cancellation) only retries the
remaining 50 on resume instead of redoing the whole document. The
SHA256 of the source bytes is the cache key, so duplicate images
(e.g. a company logo repeated on every page) deduplicate naturally.
| Argument | Description |
|---|---|
| output_path | Path to the saved translated document (modified in place). |
| suffix | Lowercase file extension (e.g. ".docx"). |
| target_lang | Target language name. |
| src_lang | Source language name. |
| glossary_entries | Optional glossary entries. |
| ocr_method | OCR method name (e.g. "TesseractOCR"). |
| progress_callback | Called with 0-100 for the image phase. |
| cancel_check | Returns True if the task was cancelled. |
| provider | Optional LLM provider override. |
| model | Optional LLM model override. |
| checkpoint_dir | Task storage directory for per-image cache. |

Source code in src/core/office_processor.py
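The SHA-256 keyed image cache and the atomic archive rewrite described for _translate_zip_images can be sketched together. This is a simplified illustration under stated assumptions: cached_translate and rewrite_images are hypothetical names, the translate callback stands in for the OCR → LLM → render pipeline, every entry under the media prefix is treated as an image, and progress reporting, cancellation and the fatal/non-fatal error split are left out.

```python
import hashlib
import shutil
import tempfile
import zipfile
from pathlib import Path


def cached_translate(image_bytes, cache_dir, translate):
    """Return translated bytes, consulting the SHA-256 keyed cache first."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{hashlib.sha256(image_bytes).hexdigest()}.bin"
    if cache_file.exists():
        return cache_file.read_bytes()
    translated = translate(image_bytes)
    if translated is not None:
        cache_file.write_bytes(translated)
    return translated


def rewrite_images(doc_path, media_prefix, checkpoint_dir, translate):
    """Replace images under media_prefix, then swap the archive in one move."""
    doc_path = Path(doc_path)
    cache_dir = Path(checkpoint_dir) / "office_images"
    tmp_path = doc_path.with_name(doc_path.name + ".tmp")
    with zipfile.ZipFile(doc_path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename.startswith(media_prefix):
                translated = cached_translate(data, cache_dir, translate)
                if translated is not None:  # None keeps the original image
                    data = translated
            # Reusing the ZipInfo keeps entry order and compression type.
            dst.writestr(info, data)
    shutil.move(str(tmp_path), str(doc_path))


calls = []


def fake_translate(image_bytes):
    """Stand-in for the real pipeline: records the call and upper-cases."""
    calls.append(image_bytes)
    return image_bytes.upper()


with tempfile.TemporaryDirectory() as work:
    doc = Path(work) / "demo.docx"
    with zipfile.ZipFile(doc, "w") as zf:
        zf.writestr("word/document.xml", b"<doc/>")
        zf.writestr("word/media/image1.png", b"logo")
        zf.writestr("word/media/image2.png", b"logo")  # duplicate image
    rewrite_images(doc, "word/media/", Path(work) / "ckpt", fake_translate)
    with zipfile.ZipFile(doc) as zf:
        result = {name: zf.read(name) for name in zf.namelist()}
    tmp_left_behind = (Path(work) / "demo.docx.tmp").exists()

print(len(calls), result["word/media/image2.png"])
```

In the demo the duplicate image is translated only once because the second lookup hits the cache file written for the first. Writing to a sibling .tmp file and moving it over the original means a crash mid-write leaves the source document untouched.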
_translate_legacy_images
_translate_legacy_images(
    output_path,
    suffix,
    backend,
    target_lang,
    src_lang,
    glossary_entries,
    ocr_method,
    progress_callback,
    cancel_check,
    *,
    provider=None,
    model=None,
    checkpoint_dir=None,
)
Translates images in legacy office files via round-trip conversion.
Converts the legacy file (.doc/.xls/.ppt) to its modern equivalent (.docx/.xlsx/.pptx), runs the existing ZIP-based image pipeline on the modern file, then converts back to the legacy format.
| Argument | Description |
|---|---|
| output_path | Path to the saved legacy document (modified in place). |
| suffix | Lowercase legacy extension (e.g. ".doc"). |
| backend | Backend identifier ("win32com" or "uno"). |
| target_lang | Target language name. |
| src_lang | Source language name. |
| glossary_entries | Optional glossary entries. |
| ocr_method | OCR method name (e.g. "TesseractOCR"). |
| progress_callback | Called with 0-100 for the image phase. |
| cancel_check | Returns True if the task was cancelled. |
| provider | Optional LLM provider override. |
| model | Optional LLM model override. |
| checkpoint_dir | Task storage directory for per-image cache. Forwarded to |

Source code in src/core/office_processor.py
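The round trip described for _translate_legacy_images (legacy → modern → image pipeline → legacy) can be outlined as below. The sketch is deliberately abstract: round_trip_images is a hypothetical name, the convert and run_zip_pipeline callbacks are placeholders for the win32com/UNO conversion and the ZIP-based image pipeline, and keeping the intermediate file in a temporary directory is an assumption rather than documented behaviour.

```python
import tempfile
from pathlib import Path

# Legacy extensions and their modern equivalents, as listed above.
LEGACY_TO_MODERN = {".doc": ".docx", ".xls": ".xlsx", ".ppt": ".pptx"}


def round_trip_images(legacy_path, convert, run_zip_pipeline):
    """Convert to the modern format, process images, then convert back."""
    legacy_path = Path(legacy_path)
    modern_suffix = LEGACY_TO_MODERN[legacy_path.suffix.lower()]
    with tempfile.TemporaryDirectory() as work_dir:
        # The intermediate file lives in a scratch directory and is
        # removed automatically when the block exits.
        modern_path = Path(work_dir) / (legacy_path.stem + modern_suffix)
        convert(legacy_path, modern_path)             # e.g. .doc -> .docx
        run_zip_pipeline(modern_path, modern_suffix)  # translate images in place
        convert(modern_path, legacy_path)             # e.g. .docx -> .doc


steps = []


def fake_convert(src, dst):
    """Placeholder converter that only records what it was asked to do."""
    steps.append(("convert", src.suffix, dst.suffix))


def fake_pipeline(path, suffix):
    """Placeholder for the ZIP-based image pipeline."""
    steps.append(("images", suffix))


round_trip_images("report.doc", fake_convert, fake_pipeline)
print(steps)
```

Passing the converter in as a callback keeps the orchestration independent of which backend performs the conversion.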
_translate_doc_images
_translate_doc_images(
    output_path,
    suffix,
    backend,
    target_lang,
    src_lang,
    glossary_entries,
    progress_callback,
    cancel_check,
    config=None,
    *,
    provider=None,
    model=None,
    checkpoint_dir=None,
)
Translates images embedded in an Office document.
For modern/ODF formats: uses the ZIP-based image pipeline directly. For legacy formats (.doc/.xls/.ppt): converts to modern format first, runs the ZIP pipeline, then converts back.
| Argument | Description |
|---|---|
| output_path | Path to the saved translated document. |
| suffix | Lowercase file extension (e.g. ".docx", ".doc"). |
| backend | Backend identifier for legacy format conversion. |
| target_lang | Target language name. |
| src_lang | Source language name. |
| glossary_entries | Optional glossary entries. |
| progress_callback | Called with 0-100 for the image phase. |
| cancel_check | Returns True if the task was cancelled. |
| config | Optional TranslationConfig snapshot; falls back to load_setting(). |
| provider | Optional LLM provider override. |
| model | Optional LLM model override. |
| checkpoint_dir | Task storage directory for per-image cache. Forwarded to the underlying ZIP pipeline; |

Source code in src/core/office_processor.py
process_office_file
process_office_file(
    file_path,
    output_path,
    target_lang,
    src_lang="",
    progress_callback=None,
    glossary_entries=None,
    cancel_check=None,
    checkpoint_dir=None,
    config=None,
    *,
    provider=None,
    model=None,
)
Translates an Office document using the best available backend.
Extracts translatable text, translates via LLM, and injects translations back into a copy of the document.
| Argument | Description |
|---|---|
| file_path | Path to the source office file. |
| output_path | Path to write the translated file. |
| target_lang | Target language name. |
| src_lang | Source language name. |
| progress_callback | Called with 0-100 progress percentage. |
| glossary_entries | Optional glossary entries for translation. |
| cancel_check | Returns True if the task was cancelled. |
| checkpoint_dir | Directory for saving/loading checkpoints. |
| config | Optional TranslationConfig for dependency injection. |
| provider | Optional LLM provider override (Gemini / Custom). |
| model | Optional LLM model override. |

| Returns | Description |
|---|---|
| bool | True on success, False if cancelled. |

| Raises | Description |
|---|---|
| ValueError | On backend or processing errors. |

Source code in src/core/office_processor.py