MCP Server (ait-mcp)
mcp_server
MCP server exposing AI Translate capabilities to LLM agents.
Provides text/document translation, image text extraction, audio transcription, speech synthesis, glossary queries, and language listing as MCP tools that any compatible client (Claude Desktop, Claude Code, etc.) can invoke.
Usage::

    ait-mcp                   # stdio transport (default)
    ait-mcp --transport sse   # SSE transport for web clients
_bootstrap
Initializes app directories, logging, and the database once.
Source code in src/mcp_server.py
_validate_language
Validates and resolves a language label case-insensitively.
| PARAMETER | DESCRIPTION |
|---|---|
| label | Language name provided by the caller. |
| param_name | Parameter name for the error message (e.g. "target language", "source language"). |

| RETURNS | DESCRIPTION |
|---|---|
| str | The canonical language label from AVAILABLE_LANGUAGES. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the label does not match any known language. |
_validate_source_language
Validates an optional source language label.
Returns empty string for auto-detect, or the canonical label.
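The two validators described above could look roughly like the sketch below. It is not the project's code: the contents of AVAILABLE_LANGUAGES, the whitespace trimming, and the exact error message are assumptions made for illustration.

```python
# Hypothetical stand-in for the real AVAILABLE_LANGUAGES collection.
AVAILABLE_LANGUAGES = ["English", "French", "Vietnamese"]


def validate_language(label: str, param_name: str = "target language") -> str:
    """Return the canonical label that matches `label`, ignoring case."""
    wanted = label.strip().casefold()  # trimming is an assumption
    for canonical in AVAILABLE_LANGUAGES:
        if canonical.casefold() == wanted:
            return canonical
    raise ValueError(f"Unknown {param_name}: {label!r}")


def validate_source_language(label: str) -> str:
    """Empty input means auto-detect; anything else must resolve."""
    if not label.strip():
        return ""
    return validate_language(label, "source language")
```

With this sketch, `validate_language("french")` resolves to the canonical spelling, while an unknown label raises ValueError naming the offending parameter.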
_require_llm
Raises RuntimeError if the LLM backend is not configured.
_resolve_content_type
Maps a user-facing content_type string to the internal constant.
Unknown labels fall back to plain text.
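A minimal sketch of this mapping with its plain-text fallback follows. The user-facing labels are the ones accepted by translate_text; the internal constant values on the right are hypothetical placeholders, and the exact-match lookup is an assumption.

```python
# Hypothetical internal constants; the real values are defined by the project.
CONTENT_TYPES = {
    "plain_text": "PLAIN_TEXT",
    "html": "HTML",
    "subtitle": "SUBTITLE",
    "markdown": "MARKDOWN",
    "xml": "XML",
    "rtf": "RTF",
    "json": "JSON",
    "localization": "LOCALIZATION",
}


def resolve_content_type(label: str) -> str:
    """Map a user-facing label to a constant, defaulting to plain text."""
    return CONTENT_TYPES.get(label, CONTENT_TYPES["plain_text"])
```

Falling back instead of raising keeps a misspelled hint from failing the whole translation request.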
translate_text
Translate a list of text strings into the target language.
| PARAMETER | DESCRIPTION |
|---|---|
| texts | One or more strings to translate. |
| target_language | Target language name (e.g. "French", "Vietnamese"). Use list_languages to see all supported values. |
| source_language | Source language name, or empty string for auto-detection (default). |
| content_type | Hint about the text format: one of "plain_text", "html", "subtitle", "markdown", "xml", "rtf", "json", "localization". Helps the LLM preserve formatting tags. |
| model | LLM model to use (e.g. "Gemini:gemini-3-flash-preview"). Defaults to the last model selected in the desktop app. |

| RETURNS | DESCRIPTION |
|---|---|
| list[str] | Translated strings in the same order as the input. |
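For orientation, an MCP client invokes a tool by sending a JSON-RPC `tools/call` request. The sketch below only builds such a payload for translate_text; the helper name `build_tool_call` and the sample texts are illustrative, and a real client library would normally construct and send this message for you.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "translate_text",
    {
        "texts": ["Hello", "Goodbye"],
        "target_language": "French",
        "content_type": "plain_text",
    },
)
print(json.dumps(request, indent=2))
```

Omitted arguments (source_language, model) fall back to the defaults described in the parameter table.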
extract_image_text
Extract text from an image using OCR or the configured LLM vision model.
Tries the LLM vision provider first (Gemini / custom endpoint). Falls back to OCR when the LLM isn't configured or when it returns empty or whitespace-only text. LLM errors (auth, quota, network) propagate as-is rather than silently falling back; otherwise misconfiguration would be invisible to the caller.
| PARAMETER | DESCRIPTION |
|---|---|
| image_path | Absolute path to an image file (PNG, JPG, BMP, WEBP, TIFF). |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | A dict. |

| RAISES | DESCRIPTION |
|---|---|
| RuntimeError | If neither LLM nor OCR is configured. |
| ValueError | If the image format is unsupported. |
| FileNotFoundError | If the image path doesn't exist. |
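The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: the two engines are passed in as plain callables (None meaning "not configured"), the return value is a hypothetical `(text, engine)` tuple rather than the real dict, and path and format validation are left out.

```python
from typing import Callable, Optional, Tuple

Extractor = Callable[[str], str]


def extract_text(
    image_path: str,
    llm_extract: Optional[Extractor],
    ocr_extract: Optional[Extractor],
) -> Tuple[str, str]:
    """Return (text, engine) following the LLM-first fallback order."""
    if llm_extract is None and ocr_extract is None:
        raise RuntimeError("Neither LLM nor OCR is configured")

    if llm_extract is not None:
        # No try/except here: auth, quota, and network errors must surface.
        text = llm_extract(image_path)
        if text.strip():
            return text, "llm"
        if ocr_extract is None:
            # Assumption: with no OCR to fall back to, return the empty result.
            return text, "llm"

    return ocr_extract(image_path), "ocr"
```

Only an empty result triggers the fallback; an exception from the LLM call is deliberately left uncaught so that a bad API key is not masked by a lower-quality OCR result.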
list_languages
List all supported languages for translation.
| RETURNS | DESCRIPTION |
|---|---|
| list[dict[str, str]] | A list of dicts. |
_run_pipeline_background
Runs the translation pipeline and cleans up tracking state.
Called as the target of a daemon thread started by translate_document. Catches all exceptions so the thread never crashes silently.
| PARAMETER | DESCRIPTION |
|---|---|
| task_ids | Task IDs owned by this pipeline invocation. |
| config | TranslationConfig to drive the pipeline. |
| cancel_event | Event signalled to request cancellation of the pipeline. |
translate_document
translate_document(
    file_paths,
    target_language,
    source_language="",
    output_directory="",
    translate_images=False,
    translate_comments=False,
    translate_shapes=False,
    translate_notes=False,
    translate_sheet_names=False,
    model="",
    ocr_method="",
)
Translate one or more files asynchronously.
Queues translation tasks and starts the pipeline in the background. Use get_task_status to poll for progress and results, and cancel_task to stop a running batch cooperatively.
| PARAMETER | DESCRIPTION |
|---|---|
| file_paths | Absolute paths to files to translate. Supported formats include images (.png, .jpg), documents (.docx, .pdf, .pptx), text (.txt, .md, .html, .epub), subtitles (.srt), and localization files (.po, .xliff, .yaml). |
| target_language | Target language name (e.g. "French", "Vietnamese"). Use list_languages to see all supported values. |
| source_language | Source language name, or empty string for auto-detection (default). |
| output_directory | Directory for translated output files. Defaults to the same directory as each source file. |
| translate_images | Translate embedded images in Office/PDF documents using OCR (requires OCR to be configured). |
| translate_comments | Translate comments in Office documents. |
| translate_shapes | Translate shapes and text boxes in documents. |
| translate_notes | Translate speaker notes in PowerPoint files. |
| translate_sheet_names | Translate sheet names in Excel files. |
| model | LLM model to use (e.g. "Gemini:gemini-3-flash-preview"). Defaults to the last model selected in the desktop app. |
| ocr_method | OCR engine for translate_images. One of "TesseractOCR" (default), "EasyOCR", or "Google Cloud OCR". Friendly spellings like "tesseract" / "easyocr" / "google cloud" are accepted. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | A dict. |
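The handling of friendly ocr_method spellings could be sketched as below. The alias table contains only the spellings named above; the case-insensitive matching and the ValueError for unknown values are assumptions, since the reference does not say how unrecognized engines are treated.

```python
OCR_METHODS = ("TesseractOCR", "EasyOCR", "Google Cloud OCR")

# Friendly spellings mentioned in the parameter description.
_ALIASES = {
    "tesseract": "TesseractOCR",
    "easyocr": "EasyOCR",
    "google cloud": "Google Cloud OCR",
}


def normalize_ocr_method(value: str = "") -> str:
    """Resolve a user-supplied OCR engine name to its canonical spelling."""
    key = value.strip().lower()
    if not key:
        return "TesseractOCR"  # documented default
    if key in _ALIASES:
        return _ALIASES[key]
    for name in OCR_METHODS:
        if name.lower() == key:
            return name
    # Assumption: unknown engines are rejected rather than silently replaced.
    raise ValueError(f"Unsupported OCR method: {value!r}")
```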
get_task_status
Get the current status and progress of translation tasks.
Use this to poll tasks created by translate_document.
| PARAMETER | DESCRIPTION |
|---|---|
| task_ids | List of task IDs returned by translate_document. |

| RETURNS | DESCRIPTION |
|---|---|
| list[dict[str, Any]] | A list of dicts, one per task ID. |
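A client-side polling loop for tasks started by translate_document might look like the sketch below. It makes two loud assumptions: each status dict has a `"status"` key, and the terminal values are the ones in `TERMINAL`. Neither is confirmed by this reference, so adjust them to the fields the server actually returns. `get_task_status` is passed in as a callable so the loop is independent of any particular MCP client.

```python
import time

# Assumed terminal status values; not confirmed by the reference.
TERMINAL = {"completed", "failed", "cancelled"}


def wait_for_tasks(task_ids, get_task_status, interval=1.0, timeout=600.0,
                   sleep=time.sleep):
    """Poll until every task reports a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        statuses = get_task_status(task_ids)
        if all(s.get("status") in TERMINAL for s in statuses):
            return statuses
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Tasks still running after {timeout} seconds")
        sleep(interval)
```

Because translate_document returns immediately, a loop of this kind (or an agent calling get_task_status at intervals) is how results are collected.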
cancel_task
Request cancellation of translation tasks started by translate_document.
Cancellation is cooperative: the pipeline checks the flag between tasks and between LLM batches, so an in-flight LLM call completes before the pipeline exits. Unknown task IDs are ignored: no error is raised, so callers can safely over-request.
| PARAMETER | DESCRIPTION |
|---|---|
| task_ids | Task IDs returned by translate_document. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | A dict. |
transcribe_audio
Transcribe an audio or video file to SRT subtitle text.
| PARAMETER | DESCRIPTION |
|---|---|
| file_path | Absolute path to an audio or video file. Audio: .mp3, .wav, .m4a, .flac, .ogg, .aac, .wma. Video: .mp4, .webm, .mkv, .avi, .mov, .wmv. |
| source_language | Source language name (e.g. "French"), or empty string for auto-detection (default). |
| stt_method | Speech-to-text engine: "Whisper" (local, default) or "Google Cloud". |
| model_size | Whisper model size: "tiny", "base" (default), "small", "medium", or "large". Ignored for Google Cloud. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, str] | A dict. |
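A caller can check a path against the accepted extensions before invoking the tool. The sketch below uses exactly the extensions listed for file_path; the case-insensitive comparison and the ValueError for anything else are assumptions about how such a check might behave, not a description of the server's own validation.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".aac", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".webm", ".mkv", ".avi", ".mov", ".wmv"}


def classify_media(file_path: str) -> str:
    """Return "audio" or "video" based on the file extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    raise ValueError(f"Unsupported media type: {suffix or file_path!r}")
```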
synthesize_speech
synthesize_speech(
    text,
    target_language,
    output_path="",
    voice_gender="FEMALE",
    tts_method="Edge TTS",
    audio_format=".mp3",
)
Convert text to speech audio.
| PARAMETER | DESCRIPTION |
|---|---|
| text | The text to synthesize into speech. |
| target_language | Language for the voice (e.g. "French", "Vietnamese"). Use list_languages to see all supported values. |
| output_path | Absolute path for the output audio file. If empty, a temp file is created under the system temp directory. |
| voice_gender | Voice gender: "MALE" or "FEMALE" (default). |
| tts_method | TTS engine: "Edge TTS" (free, default), "Google Cloud TTS", "ElevenLabs", "Gemini TTS", or "Piper TTS" (offline; requires the per-language voice to be downloaded first via the desktop app's Settings → Voice → Piper panel, otherwise raises PIPER_VOICE_NOT_INSTALLED). |
| audio_format | Output format: ".mp3" (default) or ".wav". The leading dot is optional; any other value raises ValueError. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, str] | A dict. |
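The audio_format rule (leading dot optional, anything other than MP3 or WAV rejected) can be sketched as a small normalizer. The function name is hypothetical, and lower-casing the input is an assumption not stated in the reference.

```python
SUPPORTED_FORMATS = (".mp3", ".wav")


def normalize_audio_format(audio_format: str = ".mp3") -> str:
    """Return the format with a leading dot, or raise for unsupported values."""
    fmt = audio_format.strip().lower()  # case folding is an assumption
    if not fmt.startswith("."):
        fmt = "." + fmt
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio format: {audio_format!r}")
    return fmt
```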
query_glossary
Query glossary sets and their translation term pairs.
Glossaries enforce consistent terminology during translation. When no set_id is given, returns all matching glossary sets with their entry counts. When set_id is given, returns that set's entries.
| PARAMETER | DESCRIPTION |
|---|---|
| set_id | If provided, return entries for this specific glossary set. If omitted, return all glossary sets. |
| active_only | When listing sets (no set_id), only return active sets if True (default). Ignored when set_id is provided. |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | A dict with either the matching glossary sets and their entry counts (no set_id), or the entries of the requested set (set_id given). |
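The two query modes can be illustrated with an in-memory stand-in for the glossary store. Everything concrete here is hypothetical: the sample data, the key names (`"sets"`, `"entries"`, `"entry_count"`, and so on), and the integer set IDs. Only the branching on set_id and active_only follows the description above.

```python
# Hypothetical in-memory glossary store used only for illustration.
GLOSSARY_SETS = {
    1: {"name": "Product terms", "active": True, "entries": [("cloud", "nuage")]},
    2: {"name": "Legacy", "active": False, "entries": []},
}


def query_glossary(set_id=None, active_only=True):
    """Return one set's entries, or a listing of sets with entry counts."""
    if set_id is not None:
        # active_only is ignored in this branch, as documented.
        entries = GLOSSARY_SETS[set_id]["entries"]
        return {"entries": [{"source": s, "target": t} for s, t in entries]}
    sets = [
        {"id": sid, "name": data["name"], "entry_count": len(data["entries"])}
        for sid, data in GLOSSARY_SETS.items()
        if data["active"] or not active_only
    ]
    return {"sets": sets}
```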
main
CLI entry point for the MCP server.
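The `--transport` flag shown in the usage notes could be parsed with argparse roughly as follows. This is a sketch based only on the two invocations documented above; the real entry point may accept further options or transports, and `parse_args` is a hypothetical helper name.

```python
import argparse


def parse_args(argv=None):
    """Parse the ait-mcp command line; stdio is the default transport."""
    parser = argparse.ArgumentParser(prog="ait-mcp")
    parser.add_argument(
        "--transport",
        choices=["stdio", "sse"],  # only the transports named in the usage notes
        default="stdio",
        help="stdio (default) or sse for web clients",
    )
    return parser.parse_args(argv)
```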