MCP Server (`ait-mcp`)

Exposes the translation pipeline as Model Context Protocol tools so AI agents like Claude Desktop and Claude Code can drive it directly — "Translate this PDF into French" becomes a tool call, not a copy-paste.

What gets exposed

Nine tools:

Tool	Purpose
`translate_text`	Translate a list of strings
`translate_document`	Queue file translation tasks (returns task IDs)
`get_task_status`	Poll task status
`cancel_task`	Cooperative cancel of in-flight tasks
`extract_image_text`	OCR or LLM vision
`transcribe_audio`	Audio → SRT
`synthesize_speech`	Text → MP3 / WAV
`query_glossary`	List glossary sets / entries
`list_languages`	All 45 supported languages

Run the server

ait-mcp                       # stdio transport (default for desktop agents)
ait-mcp --transport sse       # Server-Sent Events for web clients
ait-mcp --transport sse --port 9000

stdio is the default and is what desktop MCP clients expect. Use SSE only when the client is explicitly configured for it.

Add to Claude Desktop

Open Claude Desktop → Settings → Developer → Edit Config

Add this entry under mcpServers:

{
  "mcpServers": {
    "ai-translate": {
      "command": "uv",
      "args": ["run", "--project", "/absolute/path/to/ai-translate", "ait-mcp"]
    }
  }
}

Replace /absolute/path/to/ai-translate with the cloned repo path.
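If you would rather script the edit than paste JSON by hand, here is a minimal sketch that merges the entry into an existing config without touching other servers. The function name `add_mcp_server` is hypothetical, and the config path is passed in by the caller (use whatever file Edit Config opens); only the `mcpServers` layout and the `uv run` arguments come from the snippet above.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, project_dir: str, name: str = "ai-translate") -> dict:
    """Merge an ait-mcp entry into an MCP client config, keeping other servers."""
    # Start from the existing config if there is one, otherwise from an empty document.
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}

    # Create the "mcpServers" object on first use, then add or overwrite our entry only.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {
        "command": "uv",
        "args": ["run", "--project", project_dir, "ait-mcp"],
    }

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it as `add_mcp_server(Path("/path/to/your/config.json"), "/absolute/path/to/ai-translate")`, substituting both paths for your machine.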

Quit and re-open Claude Desktop. The hammer icon should now show "ai-translate" with all 9 tools. Try:

"Translate this PDF (/home/me/report.pdf) into French — save the output next to the source."

Add to Claude Code

Add this entry to ~/.config/claude-code/mcp_servers.json, or register the server with claude mcp add instead:

{
  "ai-translate": {
    "command": "uv",
    "args": ["run", "--project", "/absolute/path/to/ai-translate", "ait-mcp"]
  }
}

Restart Claude Code. The same 9 tools become callable.

Add to other MCP clients

Any MCP-compatible client needs the same two settings:

Command — uv run --project /path/to/ai-translate ait-mcp
Transport — stdio (default)

For HTTP / SSE-based clients, run ait-mcp --transport sse --port 9000 and point the client at http://localhost:9000.

Validation guarantees

Every tool returns the same shape on errors so agents can handle failures consistently:

Bad input	Tool response
Unknown language	`ValueError: Unknown … language '<input>'. Call list_languages to see supported values.`
LLM not configured	`RuntimeError: LLM is not configured. Run the desktop app and set up your API key…`
Unsupported file type	`ValueError` listing allowed extensions
Malformed `model="…"` (no `:`)	`ValueError` instead of silently using default
Unknown task IDs in `cancel_task`	Returned in the `unknown` array — no error
FFmpeg missing on `transcribe_audio`	`RuntimeError: FFmpeg is required…` (re-wrapped from the engine's bare `FFMPEG_NOT_FOUND` tag)

Agents calling these tools can rely on these contracts.
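As a minimal client-side sketch of acting on these contracts: the function below maps an error message to a coarse recovery step. It assumes the error reaches the agent as a plain string that begins with the exception name, as shown in the table. The function name and the action labels are hypothetical and not part of ait-mcp.

```python
def classify_tool_error(message: str) -> str:
    """Map an ait-mcp error message to a coarse recovery action (labels are illustrative)."""
    if message.startswith("RuntimeError: LLM is not configured"):
        # Nothing the agent can fix alone: the user has to set up an API key.
        return "ask_user_to_configure_llm"
    if message.startswith("RuntimeError: FFmpeg is required"):
        return "ask_user_to_install_ffmpeg"
    if message.startswith("ValueError: Unknown") and "language" in message:
        # The message itself points at list_languages, so retry after looking up valid values.
        return "call_list_languages"
    if message.startswith("ValueError"):
        # Unsupported file type, malformed model string, and similar argument problems.
        return "fix_arguments"
    return "report_to_user"
```

Matching on prefixes rather than full strings keeps the check tolerant of the elided parts of each message.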

Concurrency

translate_document runs the pipeline in a daemon thread. Each batch gets its own cancel event, so cancelling one batch doesn't disturb another. The MCP server tracks active pipelines in a process-local map (cleaned up automatically when the pipeline finishes).
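The pattern can be sketched as follows. This only illustrates per-batch cancel events and a process-local map that cleans itself up; it is not the server's actual code, and `start_batch`, `cancel_batch`, and `_active` are hypothetical names.

```python
import threading
import uuid

_active: dict[str, threading.Event] = {}  # batch id -> that batch's cancel event
_lock = threading.Lock()                   # guards _active


def start_batch(work_items, process_item):
    """Run process_item over work_items in a daemon thread; return (batch_id, thread)."""
    cancel = threading.Event()
    batch_id = uuid.uuid4().hex
    with _lock:
        _active[batch_id] = cancel

    def run():
        try:
            for item in work_items:
                # Cooperative cancel: checked between items, never mid-item.
                if cancel.is_set():
                    break
                process_item(item)
        finally:
            # Clean up automatically whether the batch finished, failed, or was cancelled.
            with _lock:
                _active.pop(batch_id, None)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return batch_id, thread


def cancel_batch(batch_id: str) -> bool:
    """Signal one batch to stop; return False if the id is unknown or already finished."""
    with _lock:
        cancel = _active.get(batch_id)
    if cancel is None:
        return False
    cancel.set()
    return True
```

Because each batch owns its own event, setting one has no effect on any other running batch.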

Use cases

"Translate this codebase's docs into Vietnamese": point the agent at the docs folder; it batches translate_document calls and polls get_task_status until each one finishes.
"What languages do you support?": the agent calls list_languages and reads the response.
"Translate this Japanese receipt": the agent calls extract_image_text on the photo, then translate_text on the result.
"Generate Vietnamese subtitles for this Zoom recording": the agent calls transcribe_audio to get an SRT in the source language, runs translate_text on each cue to localize it, and reassembles the SRT.
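The reassembly step in the subtitle use case is the part an agent has to do itself. Here is a minimal sketch, assuming standard SRT layout (index line, timing line, then text lines, with blank lines between cues). `translate_srt` is a hypothetical helper, and the `translate` callable stands in for a translate_text tool call, which takes a list of strings.

```python
import re
from typing import Callable


def translate_srt(srt: str, translate: Callable[[list[str]], list[str]]) -> str:
    """Translate cue text in an SRT document, leaving indices and timings untouched."""
    normalized = srt.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", normalized)

    cues = []
    for block in blocks:
        index, timing, *text_lines = block.split("\n")
        cues.append((index, timing, "\n".join(text_lines)))

    # One batched call for all cue texts, in order.
    translated = translate([text for _, _, text in cues])
    if len(translated) != len(cues):
        raise ValueError("translation count does not match cue count")

    rebuilt = [
        f"{index}\n{timing}\n{text}"
        for (index, timing, _), text in zip(cues, translated)
    ]
    return "\n\n".join(rebuilt) + "\n"
```

Sending every cue in one list keeps the number of tool calls low. For very long recordings you may prefer to split the list into chunks.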

Video dubbing isn't an MCP tool

The full STT → translate → TTS → mux pipeline (the desktop app's Dubbing page) is only available through the GUI right now. From MCP you can compose the equivalent yourself with transcribe_audio + translate_text + synthesize_speech, but you'll need to handle the timing-aware mux step (FFmpeg) outside.

Tips

Setup once, agents work everywhere

The MCP server shares API keys and settings with the desktop app and CLI. Configure your LLM / OCR / TTS once in the GUI, then any agent that talks to ait-mcp inherits the same setup.

Cold-start endpoint cache

For each (endpoint, model) pair, the chat-vs-responses-API choice and the working payload variant are persisted to llm_endpoint_cache.json in the OS cache directory (~/.cache/ai-translate/ on Linux, ~/Library/Caches/ai-translate/ on macOS, %LOCALAPPDATA%\ai-translate\cache\ on Windows). Fresh ait-mcp processes skip the auto-detection probe entirely after the first successful call — agents that spawn the server on demand don't pay the variant-detection round-trips on every invocation. The cache is multi-process and multi-thread safe (read-merge-write under RLock with atomic rename).
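For readers unfamiliar with the read-merge-write pattern, here is a minimal sketch of it. It illustrates the technique and is not the project's actual code: `update_cache`, the string key, and the value layout are all hypothetical.

```python
import json
import os
import tempfile
import threading
from pathlib import Path

_cache_lock = threading.RLock()  # serializes writers within one process


def update_cache(path: Path, key: str, value: dict) -> dict:
    """Merge one entry into a JSON cache file and publish it with an atomic rename."""
    with _cache_lock:
        # Read: pick up entries written earlier, possibly by another process.
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (FileNotFoundError, json.JSONDecodeError):
            data = {}

        # Merge: only our key changes; everything else is preserved.
        data[key] = value

        # Write: fill a temp file in the same directory, then swap it into place.
        path.parent.mkdir(parents=True, exist_ok=True)
        fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(data, handle, indent=2)
            os.replace(tmp_name, path)  # readers see the old file or the new one, never half
        except BaseException:
            os.unlink(tmp_name)
            raise
        return data
```

The lock only covers threads in one process. Across processes, the atomic rename is what prevents a reader from seeing a partially written file.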

Per-tool model picker

The translate_text and translate_document tools accept an optional model parameter — agents can pick a fast model for quick turns and a heavier one for production output without needing the user to reconfigure the desktop app.

Long-running pipelines

translate_document returns immediately with task IDs. The agent is expected to poll get_task_status until each task reaches Done or Failed. Don't wait synchronously inside the tool call; that risks the MCP client's timeout firing.
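A minimal sketch of that polling loop, run on the client side across separate tool calls. `wait_for_tasks` is a hypothetical helper, and `get_status` stands in for one get_task_status call; its return shape (a mapping of task ID to status string) is an assumption, and only the terminal states Done and Failed come from the text above.

```python
import time
from typing import Callable

TERMINAL = {"Done", "Failed"}


def wait_for_tasks(
    task_ids: list[str],
    get_status: Callable[[list[str]], dict[str, str]],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, str]:
    """Poll until every task is Done or Failed; raise TimeoutError if the deadline passes."""
    pending = set(task_ids)
    results: dict[str, str] = {}
    deadline = clock() + timeout

    while pending:
        # Ask only about tasks that have not finished yet.
        statuses = get_status(sorted(pending))
        for task_id, status in statuses.items():
            if status in TERMINAL:
                results[task_id] = status
                pending.discard(task_id)

        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"tasks still running: {sorted(pending)}")
        sleep(interval)

    return results
```

The overall deadline also covers tasks that never reach one of the two terminal states, so the loop cannot spin forever.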
