
OCR Engines

OCR reads text out of images. It runs on the Extract Text page and as a fallback inside Document translation, either when a page is scanned (no text layer) or when you turn on Translate embedded images.

You can pick from three OCR engines.

Tesseract

Free, fast, offline. Needs a system install.

macOS (Homebrew):

brew install tesseract tesseract-lang

Debian / Ubuntu:

sudo apt install tesseract-ocr tesseract-ocr-all

tesseract-ocr-all brings every supported language. To save disk, install only what you need (e.g. tesseract-ocr-fra for French).

Fedora:

sudo dnf install tesseract tesseract-langpack-eng tesseract-langpack-fra

Windows:

Download the installer from UB Mannheim's Tesseract releases. Run it and accept the defaults; language packs are bundled.

Verify:

tesseract --version
tesseract --list-langs
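If you would rather script this check, the two commands above can be wrapped in a few lines of Python. This is a minimal sketch, not part of the app: the helper names `parse_langs` and `installed_langs` are made up for illustration, and the parser assumes the usual output shape of a header line followed by one language code per line.

```python
import shutil
import subprocess
from typing import List, Optional


def parse_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output.

    Language codes never contain spaces, so the header line
    ("List of available languages ...") and any warnings are dropped.
    """
    lines = [line.strip() for line in output.splitlines()]
    return [line for line in lines if line and " " not in line]


def installed_langs() -> Optional[List[str]]:
    """Return installed language codes, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Read both streams in case a build writes the list to stderr.
    return parse_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    if langs is None:
        print("tesseract not found on PATH")
    else:
        print("French installed:", "fra" in langs)
```

A `None` result means the install step did not put `tesseract` on your PATH; an empty list means the binary is there but no language data was found.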

In the desktop app: Settings → OCR → OCR method = Tesseract. Done.

EasyOCR

Free, offline. Great for non-Latin scripts (Chinese, Korean, Japanese, Thai). Models download on first use (~1 GB total).

uv sync --extra easyocr

In the desktop app: Settings → OCR → OCR method = EasyOCR.

The first time you use it for a language, the relevant model downloads to ~/.EasyOCR/. Subsequent runs reuse the cached model and skip the download.
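To see which models are already cached (and so will not trigger a download), you can list the contents of that directory. The sketch below is illustrative only: `cached_models` is a hypothetical helper, and it assumes the weights are stored as `.pth` files somewhere under `~/.EasyOCR/`.

```python
from pathlib import Path
from typing import List, Optional


def cached_models(root: Optional[Path] = None) -> List[str]:
    """List model weight files found under the EasyOCR cache directory."""
    # Assumption: weights are .pth files somewhere under ~/.EasyOCR/.
    root = root if root is not None else Path.home() / ".EasyOCR"
    if not root.is_dir():
        return []  # nothing downloaded yet
    return sorted(p.name for p in root.rglob("*.pth"))


if __name__ == "__main__":
    models = cached_models()
    print(models if models else "no cached models found")
```

Deleting a file from that directory should simply cause it to be downloaded again the next time the language is used.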

Google Cloud Vision

Cloud, paid (1,000 free requests / month). Highest accuracy, especially on noisy / handwritten / mixed-script content.

  1. Create a Google Cloud project
  2. Enable the Vision API
  3. Create an API key
  4. In the desktop app: Settings → Service → Google Cloud API key → paste
  5. Settings → OCR → OCR method = Google Cloud OCR

The same Google Cloud API key powers Vision OCR, Speech-to-Text, and Text-to-Speech if you also enable those APIs.
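To check a new key outside the app, you can send a request to the Vision REST endpoint yourself. The sketch below only builds the JSON body for a single text-detection request; it shows the shape of the public `images:annotate` call, not how the desktop app talks to Google. The function name `build_request` is hypothetical.

```python
import base64
import json
from typing import Dict, List, Optional

# Public REST endpoint of the Vision API; the API key goes in the query string.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_request(image_bytes: bytes,
                  language_hints: Optional[List[str]] = None) -> Dict:
    """Build the JSON body for one TEXT_DETECTION request."""
    request: Dict = {
        # Image content is sent inline as base64.
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "TEXT_DETECTION"}],
    }
    if language_hints:
        # Optional: tell the API which languages to expect.
        request["imageContext"] = {"languageHints": language_hints}
    return {"requests": [request]}


if __name__ == "__main__":
    body = build_request(b"replace with real image bytes", ["fr"])
    # POST this JSON to f"{VISION_URL}?key=YOUR_API_KEY" with any HTTP client.
    print(json.dumps(body, indent=2))
```

An HTTP 403 response usually means the Vision API is not enabled for the project or the key is restricted, which points back to steps 2 and 3.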

Comparing accuracy

The Settings → OCR tab has a small built-in comparison table covering language coverage, online/offline, cost, and accuracy. Check it before you switch engines.

When OCR is used

| Place | Behaviour |
| --- | --- |
| Extract Text page (when method = OCR) | Direct OCR on the dropped images |
| Translate Document → PDF | OCR fallback on scan-only pages (no text layer) |
| Translate Document → Office, with Translate embedded images on | OCR + LLM vision on every embedded image |
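The rules for the two document paths can be read as a pair of small decisions. The following is a sketch of that logic under stated assumptions, not the app's actual code: every name here is hypothetical, and the `SKIP` case (embedded images left alone when the option is off) is inferred rather than stated above.

```python
from enum import Enum


class Action(Enum):
    USE_TEXT_LAYER = "use existing text layer"
    OCR = "run OCR"
    OCR_AND_VISION = "run OCR + LLM vision"
    SKIP = "leave image untouched"  # assumed behaviour when the option is off


def pdf_page_action(text_layer: str) -> Action:
    """Scan-only pages (empty text layer) fall back to OCR."""
    return Action.USE_TEXT_LAYER if text_layer.strip() else Action.OCR


def office_image_action(translate_embedded_images: bool) -> Action:
    """Embedded images are processed only when the option is turned on."""
    return Action.OCR_AND_VISION if translate_embedded_images else Action.SKIP


if __name__ == "__main__":
    print(pdf_page_action("").value)
    print(office_image_action(True).value)
```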

Tips

Pick the source language

Most OCR engines are much more accurate when you tell them what language to expect. The Subtitle, Document, and Extract Text pages all pass the language selected in the Source language picker to the OCR engine.
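One reason this forwarding matters is that each engine uses its own language codes: Tesseract uses names such as `fra` and `chi_sim`, while EasyOCR uses `fr` and `ch_sim`. A mapping along these lines is one way to translate a single picker value into the right code. It is an illustrative sketch covering only a handful of languages; the table keys and the `engine_language` helper are assumptions, not the app's implementation.

```python
from typing import Dict

# Picker value -> engine-specific code, for a few common languages.
ENGINE_CODES: Dict[str, Dict[str, str]] = {
    "en": {"tesseract": "eng", "easyocr": "en"},
    "fr": {"tesseract": "fra", "easyocr": "fr"},
    "ja": {"tesseract": "jpn", "easyocr": "ja"},
    "ko": {"tesseract": "kor", "easyocr": "ko"},
    "th": {"tesseract": "tha", "easyocr": "th"},
    "zh-Hans": {"tesseract": "chi_sim", "easyocr": "ch_sim"},
}


def engine_language(source_lang: str, engine: str) -> str:
    """Translate a source-language value into the code a given engine expects."""
    try:
        return ENGINE_CODES[source_lang][engine]
    except KeyError:
        raise ValueError(f"no {engine} code known for {source_lang!r}") from None


if __name__ == "__main__":
    print(engine_language("fr", "tesseract"))
    print(engine_language("zh-Hans", "easyocr"))
```

For Tesseract, the returned code also has to appear in the output of `tesseract --list-langs`, otherwise the language data is not installed.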

Tesseract is enough for clean printed text

Don't reach for cloud OCR until Tesseract / EasyOCR has actually failed on your content. They're free, fast, and surprisingly good.
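If you script your own OCR pipeline, the same advice can be expressed as an escalation chain: run the free engines first and call the paid one only when the result looks poor. This is a hypothetical sketch. The engine callables, the confidence score between 0 and 1, and the 0.8 threshold are all assumptions chosen for illustration; the app itself uses whichever single engine you select in Settings.

```python
from typing import Callable, List, Tuple

# An engine takes image bytes and returns (text, confidence between 0 and 1).
Engine = Callable[[bytes], Tuple[str, float]]


def ocr_with_escalation(image: bytes,
                        engines: List[Tuple[str, Engine]],
                        min_confidence: float = 0.8) -> Tuple[str, str]:
    """Try engines in order and return (engine name, text).

    Stops at the first result that meets min_confidence, so later
    (more expensive) engines are only called when earlier ones fall short.
    If none is confident enough, the best result seen is returned.
    """
    best_name, best_text, best_conf = "", "", -1.0
    for name, run in engines:
        text, confidence = run(image)
        if confidence >= min_confidence:
            return name, text
        if confidence > best_conf:
            best_name, best_text, best_conf = name, text, confidence
    return best_name, best_text
```

Order the list from cheapest to most expensive, for example Tesseract, then EasyOCR, then Google Cloud Vision.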