languages

Language-related constants for the AI Translate application.

iter_languages_sorted_for_ui

iter_languages_sorted_for_ui()

Returns LANGUAGES sorted by the currently-localized label.

Picker populate sites should iterate this rather than LANGUAGES directly so a Vietnamese user sees Vietnamese alphabetical order, a Japanese user sees gojūon order, etc. — matching how macOS / Windows present their localized language pickers.

The sort key runs through normalize_for_search so accented forms sort with their base letter rather than after Z: Vietnamese "Tiếng Đan Mạch" lands between D and E, French "Élève" with E, German "Über" with U, etc. Plain casefold would push every non-ASCII initial to the end of the list (the bug reported against the Vietnamese picker that prompted this change). Ties fall back to the English label so the order is deterministic even when two entries localize to identical text.

Source code in src/constants/languages.py
def iter_languages_sorted_for_ui() -> list[tuple[str, str, str, str]]:
    """Returns ``LANGUAGES`` sorted by the *currently-localized* label.

    Picker populate sites should iterate this rather than ``LANGUAGES``
    directly so a Vietnamese user sees Vietnamese alphabetical order,
    a Japanese user sees gojūon order, etc. — matching how macOS /
    Windows present their localized language pickers.

    Sort key runs through ``normalize_for_search`` so accented forms
    sort with their base letter rather than after Z — Vietnamese
    "Tiếng Đan Mạch" lands between D and E, French "Élève" with E,
    German "Über" with U, etc.  Plain ``casefold`` would push every
    non-ASCII initial to the end of the list (the bug reported against
    the Vietnamese picker that prompted this change).  Ties fall back
    to the English label so the order is deterministic even when two
    entries localize to identical text.
    """
    from src.utils.text_utils import normalize_for_search  # noqa: PLC0415

    return sorted(
        LANGUAGES,
        key=lambda entry: (
            normalize_for_search(format_language_picker_label(entry[1], entry[3])),
            entry[1],
        ),
    )
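
To see why the sort key matters, the sketch below compares a plain casefold sort with an accent-folding one. `fold_accents` is a hypothetical stand-in for normalize_for_search, whose real implementation is not shown on this page and may differ; the sample labels are illustrative only.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical stand-in for normalize_for_search (assumed behavior)."""
    # "đ" has no Unicode decomposition, so it is mapped by hand.
    text = text.casefold().replace("đ", "d")
    # NFD splits "é" into "e" plus a combining accent, which is then dropped.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


labels = ["Zulu", "Élève", "Über", "Apple", "Đan Mạch", "Echo"]

# Plain casefold compares code points, so accented initials sort after "z".
naive = sorted(labels, key=str.casefold)

# Folded key first, original text second as a deterministic tie-break.
folded = sorted(labels, key=lambda s: (fold_accents(s), s))

print(naive)   # ['Apple', 'Echo', 'Zulu', 'Élève', 'Über', 'Đan Mạch']
print(folded)  # ['Apple', 'Đan Mạch', 'Echo', 'Élève', 'Über', 'Zulu']
```

The second tuple element plays the same role as `entry[1]` in the function above: it only matters when two folded keys compare equal.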

_language_i18n_key

_language_i18n_key(english_label)

English label → language.<key> i18n suffix.

Lower-cases and replaces non-alphanumeric runs with a single underscore, then strips trailing underscores. Examples:

  • "Japanese" → "japanese"
  • "Chinese (Simplified)" → "chinese_simplified"
  • "Portuguese (Brazil)" → "portuguese_brazil"
  • "English (UK)" → "english_uk"

Kept stable so the keys don't churn when display text is refined per locale; tests rely on this mapping.

Source code in src/constants/languages.py
def _language_i18n_key(english_label: str) -> str:
    """English label → ``language.<key>`` i18n suffix.

    Lower-cases and replaces non-alphanumeric runs with a single
    underscore, then strips trailing underscores.  Examples:

    - ``"Japanese"`` → ``"japanese"``
    - ``"Chinese (Simplified)"`` → ``"chinese_simplified"``
    - ``"Portuguese (Brazil)"`` → ``"portuguese_brazil"``
    - ``"English (UK)"`` → ``"english_uk"``

    Kept stable so the keys don't churn when display text is
    refined per locale; tests rely on this mapping.
    """
    out: list[str] = []
    for ch in english_label.lower():
        if ch.isalnum():
            out.append(ch)
        elif out and out[-1] != "_":
            out.append("_")
    return "".join(out).strip("_")
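
The mapping can be checked in isolation. The sketch below copies the documented logic under a public name (`language_i18n_key`, renamed here only so it runs standalone) and exercises the documented examples plus two hypothetical labels that probe edge cases.

```python
def language_i18n_key(english_label: str) -> str:
    """Standalone copy of the documented logic, for illustration only."""
    out: list[str] = []
    for ch in english_label.lower():
        if ch.isalnum():
            out.append(ch)
        elif out and out[-1] != "_":
            # First separator after a word: emit exactly one underscore.
            out.append("_")
    return "".join(out).strip("_")


examples = {
    "Japanese": "japanese",
    "Chinese (Simplified)": "chinese_simplified",
    "English (UK)": "english_uk",
    "  (Beta) Klingon  ": "beta_klingon",  # hypothetical label
    "Español": "español",                  # hypothetical label
}
for label, expected in examples.items():
    assert language_i18n_key(label) == expected

# The suffix is combined with the "language." prefix to form the full key.
full_key = f"language.{language_i18n_key('Portuguese (Brazil)')}"
print(full_key)  # language.portuguese_brazil
```

Leading separators never reach the output because an underscore is only appended once `out` is non-empty, and `str.isalnum()` is Unicode-aware, so non-ASCII letters are kept rather than replaced.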

format_language_picker_label

format_language_picker_label(english_label, native_name)

Returns the display string for a language in UI pickers.

Looks up the per-locale translation under language.<key> (where <key> comes from _language_i18n_key) and returns it when present. Falls back to "<native> (<English>)" when the translation is genuinely missing, as insurance against drift if a new language is added without updating every locale. When the native name equals the English label, the fallback is the plain English label instead, which avoids the silly "English (UK) (English (UK))" repeat.

The DB and LLM prompts continue to use the bare English label; this helper only affects what the user sees in pickers.

Source code in src/constants/languages.py
def format_language_picker_label(english_label: str, native_name: str) -> str:
    """Returns the display string for a language in UI pickers.

    Looks up the per-locale translation under ``language.<key>``
    (where ``<key>`` is :func:`_language_i18n_key`) and returns it
    when present.  Falls back to ``"<native> (<English>)"`` when the
    translation is genuinely missing — insurance against future
    drift if a new language is added without updating every locale.
    The double-fallback to plain English when ``native == english``
    stays in place to avoid the silly ``"English (UK) (English (UK))"``
    repeat.

    The DB and LLM prompts continue to use the bare English label;
    this helper only affects what the user sees in pickers.
    """
    # Lazy import to avoid pulling i18n into modules that just want
    # the AVAILABLE_LANGUAGES list.
    from src.constants.i18n import tr  # noqa: PLC0415

    key = f"language.{_language_i18n_key(english_label)}"
    translated = tr(key)
    # ``tr`` falls back to the key itself on miss — distinguishable.
    if translated and translated != key:
        return translated
    if native_name == english_label:
        return english_label
    return f"{native_name} ({english_label})"
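
The three outcomes of the fallback chain can be sketched with a stubbed translator. Everything here is an assumption made for illustration: `CATALOG` is a hypothetical one-entry Vietnamese catalog, `tr` is a stub that only mimics the "return the key on a miss" behavior noted in the source comment, and `i18n_key` and `picker_label` are standalone copies of the documented logic.

```python
CATALOG = {"language.japanese": "Tiếng Nhật"}  # hypothetical catalog


def tr(key: str) -> str:
    """Stub translator: returns the key itself when nothing is found."""
    return CATALOG.get(key, key)


def i18n_key(english_label: str) -> str:
    out: list[str] = []
    for ch in english_label.lower():
        if ch.isalnum():
            out.append(ch)
        elif out and out[-1] != "_":
            out.append("_")
    return "".join(out).strip("_")


def picker_label(english_label: str, native_name: str) -> str:
    key = f"language.{i18n_key(english_label)}"
    translated = tr(key)
    if translated and translated != key:
        return translated                    # 1. per-locale translation
    if native_name == english_label:
        return english_label                 # 2. avoid "X (X)"
    return f"{native_name} ({english_label})"  # 3. "<native> (<English>)"


print(picker_label("Japanese", "日本語"))            # Tiếng Nhật
print(picker_label("English (UK)", "English (UK)"))  # English (UK)
print(picker_label("Korean", "한국어"))              # 한국어 (Korean)
```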

localized_language_label

localized_language_label(english_label)

Returns the user-facing display string for an English language label.

Convenience wrapper over format_language_picker_label for callers that only have the canonical English label (e.g. history tables that read it from the DB): it looks the native name up from LANGUAGES and delegates.

Empty input passes through unchanged so callers don't have to special-case the auto-detect placeholder. Unknown labels (legacy DB entries with typos, removed languages, etc.) fall back to the raw English label rather than raising; the history table would rather render "Klingon" than a blank cell.

Source code in src/constants/languages.py
def localized_language_label(english_label: str) -> str:
    """Returns the user-facing display string for an English language label.

    Convenience wrapper over :func:`format_language_picker_label` for
    callers that only have the canonical English label (e.g. history
    tables that read it from the DB) — looks the native name up from
    :data:`LANGUAGES` and delegates.

    Empty input passes through unchanged so callers don't have to
    special-case the auto-detect placeholder.  Unknown labels (legacy
    DB entries with typos, removed languages, etc.) fall back to the
    raw English label rather than raise — the history table would
    rather render "Klingon" than a blank cell.
    """
    if not english_label:
        return english_label
    native = _NATIVE_BY_ENGLISH.get(english_label, english_label)
    return format_language_picker_label(english_label, native)
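
A minimal sketch of the lookup-and-delegate behavior follows. It assumes the native-name table is derived from positions 1 and 3 of each LANGUAGES row (the positions used by the sort key earlier on this page); the sample rows, the `"?"` placeholders, and the simplified `picker_label` formatter, which models only the translation-miss path, are hypothetical.

```python
# Hypothetical rows: only the English label (index 1) and native name
# (index 3) are modeled; the other fields are placeholders.
LANGUAGES = [
    ("?", "Japanese", "?", "日本語"),
    ("?", "English (UK)", "?", "English (UK)"),
]
NATIVE_BY_ENGLISH = {entry[1]: entry[3] for entry in LANGUAGES}


def picker_label(english_label: str, native_name: str) -> str:
    """Simplified formatter: behaves as if no translation exists."""
    if native_name == english_label:
        return english_label
    return f"{native_name} ({english_label})"


def localized_label(english_label: str) -> str:
    if not english_label:
        return english_label  # auto-detect placeholder passes through
    # Unknown labels reuse the English label as their "native" name.
    native = NATIVE_BY_ENGLISH.get(english_label, english_label)
    return picker_label(english_label, native)


print(repr(localized_label("")))    # ''
print(localized_label("Klingon"))   # Klingon
print(localized_label("Japanese"))  # 日本語 (Japanese)
```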

get_locale_code

get_locale_code(label)

Returns the BCP-47 locale code for a language label.

Falls back to the lowercased label if not found.

PARAMETER   TYPE   DESCRIPTION
label       str    English language name (e.g. "Vietnamese").

RETURNS   DESCRIPTION
str       Locale code string (e.g. "vi").

Source code in src/constants/languages.py
def get_locale_code(label: str) -> str:
    """Returns the BCP-47 locale code for a language label.

    Falls back to the lowercased label if not found.

    Args:
        label: English language name (e.g. "Vietnamese").

    Returns:
        Locale code string (e.g. "vi").
    """
    return _LABEL_TO_LOCALE.get(label, label.lower())
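
The fallback is worth seeing in action, because a lowercased label is not necessarily a usable locale code. In the sketch below `LABEL_TO_LOCALE` is a hypothetical two-entry subset of the real table, and `get_known_locale_code` is a hypothetical stricter variant that is not part of the module.

```python
from typing import Optional

LABEL_TO_LOCALE = {"Vietnamese": "vi", "Japanese": "ja"}  # hypothetical subset


def get_locale_code(label: str) -> str:
    return LABEL_TO_LOCALE.get(label, label.lower())


def get_known_locale_code(label: str) -> Optional[str]:
    """Hypothetical variant: None instead of a lowercased guess."""
    return LABEL_TO_LOCALE.get(label)


print(get_locale_code("Vietnamese"))            # vi
print(get_locale_code("vietnamese"))            # vietnamese (lookup is case-sensitive)
print(get_locale_code("Chinese (Simplified)"))  # chinese (simplified)
print(get_known_locale_code("Klingon"))         # None
```

Callers that pass the result to an API expecting a real language tag may want to check for a mapping first, as the stricter variant does.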

is_rtl_language

is_rtl_language(label)

Returns True when label names a right-to-left language.

Empty / unknown labels return False — the natural default for Latin-script languages and the safe default for the auto-detect case where the source language isn't known yet.

Source code in src/constants/languages.py
def is_rtl_language(label: str) -> bool:
    """Returns True when *label* names a right-to-left language.

    Empty / unknown labels return False — the natural default for
    Latin-script languages and the safe default for the auto-detect
    case where the source language isn't known yet.
    """
    return label in RTL_LANGUAGES
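
One way a caller might use this check is to pick a text direction for a result pane. The contents of `RTL_LANGUAGES` below are an assumed sample, not the module's actual set, and `text_direction` is a hypothetical helper.

```python
RTL_LANGUAGES = frozenset({"Arabic", "Hebrew", "Persian", "Urdu"})  # assumed sample


def is_rtl_language(label: str) -> bool:
    return label in RTL_LANGUAGES


def text_direction(label: str) -> str:
    """Hypothetical helper: map a language label to "rtl" or "ltr"."""
    return "rtl" if is_rtl_language(label) else "ltr"


print(text_direction("Arabic"))      # rtl
print(text_direction("Vietnamese"))  # ltr
print(text_direction(""))            # ltr (auto-detect, language unknown)
```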