[Tool Recommendation] Local Deployment of Translation Service — MTranServer

Official Introduction

A lightweight offline translation model server with ultra-low resource consumption and very fast responses; no GPU required. Average response time is about 50 milliseconds per request. Supports translation across major world languages.

Note: This model server focuses on the design goals of offline translation, response speed, cross-platform deployment, and local execution to achieve unlimited free translation. However, due to limitations in model size and optimization level, translation quality is naturally not as good as that of large online models. For high-quality translations, we recommend using large online model APIs.

v4 optimizes memory usage, further improves speed, and enhances stability; the official release is coming soon. Upgrading to the dev version is not recommended!

Usage Instructions

Download the latest version for your platform from Releases, then launch the program directly from the command line.

MTranServer is primarily designed for server environments, so currently only command-line service and Docker deployment are supported.

I may continue developing MTranDesktop in my spare time for desktop use. Contributions from developers are welcome.

After starting the server, logs will display the address of a built-in simple UI and the online debugging documentation. Preview below:

Command Line Arguments

./mtranserver [Options]

Options:
  -version, -v          Show version information
  -log-level string     Log level (debug, info, warn, error) (default "warn")
  -config-dir string    Configuration directory (default "~/.config/mtran/server")
  -model-dir string     Model directory (default "~/.config/mtran/models")
  -host string          Server listening address (default "0.0.0.0")
  -port string          Server port (default "8989")
  -ui                   Enable Web UI (default true)
  -offline              Enable offline mode, disable automatic model downloads (default false)
  -worker-idle-timeout int  Worker idle timeout (seconds) (default 300)

Examples:
  ./mtranserver --host 127.0.0.1 --port 8080
  ./mtranserver --ui --offline
  ./mtranserver -v

Docker Compose Deployment

Create an empty directory and write a compose.yml file with the following content:

services:
  mtranserver:
    image: xxnuo/mtranserver:latest
    container_name: mtranserver
    restart: unless-stopped
    ports:
      - "8989:8989"
    environment:
      - MT_HOST=0.0.0.0
      - MT_PORT=8989
      - MT_ENABLE_UI=true
      - MT_OFFLINE=false
      # - MT_API_TOKEN=your_secret_token_here
    volumes:
      - ./models:/app/models

Then, in the same directory, pull the image and start the service:

docker pull xxnuo/mtranserver:latest
docker compose up -d
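The container may take a moment to begin listening. A minimal sketch for waiting until the port accepts connections before sending the first request (the host/port values mirror the compose file above; the polling helper itself is my own, not part of MTranServer):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not up yet; retry
    return False

# Example (matches the compose file above):
#   wait_for_port("127.0.0.1", 8989, timeout=60.0)
```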

Important Notice:

When translating a new language pair for the first time, the server automatically downloads the corresponding translation model (unless offline mode is enabled). This may take a while depending on your network speed and the model size, and the engine takes a few seconds to initialize after the download. Subsequent translation requests get millisecond-level response times. It is therefore recommended to run one test translation before formal use, so the models are downloaded and loaded in advance.
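One convenient way to do that warm-up is a single scripted request. Below is a minimal sketch against the DeepL-compatible interface: the `/deepl` path comes from this project's interface table, while the body shape (a `text` array plus `target_lang`) follows the DeepL API v2 convention that endpoint is said to be compatible with. Verify the exact fields against the API docs generated at startup.

```python
import json
from urllib import request

def build_warmup_request(base_url: str, text: str, target_lang: str,
                         token: str = "") -> request.Request:
    """Build one translation request in DeepL API v2 style.

    The /deepl path and the auth header style come from this project's
    compatible-interface table; the JSON body shape is the standard
    DeepL v2 convention ({"text": [...], "target_lang": "..."}).
    """
    body = json.dumps({"text": [text], "target_lang": target_lang}).encode()
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"DeepL-Auth-Key {token}"
    return request.Request(base_url + "/deepl", data=body,
                           headers=headers, method="POST")

# Warm-up example -- the first call triggers the model download, so allow
# a generous timeout:
#   req = build_warmup_request("http://localhost:8989", "Hello, world", "ZH")
#   print(request.urlopen(req, timeout=300).read().decode())
```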

The program updates frequently; if you encounter issues, try updating to the latest version.

Compatible Plugin Interfaces

The server provides multiple compatible interfaces for translation plugins:

| Interface | Method | Description | Supported Plugins |
|---|---|---|---|
| /imme | POST | Immersive Translate plugin interface | Immersive Translate |
| /kiss | POST | Kiss Translator plugin interface | Kiss Translator |
| /deepl | POST | DeepL API v2 compatible interface | Clients supporting the DeepL API |
| /google/language/translate/v2 | POST | Google Translate API v2 compatible | Clients supporting the Google Translate API |
| /google/translate_a/single | GET | Google translate_a/single compatible | Clients supporting Google web translation |
| /hcfy | POST | Highlight & Translate compatible | Highlight & Translate |

Plugin Configuration Guide:

Notes:

  • For Immersive Translate, open the Settings page and enable Beta features under Developer Mode; Custom API Settings will then appear under Translation Services (Official Tutorial). Next, increase the Max Requests Per Second setting to fully utilize server performance. I set mine to 512 requests per second and 1 paragraph per request; adjust based on your server specs.

  • For Kiss Translator, go to Settings, scroll down to Interface Settings, and find the Custom option. Similarly, adjust Max Concurrent Requests and Request Interval Time to maximize performance. I set Max Concurrent Requests to 100 and Request Interval Time to 1 ms; tune according to your setup.

Then configure the plugin’s custom endpoint URL according to the table below.

| Name | URL | Plugin Setting |
|---|---|---|
| Immersive (no password) | http://localhost:8989/imme | Custom API Settings - API URL |
| Immersive (with password) | http://localhost:8989/imme?token=your_token | Same as above; replace your_token with your MT_API_TOKEN value |
| Kiss (no password) | http://localhost:8989/kiss | Interface Settings - Custom - URL |
| Kiss (with password) | http://localhost:8989/kiss | Same as above; enter your_token in the KEY field |
| DeepL Compatible | http://localhost:8989/deepl | Use DeepL-Auth-Key or Bearer authentication |
| Google Compatible | http://localhost:8989/google/language/translate/v2 | Use the key parameter or Bearer authentication |
| Highlight & Translate | http://localhost:8989/hcfy | Supports the token parameter or Bearer authentication |

Regular users can simply follow the table above to set the plugin endpoint URLs and start using the service.
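For scripted use outside a browser plugin, the Google-compatible endpoint above can be called the way any Google Translate API v2 client would call it. A sketch of assembling such a request, assuming the standard v2 body fields (`q`, `target`) and the `key` query parameter named in the table; confirm the exact fields against the generated API docs:

```python
import json
from urllib import parse, request

def build_google_v2_request(base_url: str, texts: list, target: str,
                            key: str = "") -> request.Request:
    """Assemble a Google-Translate-v2-style request for the compatible endpoint.

    The endpoint path and the key/Bearer auth options come from the table
    above; the body fields (q, target) follow Google's v2 convention.
    """
    url = base_url + "/google/language/translate/v2"
    if key:
        url += "?" + parse.urlencode({"key": key})
    body = json.dumps({"q": texts, "target": target}).encode()
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"},
                           method="POST")

# Example:
#   req = build_google_v2_request("http://localhost:8989", ["Hello"], "zh")
#   print(request.urlopen(req, timeout=60).read().decode())
```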

Similar Projects

Listed below are similar projects. Users with different needs may consider trying them:

| Project Name | Memory Usage | Concurrency Performance | Translation Quality | Speed | Additional Info |
|---|---|---|---|---|---|
| NLLB | Very High | Poor | Fair | Slow | Ported to Android by experts in RTranslator with many optimizations, but still high usage and slow speed |
| LibreTranslate | Very High | Moderate | Fair | Medium | A mid-tier CPU handles ~3 sentences/sec; high-end CPUs handle 15–20 sentences/sec. Details |
| OPUS-MT | High | Moderate | Slightly Poor | Fast | Performance Benchmarks |
| Other Large Models | Extremely High | Dynamic | Excellent | Very Slow | High hardware requirements; for high-concurrency scenarios, consider the vllm framework to manage concurrency via memory/GPU usage |
| This Project | Low | High | Fair | Extremely Fast | Average response time ~50 ms per request; v4 optimizes memory usage, stay tuned for the official release! |

The comparison above is based on simple CPU-only tests for English-to-Chinese translation. Not strictly benchmarked nor quantitatively compared—provided for reference only.

Advanced Configuration Guide

Please refer to the API.md file and the API documentation generated after startup.

Star History

Star History Chart

Thanks

Bergamot Project for the awesome idea of local translation.

Mozilla for providing the models.