[Tool Recommendation] Local Deployment of Translation Service — MTranServer

Official Introduction

A lightweight offline translation model server with ultra-low resource consumption and very fast responses; no GPU required. Average response time is about 50 milliseconds per request. Supports translation across major world languages.

Note: This model server focuses on the design goals of offline translation, response speed, cross-platform deployment, and local execution to achieve unlimited free translation. However, due to limitations in model size and optimization level, translation quality is naturally not as good as that of large online models. For high-quality translations, we recommend using large online model APIs.

v4 optimizes memory usage, further improves speed, and enhances stability; the official release is coming soon. Upgrading to the dev version is not recommended!

Usage Instructions

Download the latest version for your platform from Releases, then launch the program directly from the command line.

MTranServer is primarily designed for server environments, so currently only command-line service and Docker deployment are supported.

I may continue developing MTranDesktop in my spare time for desktop use. Contributions from developers are welcome.

After starting the server, logs will display the address of a built-in simple UI and the online debugging documentation. Preview below:

Command Line Arguments

./mtranserver [Options]

Options:
  -version, -v          Show version information
  -log-level string     Log level (debug, info, warn, error) (default "warn")
  -config-dir string    Configuration directory (default "~/.config/mtran/server")
  -model-dir string     Model directory (default "~/.config/mtran/models")
  -host string          Server listening address (default "0.0.0.0")
  -port string          Server port (default "8989")
  -ui                   Enable Web UI (default true)
  -offline              Enable offline mode, disable automatic model downloads (default false)
  -worker-idle-timeout int  Worker idle timeout (seconds) (default 300)

Examples:
  ./mtranserver --host 127.0.0.1 --port 8080
  ./mtranserver --ui --offline
  ./mtranserver -v

Docker Compose Deployment

Create an empty directory and write a compose.yml file with the following content:

services:
  mtranserver:
    image: xxnuo/mtranserver:latest
    container_name: mtranserver
    restart: unless-stopped
    ports:
      - "8989:8989"
    environment:
      - MT_HOST=0.0.0.0
      - MT_PORT=8989
      - MT_ENABLE_UI=true
      - MT_OFFLINE=false
      # - MT_API_TOKEN=your_secret_token_here
    volumes:
      - ./models:/app/models

Then, in the same directory, pull the image and start the service:

docker pull xxnuo/mtranserver:latest
docker compose up -d
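The container may take a moment to begin listening. A minimal sketch for waiting until the port accepts connections before sending the first request (the host/port values mirror the compose file above; the polling helper itself is my own, not part of MTranServer):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not up yet; retry
    return False

# Example (matches the compose file above):
#   wait_for_port("127.0.0.1", 8989, timeout=60.0)
```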

Important Notice:

When translating a new language pair for the first time, the server automatically downloads the corresponding translation model (unless offline mode is enabled). This may take a while depending on your network speed and the model size, and the engine takes a few seconds to initialize after the download. Subsequent translation requests get millisecond-level response times. It is therefore recommended to run one test translation before formal use, so the models are downloaded and loaded in advance.
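One convenient way to do that warm-up is a single scripted request. Below is a minimal sketch against the DeepL-compatible interface: the `/deepl` path comes from this project's interface table, while the body shape (a `text` array plus `target_lang`) follows the DeepL API v2 convention that endpoint is said to be compatible with. Verify the exact fields against the API docs generated at startup.

```python
import json
from urllib import request

def build_warmup_request(base_url: str, text: str, target_lang: str,
                         token: str = "") -> request.Request:
    """Build one translation request in DeepL API v2 style.

    The /deepl path and the auth header style come from this project's
    compatible-interface table; the JSON body shape is the standard
    DeepL v2 convention ({"text": [...], "target_lang": "..."}).
    """
    body = json.dumps({"text": [text], "target_lang": target_lang}).encode()
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"DeepL-Auth-Key {token}"
    return request.Request(base_url + "/deepl", data=body,
                           headers=headers, method="POST")

# Warm-up example -- the first call triggers the model download, so allow
# a generous timeout:
#   req = build_warmup_request("http://localhost:8989", "Hello, world", "ZH")
#   print(request.urlopen(req, timeout=300).read().decode())
```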

The program updates frequently; if you encounter issues, try updating to the latest version.

Compatible Plugin Interfaces

The server provides multiple compatible interfaces for translation plugins:

| Interface | Method | Description | Supported Plugins |
|---|---|---|---|
| /imme | POST | Immersive Translate plugin interface | Immersive Translate |
| /kiss | POST | Kiss Translator plugin interface | Kiss Translator |
| /deepl | POST | DeepL API v2 compatible interface | Clients supporting the DeepL API |
| /google/language/translate/v2 | POST | Google Translate API v2 compatible | Clients supporting the Google Translate API |
| /google/translate_a/single | GET | Google translate_a/single compatible | Clients supporting Google web translation |
| /hcfy | POST | Highlight & Translate compatible | Highlight & Translate |

Plugin Configuration Guide:

Notes:

  • For Immersive Translate, open the Settings page and enable Beta features under Developer Mode; Custom API Settings will then appear under Translation Services (Official Tutorial). Next, increase the Max Requests Per Second setting to fully utilize server performance. I set mine to 512 requests per second and 1 paragraph per request; adjust based on your server specs.

  • For Kiss Translator, go to Settings, scroll down to Interface Settings, and find the Custom option. Similarly, adjust Max Concurrent Requests and Request Interval Time to maximize performance. I set Max Concurrent Requests to 100 and Request Interval Time to 1 ms; tune according to your setup.

Then configure the plugin’s custom endpoint URL according to the table below.

| Name | URL | Plugin Setting |
|---|---|---|
| Immersive (no password) | http://localhost:8989/imme | Custom API Settings - API URL |
| Immersive (with password) | http://localhost:8989/imme?token=your_token | Same as above; replace your_token with your MT_API_TOKEN value |
| Kiss (no password) | http://localhost:8989/kiss | Interface Settings - Custom - URL |
| Kiss (with password) | http://localhost:8989/kiss | Same as above; enter your_token in the KEY field |
| DeepL Compatible | http://localhost:8989/deepl | Use DeepL-Auth-Key or Bearer authentication |
| Google Compatible | http://localhost:8989/google/language/translate/v2 | Use the key parameter or Bearer authentication |
| Highlight & Translate | http://localhost:8989/hcfy | Supports the token parameter or Bearer authentication |

Regular users can simply follow the table above to set the plugin endpoint URLs and start using the service.
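For scripted use outside a browser plugin, the Google-compatible endpoint above can be called the way any Google Translate API v2 client would call it. A sketch of assembling such a request, assuming the standard v2 body fields (`q`, `target`) and the `key` query parameter named in the table; confirm the exact fields against the generated API docs:

```python
import json
from urllib import parse, request

def build_google_v2_request(base_url: str, texts: list, target: str,
                            key: str = "") -> request.Request:
    """Assemble a Google-Translate-v2-style request for the compatible endpoint.

    The endpoint path and the key/Bearer auth options come from the table
    above; the body fields (q, target) follow Google's v2 convention.
    """
    url = base_url + "/google/language/translate/v2"
    if key:
        url += "?" + parse.urlencode({"key": key})
    body = json.dumps({"q": texts, "target": target}).encode()
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"},
                           method="POST")

# Example:
#   req = build_google_v2_request("http://localhost:8989", ["Hello"], "zh")
#   print(request.urlopen(req, timeout=60).read().decode())
```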

Similar Projects

Listed below are similar projects. Users with different needs may consider trying them:

| Project Name | Memory Usage | Concurrency Performance | Translation Quality | Speed | Additional Info |
|---|---|---|---|---|---|
| NLLB | Very High | Poor | Fair | Slow | Ported to Android by experts in RTranslator with many optimizations, but still high usage and slow speed |
| LibreTranslate | Very High | Moderate | Fair | Medium | A mid-tier CPU handles ~3 sentences/sec; high-end CPUs handle 15–20 sentences/sec. Details |
| OPUS-MT | High | Moderate | Slightly Poor | Fast | Performance Benchmarks |
| Other Large Models | Extremely High | Dynamic | Excellent | Very Slow | High hardware requirements; for high-concurrency scenarios, consider the vllm framework to manage concurrency via memory/GPU usage |
| This Project | Low | High | Fair | Extremely Fast | Average response time ~50 ms per request; v4 optimizes memory usage, stay tuned for the official release! |

The comparison above is based on simple CPU-only tests for English-to-Chinese translation. Not strictly benchmarked nor quantitatively compared—provided for reference only.

Advanced Configuration Guide

Please refer to the API.md file and the API documentation generated after startup.

Star History

Star History Chart

Thanks

Bergamot Project for the awesome idea of local translation.

Mozilla for providing the models.