Official Introduction
A lightweight offline translation model server with ultra-low resource consumption and extremely fast speed—no GPU required. Average response time per request is just 50 milliseconds. Supports translation across major world languages.
Note: This model server focuses on the design goals of offline translation, response speed, cross-platform deployment, and local execution to achieve unlimited free translation. However, due to limitations in model size and optimization level, translation quality is naturally not as good as that of large online models. For high-quality translations, we recommend using large online model APIs.
v4 optimizes memory usage, further improves speed, and enhances stability. The official release is coming soon! Upgrading to the current dev version is not recommended!
Usage Instructions
Download the latest version for your platform from Releases, then launch the program directly from the command line.
MTranServer is primarily designed for server environments, so currently only command-line service and Docker deployment are supported.
I may continue developing MTranDesktop in my spare time for desktop use. Contributions from developers are welcome.
After starting the server, logs will display the address of a built-in simple UI and the online debugging documentation. Preview below:
Command Line Arguments
```
./mtranserver [Options]

Options:
  -version, -v              Show version information
  -log-level string         Log level (debug, info, warn, error) (default "warn")
  -config-dir string        Configuration directory (default "~/.config/mtran/server")
  -model-dir string         Model directory (default "~/.config/mtran/models")
  -host string              Server listening address (default "0.0.0.0")
  -port string              Server port (default "8989")
  -ui                       Enable Web UI (default true)
  -offline                  Enable offline mode, disable automatic model downloads (default false)
  -worker-idle-timeout int  Worker idle timeout (seconds) (default 300)

Examples:
  ./mtranserver --host 127.0.0.1 --port 8080
  ./mtranserver --ui --offline
  ./mtranserver -v
```
Docker Compose Deployment
Create an empty directory and write a compose.yml file with the following content:
```yaml
services:
  mtranserver:
    image: xxnuo/mtranserver:latest
    container_name: mtranserver
    restart: unless-stopped
    ports:
      - "8989:8989"
    environment:
      - MT_HOST=0.0.0.0
      - MT_PORT=8989
      - MT_ENABLE_UI=true
      - MT_OFFLINE=false
      # - MT_API_TOKEN=your_secret_token_here
    volumes:
      - ./models:/app/models
```
Then pull the image and start the service:

```shell
docker pull xxnuo/mtranserver:latest
docker compose up -d
```
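To confirm the deployment is healthy, you can tail the container logs and probe the mapped port. This is a sketch assuming the default port mapping from the compose file above; the exact Web UI address is printed in the startup logs:

```shell
# Follow the container logs until the startup banner (with the UI address) appears
docker compose logs -f mtranserver

# In another terminal, confirm the server answers on the mapped port.
# A 200 status suggests the server (and, if MT_ENABLE_UI=true, the Web UI) is up.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8989/
```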
Important Notice:
When a new language pair is translated for the first time, the server automatically downloads the corresponding translation model (unless offline mode is enabled). This may take some time depending on your network speed and the model size. After the download, the engine takes a few seconds to initialize; subsequent translation requests get millisecond-level response times. We recommend running one test translation before regular use so that models are pre-downloaded and loaded.
The program updates frequently; if you encounter issues, try updating to the latest version.
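As suggested above, a single request is enough to pre-download and warm up a model. Here is a minimal smoke test via the DeepL-compatible endpoint. This is a sketch: the request body follows the standard DeepL v2 format, which this endpoint is described as compatible with; if your version differs, consult the API documentation generated at startup:

```shell
# First call for a language pair triggers the model download and engine init;
# later calls should return in milliseconds.
curl -s -X POST http://localhost:8989/deepl \
  -H "Content-Type: application/json" \
  -d '{"text": ["Hello, world"], "source_lang": "EN", "target_lang": "ZH"}'
```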
Compatible Plugin Interfaces
The server provides multiple compatible interfaces for translation plugins:
| Interface | Method | Description | Supported Plugins |
|---|---|---|---|
| /imme | POST | Immersive Translate plugin interface | Immersive Translate |
| /kiss | POST | Kiss Translator plugin interface | Kiss Translator |
| /deepl | POST | DeepL API v2 compatible interface | Clients supporting the DeepL API |
| /google/language/translate/v2 | POST | Google Translate API v2 compatible | Clients supporting the Google Translate API |
| /google/translate_a/single | GET | Google translate_a/single compatible | Clients supporting Google web translation |
| /hcfy | POST | Highlight & Translate compatible | Highlight & Translate |
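For the one GET-based route, a hedged example: the query parameters below mirror Google's public translate_a/single format, which this route is described as compatible with, but the exact supported parameters are an assumption; check the generated API docs if a request fails:

```shell
# sl = source language, tl = target language, q = text to translate
curl -s "http://localhost:8989/google/translate_a/single?client=gtx&sl=en&tl=zh-CN&dt=t&q=Hello"
```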
Plugin Configuration Guide:

Notes:

For Immersive Translate, go to the Settings page and enable Beta features under Developer Mode; you'll then see Custom API Settings under Translation Services (Official Tutorial). Then increase the Max Requests Per Second setting to fully utilize server performance for a lightning-fast experience. I set mine to 512 requests per second and 1 paragraph per request. Adjust based on your server specs.

For Kiss Translator, go to Settings, scroll down to Interface Settings, and find the Custom option. Similarly, adjust Max Concurrent Requests and Request Interval Time to maximize performance. I set Max Concurrent Requests to 100 and Request Interval Time to 1 ms. Tune according to your setup.

Then configure the plugin's custom endpoint URL according to the table below.
| Name | URL | Plugin Setting |
|---|---|---|
| Immersive (no password) | http://localhost:8989/imme | Custom API Settings - API URL |
| Immersive (with password) | http://localhost:8989/imme?token=your_token | Same as above; replace your_token with your MT_API_TOKEN value |
| Kiss (no password) | http://localhost:8989/kiss | Interface Settings - Custom - URL |
| Kiss (with password) | http://localhost:8989/kiss | Same as above; enter your_token in the KEY field |
| DeepL Compatible | http://localhost:8989/deepl | Use DeepL-Auth-Key or Bearer authentication |
| Google Compatible | http://localhost:8989/google/language/translate/v2 | Use key parameter or Bearer authentication |
| Highlight & Translate | http://localhost:8989/hcfy | Supports token parameter or Bearer authentication |
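When MT_API_TOKEN is set, the compatible endpoints accept the header-based authentication named in the table. Hedged examples for the DeepL route (the header names follow the standard DeepL and Bearer conventions; your_token stands in for your configured MT_API_TOKEN value):

```shell
# DeepL-style authentication header
curl -s -X POST http://localhost:8989/deepl \
  -H "Authorization: DeepL-Auth-Key your_token" \
  -H "Content-Type: application/json" \
  -d '{"text": ["Hello"], "target_lang": "ZH"}'

# Generic Bearer authentication
curl -s -X POST http://localhost:8989/deepl \
  -H "Authorization: Bearer your_token" \
  -H "Content-Type: application/json" \
  -d '{"text": ["Hello"], "target_lang": "ZH"}'
```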
Regular users can simply follow the table above to set the plugin endpoint URLs and start using the service.
Similar Projects
Listed below are similar projects. Users with different needs may consider trying them:
| Project Name | Memory Usage | Concurrency Performance | Translation Quality | Speed | Additional Info |
|---|---|---|---|---|---|
| NLLB | Very High | Poor | Fair | Slow | Ported to Android by the RTranslator project with many optimizations, but resource usage remains high and speed slow |
| LibreTranslate | Very High | Moderate | Fair | Medium | Mid-tier CPU handles ~3 sentences/sec, high-end CPUs handle 15–20 sentences/sec. Details |
| OPUS-MT | High | Moderate | Slightly Poor | Fast | Performance Benchmarks |
| Other Large Models | Extremely High | Dynamic | Excellent | Very Slow | High hardware requirements; for high-concurrency scenarios, consider using the vllm framework to manage concurrency via memory/GPU usage |
| This Project | Low | High | Fair | Extremely Fast | Average response time ~50ms per request; v4 optimizes memory usage—stay tuned for official release! |
The comparison above is based on simple CPU-only tests for English-to-Chinese translation. Not strictly benchmarked nor quantitatively compared—provided for reference only.
Advanced Configuration Guide
Please refer to the API.md file and the API documentation generated after startup.
Star History
Thanks
Bergamot Project for the awesome idea of local translation.