multi-ocr-sdk: A pip package supporting multiple OCR engines

doggie · December 16, 2025, 10:16am

Project Introduction

Open-source repository: B-Beginner/MULTI-OCR-SDK on GitHub (a simple and efficient Python SDK for Multi OCR API)

MULTI-OCR-SDK is a simple and efficient Python SDK for invoking various OCR APIs (currently supporting DeepSeek-OCR, Vision-Language Models (VLMs), and PaddleOCR). It converts documents (PDFs, images) into Markdown text with high accuracy and performance.

Usage

Installation

# Install via pip  
pip install multi-ocr-sdk  
# Or install via uv  
uv add multi-ocr-sdk

Basic Usage of VLM

from multi_ocr_sdk import VLMClient  

API_KEY = "your_api_key_here"  
BASE_URL = "http://your_url/v1/chat/completions"  
file_path = "./examples/example_files/DeepSeek_OCR_paper_mini.pdf"  

client = VLMClient(api_key=API_KEY, base_url=BASE_URL)  

result = client.parse(  
    file_path=file_path,  
    prompt="You are an OCR bot. Recognize the content of the input file and output it in Markdown format, preserving formatting information such as charts and tables as much as possible. Do not comment on or summarize the file content—output only the recognized content.",  
    model="Qwen3-VL-8B",  
    # timeout=100,  # Optional; default is 60 seconds. Increase it for large files that need more VLM processing time.  
    # dpi=60,  # Optional; default is 72. Lower DPI gives blurrier images and uses fewer input tokens, but degrades recognition quality. Adjust to suit your needs.  
    # pages=[1, 2],  # Optional; not needed for single images or single-page PDFs. Multi-page PDFs are processed in full by default; list page numbers here to process only those pages.  
)  
print(result)
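
To get a feel for the `dpi` trade-off described in the comments above, here is a rough sketch of how rendering resolution changes the pixel size of a page. It assumes an A4 page (8.27 x 11.69 inches) and that the rendered size is simply inches times DPI. How the SDK actually rasterizes pages, and how a given model turns pixels into tokens, is not covered here. `page_pixels` is a hypothetical helper, not part of multi-ocr-sdk.

```python
# Hypothetical helper for illustration only; not part of multi-ocr-sdk.
A4_INCHES = (8.27, 11.69)  # width and height of an A4 page, in inches


def page_pixels(dpi, size_inches=A4_INCHES):
    """Return (width, height) in pixels for a page rendered at the given DPI."""
    width_in, height_in = size_inches
    return round(width_in * dpi), round(height_in * dpi)


# Compare a low-DPI setting, the stated default (72), and a high-DPI setting.
for dpi in (60, 72, 300):
    w, h = page_pixels(dpi)
    print(f"dpi={dpi}: {w} x {h} px ({w * h / 1e6:.2f} megapixels)")
```

Pixel count grows with the square of the DPI, so going from 72 to 300 makes each page image roughly 17 times larger. That is why small DPI changes can have a noticeable effect on both cost and quality.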

Basic Usage of DeepSeek-OCR

from multi_ocr_sdk import DeepSeekOCR  

client = DeepSeekOCR(  
    api_key="your_api_key",  
    base_url="https://api.siliconflow.cn/v1/chat/completions"  # Or your provider's endpoint  
)  

# Simple document  
text = client.parse("invoice.pdf", mode="free_ocr")  

# Complex tables  
text = client.parse("statement.pdf", mode="grounding")  

# Custom DPI  
text = client.parse("document.pdf", dpi=300)
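
Hosted endpoints, free tiers in particular, may reject requests once a rate limit is hit. The sketch below wraps any `parse` call in a retry loop with exponential backoff. `parse_with_retry` is a hypothetical helper, not part of multi-ocr-sdk. The exception types the SDK raises are not documented in this post, so the helper catches a broad `Exception` by default; narrow `retry_on` once you know which errors are safe to retry.

```python
import time


def parse_with_retry(parse, *args, retries=3, base_delay=1.0,
                     retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call parse(*args, **kwargs), retrying failed calls with exponential backoff.

    Hypothetical helper for illustration only; not part of multi-ocr-sdk.
    """
    for attempt in range(retries + 1):
        try:
            return parse(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

With the client from the example above, a call would look like `text = parse_with_retry(client.parse, "invoice.pdf", mode="free_ocr")`. Passing the bound method keeps the helper independent of which client class is used.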

Background Story

Recently, DeepSeek released its OCR model. After trying it out, we found it works exceptionally well.

Later, we discovered that someone on GitHub had developed a DeepSeek-OCR-SDK, which was very convenient to use. Based on this, we proposed several feature requests and collaborated with the original author to enhance and extend the SDK’s functionality.

While using it, we noticed that SiliconFlow’s free DeepSeek-OCR service hits its rate limit easily, and we did not want to pay for an upgrade. Other third-party services gave subpar results (we tested several L-site providers and were disappointed by all of them). So we asked whether the SDK could support alternative OCR models, such as Qwen-OCR.

After considerable effort, we refactored the original DeepSeek-OCR-SDK and added support for VLMs. Real-world testing confirmed that Qwen3-VL-8B delivers excellent results.

Next, we plan to support additional widely-used OCR engines. Bug reports and pull requests are warmly welcomed!
