Project Introduction
- Open-source repository: GitHub - B-Beginner/MULTI-OCR-SDK
MULTI-OCR-SDK is a simple and efficient Python SDK for calling various OCR APIs (currently DeepSeek-OCR and general vision-language models, VLMs). It converts documents (PDFs and images) into Markdown text with high accuracy and performance.
Usage
Installation
# Install via pip
pip install multi-ocr-sdk
# Or install via uv
uv add multi-ocr-sdk
Basic Usage of VLM
from multi_ocr_sdk import VLMClient

API_KEY = "your_api_key_here"
BASE_URL = "http://your_url/v1/chat/completions"
file_path = "./examples/example_files/DeepSeek_OCR_paper_mini.pdf"

client = VLMClient(api_key=API_KEY, base_url=BASE_URL)
result = client.parse(
    file_path=file_path,
    prompt=(
        "You are an OCR bot. Recognize the content of the input file and "
        "output it in Markdown format. Retain formatting information such as "
        "charts and diagrams as much as possible. Do not comment on or "
        "summarize the document; just output the result."
    ),
    model="Qwen3-VL-8B",
    # timeout=100,  # Optional; defaults to 60 s. Increase for large files that the VLM needs longer to process.
    # dpi=60,       # Optional; defaults to 72. Lower DPI means blurrier images, fewer input tokens, and lower recognition quality. Adjust to a suitable value.
    # pages=[1, 2], # Optional; not needed for single-page PDFs or images. All pages of a multi-page PDF are processed by default; use this to select specific pages.
)
print(result)
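Hardcoding credentials as in the snippet above is fine for experiments but risky in shared code. A minimal sketch of reading them from environment variables instead; the variable names `VLM_API_KEY` and `VLM_BASE_URL` are illustrative conventions, not something the SDK mandates:

```python
import os

# Read credentials from the environment, falling back to the placeholders.
# The variable names here are illustrative, not required by the SDK.
API_KEY = os.environ.get("VLM_API_KEY", "your_api_key_here")
BASE_URL = os.environ.get("VLM_BASE_URL", "http://your_url/v1/chat/completions")

def require_credentials(api_key: str, base_url: str) -> None:
    """Fail fast if the placeholders were never replaced."""
    if api_key == "your_api_key_here":
        raise RuntimeError("Set VLM_API_KEY before calling the API.")
    if "your_url" in base_url:
        raise RuntimeError("Set VLM_BASE_URL to your provider's endpoint.")
```

Calling `require_credentials(API_KEY, BASE_URL)` before constructing the client turns a confusing HTTP error into an immediate, readable one.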
Basic Usage of DeepSeek-OCR
from multi_ocr_sdk import DeepSeekOCR
client = DeepSeekOCR(
    api_key="your_api_key",
    base_url="https://api.siliconflow.cn/v1/chat/completions"  # or your provider's endpoint
)
# Simple document
text = client.parse("invoice.pdf", mode="free_ocr")
# Complex table
text = client.parse("statement.pdf", mode="grounding")
# Custom DPI
text = client.parse("document.pdf", dpi=300)
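Free-tier endpoints such as SiliconFlow's rate-limit aggressively, so it can help to wrap `client.parse` calls in a retry with exponential backoff. This is a generic sketch, not SDK functionality; the exact exception your provider raises on HTTP 429 varies, so the example simply retries on any exception:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retries(fn: Callable[[], T], attempts: int = 5, base_delay: float = 1.0) -> T:
    """Call fn, retrying with exponential backoff (1 s, 2 s, 4 s, ...) on failure."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the last error
            time.sleep(base_delay * (2 ** attempt))

# Hypothetical usage with the client from the snippet above:
# text = with_retries(lambda: client.parse("invoice.pdf", mode="free_ocr"))
```

A dedicated library like tenacity offers the same pattern with jitter and exception filtering if you need more control.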
Background Story
Recently, DeepSeek released its OCR model. I tried it out and found it excellent.
Later, I discovered someone on GitHub had developed a deepseek-ocr-sdk, which was very convenient to use. I submitted some feature requests and collaborated with the original author to enhance the project with additional functionalities.
During usage, I found that the free DeepSeek OCR service on SiliconFlow easily hits rate limits, and I wasn’t planning to pay for an upgrade. Switching to other third-party services didn’t yield satisfactory results (I tried several from LStation, but the experience was poor). So I wondered—could we support other OCR models, like Qwen-OCR?
After some effort, I eventually refactored the original deepseek-ocr-sdk codebase to support VLMs. In practice, Qwen3-VL-8B delivers excellent results.
Support for more common OCR engines will be added in the future. Bug reports and PRs are welcome!