About multi-ocr-sdk
Overview
Have you tried the recently released PaddleOCR-VL-1.5? Its performance is truly impressive.
On January 29, 2026, we released PaddleOCR-VL-1.5. It not only raised accuracy on the OmniDocBench v1.5 evaluation dataset to 94.5%, but also adds support for irregular bounding-box localization, enabling strong performance in real-world scenarios such as document scanning, skewed or curved pages, screen captures, and complex lighting. In addition, the model now supports seal recognition alongside text detection and recognition, and it continues to lead on key metrics.
If you haven’t tried it yet, you can experience it online: PaddleOCR - Document Parsing and Intelligent Text Recognition | API Calls and MCP Service Supported - 飞桨星河社区 (PaddlePaddle AI Studio)
The image below is an official example. Recognition performance on such severely distorted images is excellent, and standard images pose no challenge at all.
Accordingly, multi-ocr-sdk has been promptly updated to support PaddleOCR-VL-1.5.
Usage is greatly simplified: you only need to specify base_url, api_key, and the file path.
How to Use
First, install:
pip install multi-ocr-sdk
Then use it:
import json
from multi_ocr_sdk import PaddleOCRVLClient
base_url = "http://10.131.101.39:8010"
api_key = "test"
# Default mode: returns only recognized text in Markdown format
client = PaddleOCRVLClient(base_url=base_url, api_key=api_key)
markdown_text = client.parse(r"examples/example_files/DeepSeek_OCR_paper_page1.jpg")
print(markdown_text)
# # Rich-result mode: returns both Markdown and per-page layout information (bounding-box coordinates)
# rich_client = PaddleOCRVLClient(
# base_url=base_url,
# api_key=api_key,
# return_layout_info=True,
# )
# result = rich_client.parse(r"examples/example_files/DeepSeek_OCR_paper_page1.jpg")
# result_dict = result.to_dict()
# print(json.dumps(result_dict, ensure_ascii=False, indent=2))
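Building on the default mode above, here is a sketch of batch-converting a folder of images, one Markdown file per input. It only relies on the `PaddleOCRVLClient(...)` constructor and `client.parse(path)` call shown earlier; the helper names (`collect_image_files`, `batch_parse`), the extension list, and the output directory are my own additions, not part of the SDK.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # adjust to whatever your service accepts

def collect_image_files(root: str) -> list[Path]:
    """Return all image files under `root`, sorted for stable ordering."""
    return sorted(p for p in Path(root).rglob("*") if p.suffix.lower() in IMAGE_EXTS)

def batch_parse(client, root: str, out_dir: str) -> None:
    """Parse every image under `root` with client.parse() and write one .md per image."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for image in collect_image_files(root):
        markdown_text = client.parse(str(image))
        (out / f"{image.stem}.md").write_text(markdown_text, encoding="utf-8")

if __name__ == "__main__":
    # SDK import kept local so the helpers above stay usable on their own.
    from multi_ocr_sdk import PaddleOCRVLClient

    client = PaddleOCRVLClient(base_url="http://10.131.101.39:8010", api_key="test")
    batch_parse(client, "examples/example_files", "out_markdown")
```

Because `batch_parse` takes any object with a `parse()` method, you can swap in the rich-result client (or a stub for testing) without changing the loop.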
Finally
PRs are warmly welcome!
