This article was transcribed by SimpRead. Original URL: news.qq.com
Yesterday, the DeepSeek OCR paper really went viral—opened X and it was everywhere.
Friends told me that OCR has been one of the most important research directions for the DeepSeek team over the past six months, and they’ve invested a lot of effort into it.
DeepSeek-OCR might help solve the computational bottleneck LLMs face when processing long contexts.
It seems everyone’s been focusing on DeepSeek’s novel idea of contextual optical compression these days, while few have actually evaluated the real-world performance of DeepSeek-OCR.
Our company has a scanning product called Judao 全能Scan (roughly, “All-in-One Scan”), which we’ve been developing for five years now.
Over the years, we’ve relied heavily on Baidu’s open-source PaddleOCR.
Back in 2021, we conducted extensive technical research and tested several different OCR solutions—both cloud-based and open-source—and ultimately concluded that PaddleOCR offered the best cost-performance ratio.
PaddleOCR has maintained significant influence in the open-source community, amassing over 60k stars on GitHub.
In 2022, I attended a Baidu OCR seminar—the meeting room was packed. In earlier years, when large tech companies weren’t paying much attention to OCR, Baidu truly led the way.
For scanning products like ours, OCR is a core competitive advantage.
Now that DeepSeek OCR has been released—and it’s a small model with extremely low deployment costs—we decided to run a comprehensive test comparing its performance against our current solution.
Just as we were preparing test documents, our CTO discovered that Baidu had also recently launched a new version of PaddleOCR: PaddleOCR-VL.
This is a multimodal document parsing model with only 0.9B parameters—meaning it can run smoothly even on our laptops.
Here is the corresponding technical report:
https://arxiv.org/pdf/2510.14528
Baidu’s new model dominated the then-latest authoritative benchmark, OmniDocBench V1.5, achieving SOTA (state-of-the-art) performance across four core capabilities: text recognition, formula recognition, table understanding, and reading order.
And over the past few days, it has topped Hugging Face’s global Trending list for several days running.
We were rather embarrassed: our team had clearly missed this update. No wonder so many overseas users were asking in the comment sections yesterday whether DeepSeek-OCR outperformed PaddleOCR.
Next, let’s take a look at our team’s comparative testing results between PaddleOCR-VL and DeepSeek-OCR.
Based on our experience, evaluating OCR systems requires covering complex scenarios involving both printed and handwritten text—such as mixed languages, pinyin tones, mathematical formulas, and cursive handwriting.
Over the years, we’ve accumulated a number of test cases, so running evaluations was quite fast.
You can try PaddleOCR-VL directly on Hugging Face:
Currently, however, there is no cloud demo available for DeepSeek-OCR. After checking, I found that none of the major cloud platforms offer one-click deployment options yet.
To try it yourself, you’ll need to deploy it locally—but GitHub provides very clear and simple instructions.
Let’s begin.
First, let’s examine vertical inscriptions. There are two main challenges here:
- Traditional OCR systems are typically designed for horizontal text. For vertical layouts, the model must understand character arrangement, vertical adjacency between characters, and how paragraphs and lines are structured.
- Traditional Chinese characters have complex structures, requiring higher precision and a deeper understanding of glyph shapes. Similar-looking or structurally intricate characters are prone to misrecognition, especially across varying calligraphic styles.
Below is a rubbing of an ancient stele I found (image supports scrolling):
Here’s PaddleOCR-VL’s recognition result:
And here’s DeepSeek-OCR’s result:
Overall, neither Baidu nor DeepSeek achieved 100% recognition accuracy.
However, Baidu’s output is noticeably more accurate. DeepSeek made a surprisingly basic error, misreading the final character “夫” (fū).
My colleague calculated the error rates:
Baidu: 8.16%, DeepSeek: 18.37%.
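The article doesn’t state the exact metric, but error rates like these are typically character error rate (CER): the edit distance between the predicted text and the ground truth, divided by the ground-truth length. A minimal sketch of how such a number could be computed (our own illustrative formulation, not the team’s actual script):

```python
def char_error_rate(pred: str, truth: str) -> float:
    """CER = Levenshtein edit distance(pred, truth) / len(truth)."""
    m, n = len(pred), len(truth)
    # prev[j] holds the edit distance between pred[:i-1] and truth[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # substitution (or match)
        prev = cur
    return prev[n] / n if n else 0.0
```

On a 49-character inscription, for example, 4 misread characters would give 4/49 ≈ 8.16%, which matches the scale of the numbers above.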
Let’s move on—to handwritten math formulas. Below is the original image. To make it easier for you to reproduce, I’ll share all test images.
These are standard test cases our team uses every time we evaluate OCR APIs or open-source tools (image supports scrolling):
Here’s PaddleOCR-VL’s result:
And here’s DeepSeek-OCR’s output:
DeepSeek outputs LaTeX format. After comprehensive comparison, both models achieved identical accuracy—both correct.
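Since DeepSeek emits LaTeX, judging two formula outputs as “identical” has to tolerate cosmetic differences such as whitespace and `\left`/`\right` sizing wrappers. A minimal normalization one might apply before comparing two LaTeX strings (our own illustrative rule set, not something either model specifies):

```python
import re

def normalize_latex(s: str) -> str:
    """Strip cosmetic LaTeX differences before comparing two outputs."""
    s = s.replace(r"\left", "").replace(r"\right", "")  # purely visual sizing
    return re.sub(r"\s+", "", s)                        # whitespace is insignificant
```

With this, `\left( \frac{a}{b} \right)` and `(\frac{a}{b})` compare as equal, while genuinely different formulas still differ.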
Now let’s test another vertical text layout (image supports scrolling), this time with blurry handwriting.
Baidu PaddleOCR-VL’s result:
DeepSeek-OCR’s result:
Both achieved 100% accuracy. Seems like for simpler cases, both models perform excellently.
Let’s increase the difficulty—now testing cursive handwriting. This image was sourced directly from Xiaohongshu (image supports scrolling):
PaddleOCR-VL’s result:
DeepSeek-OCR’s result:
Clearly, DeepSeek performs poorly on cursive text. We tested repeatedly—five times—and DeepSeek-OCR consistently recognized only four characters.
Also worth noting: hallucination appears rare in both models. Outputs were stable across runs, and the first result matched each subsequent run exactly.
Now let’s test complex diagrams:
Baidu PaddleOCR-VL’s result:
DeepSeek-OCR’s result:
Clearly, Baidu delivers better results. DeepSeek-OCR appears unable to interpret the meaning of bar charts.
Let’s lower the difficulty slightly and test a table example—a common use case in our product where we require OCR to convert content into structured tables.
PaddleOCR-VL’s output:
DeepSeek-OCR’s output:
Both DeepSeek-OCR and PaddleOCR-VL accurately recognize the text within the table—this isn’t particularly difficult.
But when reconstructing a structured table, DeepSeek-OCR makes alignment errors—noticeable misalignment in the rightmost columns. I believe DeepSeek-OCR still has significant room for improvement in chart-related tasks.
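One quick way to catch alignment errors like this is to parse the model’s Markdown table output and flag any row whose cell count differs from the header. A hypothetical helper for that sanity check (not part of either model’s API):

```python
def misaligned_rows(markdown_table: str) -> list:
    """Return 1-based indices of table rows whose cell count differs from the header."""
    rows = [line.strip() for line in markdown_table.strip().splitlines()
            if line.strip().startswith("|")]
    counts = [len(row.strip("|").split("|")) for row in rows]
    header = counts[0]
    # index 1 is the |---|---| separator row; skip it
    return [i + 1 for i, c in enumerate(counts) if i != 1 and c != header]
```

An empty result means every data row lines up with the header; the misaligned rightmost columns we saw in DeepSeek-OCR’s output would show up here immediately.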
Let’s now test chemical equation recognition.
Baidu PaddleOCR-VL’s result:
DeepSeek-OCR’s result:
Text recognition is accurate in both cases. But DeepSeek fails to simultaneously recognize the equals sign and reaction conditions—PaddleOCR-VL handles this correctly.
Now let’s test multilingual mixed-language recognition (image supports scrolling):
Baidu PaddleOCR-VL’s output:
DeepSeek-OCR’s output:
Perfect—all correct. Both models excel in multilingual scenarios.
Finally, let’s test primary school pinyin recognition—a frequent use case among our users.
The challenge here lies in recognizing tone marks and children’s somewhat clumsy handwriting (image supports scrolling):
PaddleOCR-VL’s result:
DeepSeek-OCR’s result:
Now let’s test a case where pinyin appears on the right side (image supports scrolling):
Here’s DeepSeek-OCR’s recognition result (image supports scrolling):
This time, their accuracies are quite close. However, if we had to score them, we’d say DeepSeek-OCR performed slightly better in this scenario.
Testing complete. Here’s my conclusion:
- Undoubtedly, both PaddleOCR-VL and DeepSeek-OCR are outstanding OCR models—world-class, top-three caliber.
Their recognition accuracy surpasses all other known products on the market, and both offer strong multilingual support.
- DeepSeek-OCR’s weakness lies in handwritten text, particularly cursive handwriting, where accuracy drops significantly.
PaddleOCR-VL’s flaw is that it occasionally ignores pinyin in certain scenarios.
If the strengths of both could be combined, it would be ideal.
- Based on my test cases, overall performance favors PaddleOCR-VL over DeepSeek-OCR.
Disclaimer: This content is created by Tencent Platform creators and does not represent the views or positions of Tencent News or Tencent.com.