Baidu Unveils PaddleOCR-VL, Setting New OCR Benchmark
Baidu's PaddleOCR-VL Redefines Document Processing Standards
Baidu has officially released PaddleOCR-VL, a multimodal document parsing model for optical character recognition (OCR). The open-source model achieved a reported world-leading score of 92.6 on the OmniDocBench v1.5 evaluation, which covers four key areas: text recognition, table extraction, formula interpretation, and reading order prediction.
Technical Breakthroughs
The 0.9B-parameter model pairs a compact architecture with high throughput:
- Integrates a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model
- Processes 1,881 tokens/second on a single A100 GPU (253% faster than dots.ocr)
- Supports 109 languages, including complex scripts like Arabic and Chinese
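To put the throughput claim in concrete terms, the short Python sketch below works out what "253% faster" implies about the baseline. It assumes the phrase means a relative speedup (new = baseline × 3.53); the article does not state the baseline figure, so the result is only an implied value, and the function names are illustrative.

```python
# Figures quoted in the article.
PADDLEOCR_VL_TOKENS_PER_SEC = 1881
FASTER_PERCENT = 253  # "253% faster than dots.ocr"


def speed_ratio(faster_percent: float) -> float:
    """Convert 'X% faster' into a multiplier (assumed reading: 1 + X/100)."""
    return 1 + faster_percent / 100


def implied_baseline(throughput: float, faster_percent: float) -> float:
    """Baseline throughput implied by the quoted speedup."""
    return throughput / speed_ratio(faster_percent)


baseline = implied_baseline(PADDLEOCR_VL_TOKENS_PER_SEC, FASTER_PERCENT)
print(f"speed ratio: {speed_ratio(FASTER_PERCENT):.2f}x")
print(f"implied baseline: {baseline:.0f} tokens/second")
```

Under this reading, the comparison model would run at roughly a quarter to a third of PaddleOCR-VL's token rate on the same hardware.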

Performance Metrics
PaddleOCR-VL reports the following scores (lower is better for edit distance and reading order error; higher is better for CDM and TEDS):
- Text edit distance: 0.035
- Formula recognition (CDM): 91.43
- Table extraction (TEDS): 93.52
- Reading order error: 0.043
These metrics suggest the model is well suited to challenging applications such as historical archive digitization and handwritten manuscript processing.
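For readers unfamiliar with the text metric, the following Python sketch shows one common way to compute a normalized edit distance between OCR output and a reference transcription. Dividing by the longer string's length is an assumption made here for illustration; the benchmark's exact normalization may differ, and the function names are not taken from any benchmark toolkit.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, reference: str) -> float:
    """Scale the distance to [0, 1]; 0 means a perfect match."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 0.0
    return levenshtein(prediction, reference) / longest


# One wrong character (letter O instead of zero) in a 20-character line.
print(normalized_edit_distance("Total revenue: 1,25O", "Total revenue: 1,250"))
```

On this scale, a score of 0.035 corresponds to roughly 3 to 4 character errors per 100 characters.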

Innovative Architecture
The model parses documents in two stages:
- Stage 1: layout detection and reading order prediction
- Stage 2: structured output of text, tables, and formulas
This approach lets the model follow the logical flow of complex documents such as financial reports and academic papers.
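The two-stage flow can be pictured with a minimal Python sketch. Everything here is hypothetical: the `Region` and `Element` types, the `detect_layout` stub (a naive top-to-bottom sort standing in for a learned layout and reading-order model), and the per-type recognizers are placeholders for illustration, not PaddleOCR-VL's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Region:
    kind: str   # "text", "table", or "formula"
    order: int  # reading-order index assigned in stage 1
    crop: str   # stand-in for the cropped image of the region


@dataclass
class Element:
    kind: str
    content: str


def detect_layout(page: List[dict]) -> List[Region]:
    """Stage 1 stub: sort boxes top-to-bottom, then left-to-right."""
    ranked = sorted(page, key=lambda box: (box["top"], box["left"]))
    return [Region(box["kind"], i, box["crop"]) for i, box in enumerate(ranked)]


# Stage 2 stubs: one hypothetical recognizer per region type.
RECOGNIZERS: Dict[str, Callable[[str], str]] = {
    "text": lambda crop: crop,
    "table": lambda crop: f"<table>{crop}</table>",
    "formula": lambda crop: f"${crop}$",
}


def parse_page(page: List[dict]) -> List[Element]:
    """Run stage 1, then recognize each region in reading order."""
    regions = detect_layout(page)
    return [Element(r.kind, RECOGNIZERS[r.kind](r.crop)) for r in regions]


page = [
    {"kind": "formula", "top": 200, "left": 50, "crop": "E = mc^2"},
    {"kind": "text", "top": 10, "left": 50, "crop": "Annual report"},
    {"kind": "table", "top": 100, "left": 50, "crop": "Q1 | Q2"},
]
for element in parse_page(page):
    print(element.kind, "->", element.content)
```

Separating the stages this way means the recognizer only ever sees one region at a time, while the ordering decided in stage 1 determines how the pieces are stitched back together.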

Practical Applications
The technology addresses critical needs across sectors:
- Government document management systems
- Enterprise knowledge retrieval platforms
- Academic research information extraction
- Historical archive preservation projects
The lightweight design makes it particularly suitable for deployment in resource-constrained environments.
Key Points:
- 🏆 World-leading performance on OmniDocBench v1.5 (92.6 score)
- ⚡ Ultra-efficient processing at 1,881 tokens/second on a single A100 GPU
- 🌍 Supports 109 languages including complex scripts
- 🧠 Layout-aware parsing that preserves reading order
- 🔓 Open-source availability promotes widespread adoption




