shoaibmohd's Collections
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Paper • 2509.22186 • Published • 139
CommonForms: A Large, Diverse Dataset for Form Field Detection
Paper • 2509.16506 • Published • 19
Automated Structured Radiology Report Generation with Rich Clinical Context
Paper • 2510.00428 • Published • 7
Extract-0: A Specialized Language Model for Document Information Extraction
Paper • 2509.22906 • Published
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model
Paper • 2510.14528 • Published • 111
RL makes MLLMs see better than SFT
Paper • 2510.16333 • Published • 48
NVIDIA Nemotron Parse 1.1
Paper • 2511.20478 • Published • 21
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
Paper • 2511.16334 • Published • 92
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI
Paper • 2502.17092 • Published • 3
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
Paper • 2503.11576 • Published • 125