QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation

AI-generated keywords: Optical Character Recognition (OCR)

AI-generated Key Points

  • Challenges in Arabic script for OCR:
      • Cursive nature
      • Diacritical marks (tashkeel)
      • Varied typography
  • Development of Qari-OCR models:
      • Focus on optimizing OCR for Arabic text
      • Leading model: QARI v0.2, with WER 0.160, CER 0.061, and BLEU 0.737 on diacritically-rich texts
  • Qualitative analysis and visual illustrations:
      • Demonstrates proficiency in handling script complexities
      • Resilience to optical degradation and accurate transcription from varied inputs
  • Nuances of Arabic script challenges for OCR systems:
      • Diacritics, ligatures, variant letterforms, etc.
  • Strengths of Qari-OCR:
      • Structural document understanding
      • Handwritten text recognition capabilities
  • Contribution to the field:
      • Significant improvement in Arabic OCR accuracy and efficiency
      • Open-source models and datasets for further research opportunities

Authors: Ahmed Wasfy, Omer Nacar, Abdelakreem Elkhateb, Mahmoud Reda, Omar Elshehy, Adel Ammar, Wadii Boulila

License: CC BY-SA 4.0

Abstract: The inherent complexities of Arabic script; its cursive nature, diacritical marks (tashkeel), and varied typography, pose persistent challenges for Optical Character Recognition (OCR). We present Qari-OCR, a series of vision-language models derived from Qwen2-VL-2B-Instruct, progressively optimized for Arabic through iterative fine-tuning on specialized synthetic datasets. Our leading model, QARI v0.2, establishes a new open-source state-of-the-art with a Word Error Rate (WER) of 0.160, Character Error Rate (CER) of 0.061, and BLEU score of 0.737 on diacritically-rich texts. Qari-OCR demonstrates superior handling of tashkeel, diverse fonts, and document layouts, alongside impressive performance on low-resolution images. Further explorations (QARI v0.3) showcase strong potential for structural document understanding and handwritten text. This work delivers a marked improvement in Arabic OCR accuracy and efficiency, with all models and datasets released to foster further research.
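The WER and CER figures quoted in the abstract are edit-distance ratios. As a minimal sketch, assuming the standard Levenshtein-based definitions (the paper's exact text normalization and tokenization are not stated here, and the function names below are illustrative), the two metrics can be computed like this:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from the reference
                            curr[j - 1] + 1,      # insertion into the reference
                            prev[j - 1] + cost))  # substitution or exact match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Hypothetical example: the OCR output drops the hamza under the alef in one word.
reference = "ذهب الولد إلى المدرسة"
hypothesis = "ذهب الولد الى المدرسة"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

A single wrong character makes one of four words wrong but only one of 21 characters wrong, which is why WER is usually the larger of the two numbers for the same output.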

Submitted to arXiv on 02 Jun. 2025

Results of the summarizing process for the arXiv paper: 2506.02295v1

Arabic script has long posed challenges for Optical Character Recognition (OCR) because of its cursive nature, diacritical marks (tashkeel), and varied typography. To address these challenges, the authors developed Qari-OCR, a series of vision-language models optimized specifically for Arabic text. Through iterative fine-tuning on specialized synthetic datasets, the leading model, QARI v0.2, achieves a Word Error Rate (WER) of 0.160, a Character Error Rate (CER) of 0.061, and a BLEU score of 0.737 on diacritically-rich texts.

Quantitative results aside, qualitative analysis is essential to understand the model's practical capabilities. Figure 2 illustrates Qari-OCR's proficiency in handling the complexities of Arabic script, supporting its strong quantitative performance. The model's resilience to optical degradation and its ability to transcribe text accurately from varied inputs were also tested: Figure 3 shows that Qari-OCR, particularly QARI v0.3, which was trained on more complex layouts, can robustly detect and transcribe Arabic text even from small, low-resolution images with tightly cropped boundaries.

The qualitative assessment also covers the features of Arabic script that challenge OCR systems: diacritics (tashkeel), ligatures such as Lam-Alif (لا), variant letterforms, classical language structures, embedded punctuation and numerals, the diverse orthographic forms of Hamza (ء), and features such as Maddah. Further analysis indicates that QARI v0.3 shows strong potential for structural document understanding and handwritten text recognition. The work delivers a marked improvement in Arabic OCR accuracy and efficiency, and all models and datasets are released as open-source resources to foster further research.

Overall, the summary emphasizes not only the quantitative benchmarks but also the qualitative strengths of Qari-OCR in handling the intricacies of Arabic script robustly across document layouts and image resolutions.
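
To make the role of tashkeel concrete: in Unicode, each Arabic short-vowel mark is a separate combining code point attached to its base letter, so an OCR system that drops the marks produces a shorter string and is penalized at the character level. The following is a minimal sketch using only the Python standard library; the helper name `strip_tashkeel` is illustrative and is not taken from the paper, and filtering on the general category `Mn` is a simplification that also removes other nonspacing marks.

```python
import unicodedata


def strip_tashkeel(text):
    """Remove nonspacing combining marks (category Mn), which include Arabic tashkeel."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# "kataba" written with a fatha (U+064E) after each of its three letters.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
bare = strip_tashkeel(vocalized)

print(vocalized, len(vocalized))  # fully vocalized form and its code point count
print(bare, len(bare))            # same word without diacritics
```

Half of the code points in the vocalized word are diacritics, which gives a sense of why diacritically-rich text is a demanding evaluation setting.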
Created on 30 Jun. 2025
