LMDX: Language Model-based Document Information Extraction and Localization
AI-generated Key Points
- Large Language Models (LLMs) have advanced Natural Language Processing (NLP) and improved performance on various tasks.
- LLMs have not yet been applied successfully to semi-structured document information extraction.
- Challenges in adopting LLMs for this task include the absence of layout encoding and lack of a grounding mechanism.
- The authors propose a methodology called LMDX to address these challenges.
- LMDX enables the adaptation of arbitrary LLMs for document information extraction, supporting extraction of singular, repeated, and hierarchical entities with or without training data.
- LMDX provides grounding guarantees and localizes extracted entities within the document.
- LMDX is specifically applied to the PaLM 2-S LLM and evaluated on VRDU and CORD benchmarks, setting a new state-of-the-art in document information extraction.
- Extracting information from semi-structured documents is difficult because of complex layouts, spatial alignment, tabular arrangement of entities, printed or handwritten content, scanning artifacts, and the need for precise entity localization.
- Current approaches typically work in two stages: an OCR service recognizes and serializes the text, and a parser then extracts the relevant entity values from the recognized text.
- Existing approaches struggle with hierarchical entities or with serialization errors.
- Some approaches leverage image modality in addition to text and layout information for alignment between modalities.
- Other approaches treat extraction as a sequence generation problem with an auto-regressive decoder on top of a text-layout-image encoder.
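The grounding and localization idea in the key points above can be illustrated with a small sketch. This is not the LMDX method itself, which the summary does not describe in detail. It is a plain substring check under stated assumptions: the `Segment` structure, the pixel bounding-box format, and the function names `ground_entity` and `keep_grounded` are all hypothetical. A value returned by a model is kept only if it can be found in the OCR text, and the matching segment's box serves as its location.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One OCR text segment with its bounding box (hypothetical structure)."""
    text: str
    box: tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), assumed pixels


def _normalize(text: str) -> str:
    # Collapse whitespace and lowercase so minor OCR spacing differences match.
    return " ".join(text.split()).lower()


def ground_entity(value: str, segments: list[Segment]) -> Optional[Segment]:
    """Return the first segment whose text contains `value`, else None."""
    needle = _normalize(value)
    if not needle:
        return None
    for seg in segments:
        if needle in _normalize(seg.text):
            return seg
    return None


def keep_grounded(extracted: dict[str, str], segments: list[Segment]) -> dict[str, dict]:
    """Drop extracted values that cannot be located in the OCR segments."""
    grounded = {}
    for name, value in extracted.items():
        seg = ground_entity(value, segments)
        if seg is not None:
            grounded[name] = {"value": value, "box": seg.box}
    return grounded
```

In this sketch, a value that appears nowhere in the document text is treated as a possible hallucination and discarded, while every kept value carries the box of the segment it was found in.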
Authors: Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, Ramya Sree Boppana, Zilong Wang, Jiaqi Mu, Hao Zhang, Nan Hua
Abstract: Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), improving state-of-the-art on many existing tasks and exhibiting emergent capabilities. However, LLMs have not yet been successfully applied to semi-structured document information extraction, which is at the core of many document processing workflows and consists of extracting key entities from a visually rich document (VRD) given a predefined target schema. The main obstacles to LLM adoption in that task have been the absence of layout encoding within LLMs, critical for a high quality extraction, and the lack of a grounding mechanism ensuring the answer is not hallucinated. In this paper, we introduce Language Model-based Document Information Extraction and Localization (LMDX), a methodology to adapt arbitrary LLMs for document information extraction. LMDX can do extraction of singular, repeated, and hierarchical entities, both with and without training data, while providing grounding guarantees and localizing the entities within the document. In particular, we apply LMDX to the PaLM 2-S LLM and evaluate it on VRDU and CORD benchmarks, setting a new state-of-the-art and showing how LMDX enables the creation of high quality, data-efficient parsers.
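To make "predefined target schema" and "singular, repeated, and hierarchical entities" concrete, here is a minimal sketch. The schema encoding, the entity names, and the `conforms` helper are assumptions made for illustration; the paper may define its schema format differently.

```python
# Hypothetical target schema: entity names are illustrative only.
SCHEMA = {
    "invoice_id": "",                                  # singular: one text value
    "payment_terms": [],                               # repeated: list of text values
    "line_item": [{"description": "", "amount": ""}],  # hierarchical and repeated
}


def conforms(value, schema) -> bool:
    """Check that an extraction has the shape the schema allows."""
    if isinstance(schema, str):
        # Leaf entity: the extracted value must be text.
        return isinstance(value, str)
    if isinstance(schema, list):
        if not isinstance(value, list):
            return False
        # An empty schema list is read here as "repeated text values".
        item_schema = schema[0] if schema else ""
        return all(conforms(item, item_schema) for item in value)
    if isinstance(schema, dict):
        # Missing entities are allowed; unknown entity names are not.
        return (
            isinstance(value, dict)
            and set(value) <= set(schema)
            and all(conforms(v, schema[k]) for k, v in value.items())
        )
    return False
```

Under these assumptions, an extraction may omit entities that are absent from the document, but it may not introduce names outside the schema or return a list where a single value is expected.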