ERNIE-Layout: Layout Knowledge Enhanced Pre-training for Visually-rich Document Understanding

AI-generated keywords: Pre-training techniques, Layout-centered knowledge, ERNIE-Layout, Multi-modal transformer architecture, Document understanding

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Significant advancement in pre-training techniques for visually-rich document understanding
  • ERNIE-Layout is a document pre-training solution with layout knowledge enhancement across the whole workflow
  • ERNIE-Layout enhances layout knowledge and generates better representations by combining text, layout, and image features
  • Key innovations of ERNIE-Layout: rearranging input sequences at the serialization stage, a reading order prediction pre-training task, spatial-aware disentangled attention in the multi-modal transformer, and a replaced regions prediction pre-training task
  • Experimental results show that ERNIE-Layout sets a new state of the art on key information extraction, document image classification, and document question answering datasets
  • The paper was accepted to EMNLP 2022 (Findings)
  • Code and models for ERNIE-Layout are publicly available

Authors: Qiming Peng, Yinxu Pan, Wenjin Wang, Bin Luo, Zhenyu Zhang, Zhengjie Huang, Teng Hu, Weichong Yin, Yongfeng Chen, Yin Zhang, Shikun Feng, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang

Accepted to EMNLP 2022 (Findings)

Abstract: Recent years have witnessed the rise and success of pre-training techniques in visually-rich document understanding. However, most existing methods lack the systematic mining and utilization of layout-centered knowledge, leading to sub-optimal performances. In this paper, we propose ERNIE-Layout, a novel document pre-training solution with layout knowledge enhancement in the whole workflow, to learn better representations that combine the features from text, layout, and image. Specifically, we first rearrange input sequences in the serialization stage, and then present a correlative pre-training task, reading order prediction, to learn the proper reading order of documents. To improve the layout awareness of the model, we integrate a spatial-aware disentangled attention into the multi-modal transformer and a replaced regions prediction task into the pre-training phase. Experimental results show that ERNIE-Layout achieves superior performance on various downstream tasks, setting new state-of-the-art on key information extraction, document image classification, and document question answering datasets. The code and models are publicly available at http://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-layout.
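
The abstract names the serialization step and the reading order prediction task but does not describe how either is implemented. The following is a minimal sketch under stated assumptions, not the authors' method: `Word`, `serialize_reading_order`, `next_token_targets`, and the `line_tol` parameter are hypothetical names, the line-grouping heuristic stands in for whatever layout analysis the paper actually uses, and the "next token index" target is only one plausible way to frame reading order as a supervised label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """An OCR word with its bounding box (hypothetical structure)."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def serialize_reading_order(words: List[Word], line_tol: float = 5.0) -> List[Word]:
    """Toy heuristic: group words into lines by top edge, then read each line left to right.

    Assumption: a single-column page. Real documents (tables, multiple columns)
    need a proper layout parser.
    """
    ordered: List[Word] = []
    current_line: List[Word] = []
    line_top: Optional[float] = None
    for word in sorted(words, key=lambda item: (item.y0, item.x0)):
        if line_top is None or abs(word.y0 - line_top) <= line_tol:
            # Same visual line as the first word of the current group.
            current_line.append(word)
            if line_top is None:
                line_top = word.y0
        else:
            # Vertical jump: flush the finished line, start a new one.
            ordered.extend(sorted(current_line, key=lambda item: item.x0))
            current_line = [word]
            line_top = word.y0
    ordered.extend(sorted(current_line, key=lambda item: item.x0))
    return ordered


def next_token_targets(num_tokens: int) -> List[int]:
    """For position i in the serialized sequence, the label is the index of the next token.

    Assumption: the last token points to itself, since it has no successor.
    """
    return [min(i + 1, num_tokens - 1) for i in range(num_tokens)]


if __name__ == "__main__":
    # OCR output in arbitrary order; "World" sits one pixel higher than "Hello".
    ocr_words = [
        Word("World", 60, 9, 100, 20),
        Word("Hello", 10, 10, 50, 20),
        Word("42", 60, 42, 80, 50),
        Word("Total", 10, 40, 50, 50),
    ]
    serialized = serialize_reading_order(ocr_words)
    print([w.text for w in serialized])
    print(next_token_targets(len(serialized)))
```

The tolerance-based grouping matters because a plain sort on the top coordinate would place "World" before "Hello" when OCR boxes on the same line are slightly misaligned.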

Submitted to arXiv on 12 Oct. 2022


Results of the summarizing process for the arXiv paper: 2210.06155v1


In recent years, pre-training techniques for visually-rich document understanding have advanced significantly. However, many existing methods lack systematic mining and utilization of layout-centered knowledge, resulting in sub-optimal performance. To address this issue, Qiming Peng, Yinxu Pan, Wenjin Wang, Bin Luo, Zhenyu Zhang, Zhengjie Huang, Teng Hu, Weichong Yin, Yongfeng Chen, Yin Zhang, Shikun Feng, Yu Sun, Hao Tian, Hua Wu, and Haifeng Wang introduce ERNIE-Layout, a document pre-training solution that enhances layout knowledge throughout the workflow to learn better representations combining text, layout, and image features.

The approach first rearranges input sequences during the serialization stage and then adds a correlative pre-training task, reading order prediction, through which the model learns the proper reading order of documents. To further improve layout awareness, the researchers integrate spatial-aware disentangled attention into the multi-modal transformer and add a replaced regions prediction task to the pre-training phase. Experimental results show that ERNIE-Layout outperforms existing methods on various downstream tasks, setting a new state of the art on key information extraction, document image classification, and document question answering datasets. The paper was accepted to EMNLP 2022 (Findings), and the code and models are publicly available.
Created on 26 Mar. 2024
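
The summary mentions a replaced regions prediction task without describing it. As a rough illustration only, assuming the task means swapping some image regions for regions taken from another image and asking the model to detect which ones were swapped, training examples could be built as below. The function name `make_replaced_region_example`, the `replace_prob` value, and the representation of patches as opaque items are all hypothetical and not taken from the paper.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

Patch = TypeVar("Patch")


def make_replaced_region_example(
    patches: Sequence[Patch],
    donor_patches: Sequence[Patch],
    replace_prob: float = 0.1,  # assumed value, not from the source
    rng: Optional[random.Random] = None,
) -> Tuple[List[Patch], List[int]]:
    """Return (corrupted_patches, labels), where labels[i] == 1 if patch i was replaced."""
    if not donor_patches:
        raise ValueError("donor_patches must not be empty")
    if rng is None:
        rng = random.Random()

    corrupted: List[Patch] = []
    labels: List[int] = []
    for patch in patches:
        if rng.random() < replace_prob:
            # Swap in a region from a different document image.
            corrupted.append(rng.choice(donor_patches))
            labels.append(1)
        else:
            corrupted.append(patch)
            labels.append(0)
    return corrupted, labels


if __name__ == "__main__":
    page_patches = [f"page_patch_{i}" for i in range(8)]
    other_patches = [f"other_patch_{i}" for i in range(8)]
    corrupted, labels = make_replaced_region_example(
        page_patches, other_patches, replace_prob=0.25, rng=random.Random(0)
    )
    for patch, label in zip(corrupted, labels):
        print(label, patch)
```

A model trained on such pairs would receive the corrupted patches together with the unchanged text and layout inputs and predict one binary label per patch; how ERNIE-Layout actually wires this prediction head is not stated in the metadata.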

