GLM: General Language Model Pretraining with Autoregressive Blank Infilling

AI-generated keywords: GLM

AI-generated Key Points

  • Authors propose General Language Model (GLM) based on autoregressive blank infilling
  • GLM enhances blank filling pretraining with 2D positional encodings and arbitrary order predictions for spans
  • GLM outperforms BERT, T5, and GPT in NLU, conditional generation, and unconditional generation tasks
  • A single pretrained GLM with 1.25 times the parameters of BERT Large achieves the best performance across these task types
  • Comparative results show GLMRoBERTa outperforms other models like BERTSumAbs and UniLMv2Base in abstractive summarization tasks
  • Multi-task pretraining variants within the GLM framework, including GLMDoc and GLMSent, perform competitively against established baselines

Authors: Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang

To be published in ACL 2022. 16 pages, 4 figures
License: CC BY 4.0

Abstract: There have been various types of pretraining architectures including autoencoding models (e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5). However, none of the pretraining frameworks performs the best for all tasks of three main categories including natural language understanding (NLU), unconditional generation, and conditional generation. We propose a General Language Model (GLM) based on autoregressive blank infilling to address this challenge. GLM improves blank filling pretraining by adding 2D positional encodings and allowing an arbitrary order to predict spans, which results in performance gains over BERT and T5 on NLU tasks. Meanwhile, GLM can be pretrained for different types of tasks by varying the number and lengths of blanks. On a wide range of tasks across NLU, conditional and unconditional generation, GLM outperforms BERT, T5, and GPT given the same model sizes and data, and achieves the best performance from a single pretrained model with 1.25x parameters of BERT Large, demonstrating its generalizability to different downstream tasks.
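
The blank-infilling input described in the abstract can be illustrated with a minimal Python sketch. The abstract itself only states that GLM uses 2D positional encodings and predicts spans in an arbitrary order; the concrete layout below (a corrupted "Part A" followed by a shuffled "Part B", the `[MASK]` and `[START]` token names, 0-based positions, and the helper name `build_blank_infilling_input`) is an assumption made for illustration and is not taken from this page or from the authors' code.

```python
import random

MASK, START = "[MASK]", "[START]"  # token names are assumptions for this sketch


def build_blank_infilling_input(tokens, spans, seed=0):
    """Build a blank-infilling input with two position ids per token.

    tokens: list of str.
    spans:  sorted, non-overlapping (start, end) half-open index pairs to blank out.
    Returns (inputs, pos1, pos2), three lists of equal length.
    """
    rng = random.Random(seed)

    # Part A: the corrupted text, with each span replaced by a single [MASK].
    part_a, mask_index = [], []
    cursor = 0
    for start, end in spans:
        part_a.extend(tokens[cursor:start])
        part_a.append(MASK)
        mask_index.append(len(part_a) - 1)  # where this span's [MASK] sits in Part A
        cursor = end
    part_a.extend(tokens[cursor:])

    inputs = list(part_a)
    pos1 = list(range(len(part_a)))  # first dimension: position in the corrupted text
    pos2 = [0] * len(part_a)         # second dimension: 0 for every Part A token

    # Part B: the blanked spans, visited in a random ("arbitrary") order.
    order = list(range(len(spans)))
    rng.shuffle(order)
    for i in order:
        start, end = spans[i]
        span_in = [START] + tokens[start:end]
        inputs.extend(span_in)
        # Every token of the span shares the position of its [MASK] ...
        pos1.extend([mask_index[i]] * len(span_in))
        # ... and is numbered 1, 2, ... within the span.
        pos2.extend(range(1, len(span_in) + 1))
    return inputs, pos1, pos2


if __name__ == "__main__":
    toks = ["x1", "x2", "x3", "x4", "x5", "x6"]
    for tok, p1, p2 in zip(*build_blank_infilling_input(toks, [(2, 3), (4, 6)])):
        print(f"{tok:8s} pos1={p1} pos2={p2}")
```

In this sketch the first position id tells the model which blank a generated token belongs to, and the second tells it how far into the blank it is, so the span order in Part B can be shuffled freely without losing track of where each span belongs in the original text.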

Submitted to arXiv on 18 Mar. 2021


Results of the summarizing process for the arXiv paper: 2103.10360v2

In their paper titled "GLM: General Language Model Pretraining with Autoregressive Blank Infilling," authors Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang propose an approach to pretraining language models that addresses the limitations of existing frameworks. They introduce the General Language Model (GLM) based on autoregressive blank infilling, which improves blank filling pretraining by adding 2D positional encodings and allowing spans to be predicted in an arbitrary order. This leads to improved performance on NLU tasks compared to BERT and T5. GLM can also be pretrained for different task types by varying the number and lengths of blanks.

Across a range of tasks spanning NLU, conditional generation, and unconditional generation, GLM outperforms BERT, T5, and GPT given the same model sizes and data. Notably, a single pretrained model with 1.25 times the parameters of BERT Large achieves the best performance. The authors also present comparative results on abstractive summarization using the CNN/DailyMail and XSum datasets, reporting that GLMRoBERTa outperforms models such as BERTSumAbs (Liu and Lapata) and UniLMv2Base (Bao et al.) in terms of ROUGE scores on both datasets. The study further explores multi-task pretraining variants within the GLM framework, including GLMDoc and GLMSent, which perform competitively against established baselines.

In conclusion, by improving blank infilling pretraining and supporting multi-task pretraining, GLM serves as a general language model that performs well across diverse downstream tasks.
Created on 20 Nov. 2024


