GLM: General Language Model Pretraining with Autoregressive Blank Infilling

AI-generated keywords: GLM

AI-generated Key Points

Authors propose General Language Model (GLM) based on autoregressive blank infilling
GLM enhances blank filling pretraining with 2D positional encodings and arbitrary order predictions for spans
GLM outperforms BERT, T5, and GPT on NLU, conditional generation, and unconditional generation tasks given the same model sizes and data
A single pretrained GLM with 1.25 times the parameters of BERT Large achieves the best performance across these tasks
Comparative results show GLMRoBERTa outperforms other models like BERTSumAbs and UniLMv2Base in abstractive summarization tasks
Multi-task pretraining variations within the GLM framework - including GLMDoc and GLMSent - showcase competitive performance against established baselines


Authors: Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang

arXiv: 2103.10360v2 (cs.CL)

To be published in ACL 2022. 16 pages, 4 figures.

License: CC BY 4.0

Abstract: There have been various types of pretraining architectures including autoencoding models (e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5). However, none of the pretraining frameworks performs the best for all tasks of three main categories including natural language understanding (NLU), unconditional generation, and conditional generation. We propose a General Language Model (GLM) based on autoregressive blank infilling to address this challenge. GLM improves blank filling pretraining by adding 2D positional encodings and allowing an arbitrary order to predict spans, which results in performance gains over BERT and T5 on NLU tasks. Meanwhile, GLM can be pretrained for different types of tasks by varying the number and lengths of blanks. On a wide range of tasks across NLU, conditional and unconditional generation, GLM outperforms BERT, T5, and GPT given the same model sizes and data, and achieves the best performance from a single pretrained model with 1.25x parameters of BERT Large, demonstrating its generalizability to different downstream tasks.
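The blank infilling setup described in the abstract can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' implementation: the function name `build_glm_example`, the token strings `[MASK]`, `[S]`, and `[E]`, and the exact position layout follow one reading of the paper's description of 2D positional encodings and are not specified in the abstract itself. The first position indexes the corrupted text (span tokens reuse the position of their blank), and the second counts positions inside a span (0 for the corrupted text).

```python
import random

MASK, START, END = "[MASK]", "[S]", "[E]"  # illustrative token names (assumption)


def build_glm_example(tokens, spans, shuffle=True, seed=0):
    """Build inputs, targets, and 2D positions for blank infilling.

    tokens: list of str. spans: non-overlapping half-open (start, end) pairs.
    """
    spans = sorted(spans)

    # Part A: the corrupted text, with each span collapsed into one [MASK].
    part_a, mask_pos, cursor = [], {}, 0
    for start, end in spans:
        part_a.extend(tokens[cursor:start])
        part_a.append(MASK)
        mask_pos[(start, end)] = len(part_a)  # 1-based position of this blank
        cursor = end
    part_a.extend(tokens[cursor:])

    inputs = list(part_a)
    targets = [None] * len(part_a)           # nothing is predicted in Part A
    pos1 = list(range(1, len(part_a) + 1))   # position in the corrupted text
    pos2 = [0] * len(part_a)                 # intra-span position, 0 in Part A

    # Part B: the spans, in arbitrary order, each generated left to right.
    order = list(spans)
    if shuffle:
        random.Random(seed).shuffle(order)
    for start, end in order:
        span = tokens[start:end]
        inputs.extend([START] + span)        # model input is shifted right
        targets.extend(span + [END])         # model output ends with [E]
        pos1.extend([mask_pos[(start, end)]] * (len(span) + 1))
        pos2.extend(range(1, len(span) + 2))
    return inputs, targets, pos1, pos2


toks = ["x1", "x2", "x3", "x4", "x5", "x6"]
inputs, targets, pos1, pos2 = build_glm_example(toks, [(2, 3), (4, 6)], shuffle=False)
print(inputs)
print(pos1)
print(pos2)
```

With `shuffle=True` only the order of the spans in the second part changes; each span token still points back to its own blank through the first position, which is what lets the spans be predicted in any order.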

Submitted to arXiv on 18 Mar. 2021


Results of the summarizing process for the arXiv paper: 2103.10360v2


In their paper "GLM: General Language Model Pretraining with Autoregressive Blank Infilling," authors Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang propose a pretraining approach that addresses a gap in existing frameworks: none of the autoencoding (e.g., BERT), autoregressive (e.g., GPT), or encoder-decoder (e.g., T5) architectures performs best across NLU, unconditional generation, and conditional generation. They introduce the General Language Model (GLM), based on autoregressive blank infilling, which improves blank filling pretraining by adding 2D positional encodings and allowing spans to be predicted in an arbitrary order. These changes lead to better performance on NLU tasks than BERT and T5. GLM can also be pretrained for different task types by varying the number and lengths of blanks. Across a diverse set of tasks spanning NLU, conditional generation, and unconditional generation, GLM outperforms BERT, T5, and GPT given the same model sizes and data. Notably, a single pretrained model with 1.25 times the parameters of BERT Large achieves the best performance.

The authors also present comparative results on abstractive summarization using datasets such as CNN/DailyMail and XSum. They report that GLMRoBERTa outperforms other models such as BERTSumAbs (Liu and Lapata) and UniLMv2Base (Bao et al.) in terms of ROUGE scores on both datasets. The study further explores multi-task pretraining variations within the GLM framework, including GLMDoc and GLMSent, which perform competitively against established baselines. In short, by improving blank infilling pretraining and combining it with multi-task pretraining, GLM serves as a single model that performs well across diverse downstream tasks.
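Since the summarization comparison is reported in ROUGE scores, a rough sketch of what such a score measures may help. The snippet below computes an n-gram overlap F1 in the spirit of ROUGE-1 and ROUGE-2. It is a simplified illustration: the function names are hypothetical, tokenization is plain whitespace splitting, and standard ROUGE tooling typically adds steps such as stemming and also reports ROUGE-L, so the numbers here would not match published results.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N F1 between a candidate and a reference summary."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(rouge_n(candidate, reference, n=1))
print(rouge_n(candidate, reference, n=2))
```

Higher values mean the generated summary shares more words or word pairs with the reference summary.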


Summary
- The authors created a new language model called the General Language Model (GLM) that learns by filling in missing stretches of text.
- GLM does better than other models of the same size, like BERT, T5, and GPT, at understanding language and generating text.
- A single GLM model with 1.25 times the parameters of BERT Large gets the best overall results.
- A version of GLM called GLMRoBERTa beats models such as BERTSumAbs and UniLMv2Base at summarizing text.
- Different versions of GLM, like GLMDoc and GLMSent, also hold up well against established models.

Definitions
- Authors: People who write books or research papers.
- Language Model: A computer program that helps understand and generate human language.
- Autoregressive: Predicting the next word based on the previous words.
- Pretraining: Teaching a model using lots of data before using it for specific tasks.
- Parameters: Settings or values that a model learns and uses to make decisions.

Introduction: The field of natural language processing (NLP) has advanced significantly in recent years with the rise of pretrained models such as BERT, GPT, and T5. These models achieve strong results on many NLP tasks, but each comes with limitations. In their paper "GLM: General Language Model Pretraining with Autoregressive Blank Infilling," Zhengxiao Du et al. propose a pretraining approach that addresses these limitations and outperforms existing frameworks given the same model sizes and data.

Background: Pretrained language models have become popular in NLP because they learn general linguistic knowledge from large amounts of unlabeled data. Existing pretraining architectures fall into three groups: autoencoding models (e.g., BERT), autoregressive models (e.g., GPT), and encoder-decoder models (e.g., T5). According to the authors, none of these frameworks performs best across all three main task categories: natural language understanding (NLU), unconditional generation, and conditional generation.

General Language Model (GLM): To close this gap, the authors introduce GLM, a pretraining framework based on autoregressive blank infilling. Spans of the input text are replaced with blanks, and the model predicts the missing spans autoregressively. GLM improves on earlier blank filling pretraining in two ways: it adds 2D positional encodings, and it allows the spans to be predicted in an arbitrary order. The authors report that these changes yield gains over BERT and T5 on NLU tasks, and that varying the number and lengths of blanks lets GLM be pretrained for different types of tasks.

Performance Comparison: The authors evaluate GLM against BERT, T5, and GPT across a diverse set of tasks spanning NLU, conditional generation, and unconditional generation. They report that GLM outperforms these baselines given the same model sizes and data. Notably, a single pretrained model with 1.25 times the parameters of BERT Large achieves the best performance.

Abstractive Summarization: The study also examines GLM on abstractive summarization using datasets such as CNN/DailyMail and XSum. The authors compare their model, GLMRoBERTa, with other models such as BERTSumAbs and UniLMv2Base. The results show that GLMRoBERTa outperforms these models in terms of ROUGE scores on both datasets.

Multi-Task Pretraining: The authors also investigate multi-task pretraining variations within the GLM framework, including GLMDoc and GLMSent. These variants perform competitively against established baselines.

Conclusion: By improving blank infilling pretraining and combining it with multi-task pretraining, GLM serves as a single language model that performs well across diverse downstream tasks. The paper addresses a limitation shared by existing pretraining frameworks and reports better performance than BERT, T5, and GPT across NLU and generation tasks, which makes it a useful reference point for later work on general-purpose language models.
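The idea of pretraining for different task types by varying the number and lengths of blanks can be sketched as two span samplers. This is a hedged illustration only: the helper names are hypothetical, and the specific settings (short spans with Poisson-distributed lengths of mean 3 covering about 15% of tokens, and a single long span covering 50% to 100% of the text for the document-level variant) reflect one recollection of the paper rather than anything stated on this page, so they should be checked against the paper before being relied on. Placing the long span at a random offset is likewise an assumption.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def short_spans(n_tokens, rng, mask_ratio=0.15, mean_len=3):
    """Many short blanks covering roughly mask_ratio of the tokens (assumed settings)."""
    budget = max(1, round(n_tokens * mask_ratio))
    covered, spans = set(), []
    for _ in range(1000):  # bounded retries so the sketch cannot loop forever
        if len(covered) >= budget:
            break
        length = max(1, min(poisson(rng, mean_len), budget - len(covered)))
        start = rng.randrange(n_tokens - length + 1)
        # Reject candidates that overlap or touch an existing span.
        if any(i in covered for i in range(start - 1, start + length + 1)):
            continue
        covered.update(range(start, start + length))
        spans.append((start, start + length))
    return sorted(spans)


def document_span(n_tokens, rng, min_ratio=0.5):
    """One long blank covering between min_ratio and all of the tokens (assumed settings)."""
    length = rng.randint(math.ceil(n_tokens * min_ratio), n_tokens)
    start = rng.randrange(n_tokens - length + 1)
    return [(start, start + length)]


rng = random.Random(0)
print(short_spans(200, rng))    # several short (start, end) pairs
print(document_span(200, rng))  # a single long (start, end) pair
```

Many short blanks resemble an NLU-style objective, while one long blank resembles text generation; mixing the two kinds of samples is the intuition behind the multi-task variants.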

Created on 20 Nov. 2024


Similar papers summarized with our AI tools

- GLM-130B: An Open Bilingual Pre-trained Model (cs.CL), 76.0% similar
- PaLM: Scaling Language Modeling with Pathways (cs.CL), 69.2% similar
- A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems (cs.CL), 66.6% similar
- A Comprehensive Overview of Large Language Models (cs.CL), 66.4% similar
- XLNet: Generalized Autoregressive Pretraining for Language Understanding (cs.CL), 66.2% similar
- Large Language Models: A Survey (cs.CL), 65.6% similar
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (cs.CL), 64.3% similar

