Language Modeling Is Compression

AI-generated keywords: Language Model, Compression, Scaling Laws, Tokenization, In-Context Learning

AI-generated Key Points

  • Large language models can be viewed as powerful general-purpose compressors due to their predictive capabilities.
  • Chinchilla 70B, a large language model trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating the domain-specific compressors PNG (58.5%) and FLAC (30.3%), respectively.
  • Dataset size imposes a limit on model size for optimal compression performance.
  • Scaling model size alone therefore does not guarantee better compression.
  • Any compressor, such as gzip, can be used as a conditional generative model by leveraging the equivalence between prediction and compression.
  • Tokenization does not necessarily improve compression, but it lets models fit more information into their context, which can improve prediction performance.
  • The authors review concepts from information theory related to likelihood maximization and coding distributions.
  • Examples of compression-based generation are presented for different types of data including text, audio, and images.
  • gzip is compared with Chinchilla (a large language model) on how coherent their generated samples are, given a conditioning context.
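
The point about dataset size limiting model size can be illustrated with a little arithmetic. The sketch below assumes that a compression rate is "adjusted" by counting the model's own size as part of the compressed output; that accounting, the function name `adjusted_compression_rate`, the 2-bytes-per-parameter default, and all numbers in the usage lines are illustrative assumptions, not figures from the summary.

```python
def adjusted_compression_rate(
    raw_bytes: int,
    raw_rate: float,
    num_params: int,
    bytes_per_param: int = 2,  # assumption: parameters stored in 16 bits
) -> float:
    """Compression rate once the model's size is counted as output.

    raw_rate is compressed size / raw size, ignoring the model itself.
    """
    model_bytes = num_params * bytes_per_param
    return raw_rate + model_bytes / raw_bytes


# Hypothetical numbers: a 1 GB dataset and two made-up models.
dataset = 10**9
small = adjusted_compression_rate(dataset, raw_rate=0.40, num_params=500_000)
large = adjusted_compression_rate(dataset, raw_rate=0.10, num_params=10**9)
print(small)  # roughly 0.401: the small model adds almost no overhead
print(large)  # roughly 2.1: the large model outweighs the data it compresses
```

Under this accounting, a larger model only pays off when the dataset is large enough to amortize the parameters.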

Authors: Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, Joel Veness

License: CC BY 4.0

Abstract: It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
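
The link between prediction and lossless compression can be made concrete: an arithmetic coder driven by a predictive model spends about -log2 p bits on a symbol the model assigned probability p. The sketch below is a minimal illustration that computes this ideal code length; it uses a simple adaptive byte-frequency model as a stand-in for a language model (an assumption for illustration, not the paper's setup), and the function names are hypothetical.

```python
import math
import zlib


def adaptive_code_length_bits(data: bytes) -> float:
    """Ideal code length of `data` under an adaptive order-0 byte model.

    Each byte costs -log2 p(byte | bytes seen so far), which is what an
    arithmetic coder would spend, up to a small constant overhead.
    """
    counts = [1] * 256  # Laplace smoothing: every byte value starts at 1
    total = 256
    bits = 0.0
    for b in data:
        bits += -math.log2(counts[b] / total)
        counts[b] += 1  # update the model after coding the byte
        total += 1
    return bits


def compression_rate(compressed_bits: float, data: bytes) -> float:
    """Compressed size as a fraction of raw size (lower is better)."""
    return compressed_bits / (8 * len(data))


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog " * 50
    model_rate = compression_rate(adaptive_code_length_bits(sample), sample)
    zlib_rate = compression_rate(8 * len(zlib.compress(sample)), sample)
    print(f"adaptive model: {model_rate:.3f}, zlib: {zlib_rate:.3f}")
```

A better predictor assigns higher probabilities to the bytes that actually occur, which directly lowers the total bit count; this is the sense in which a strong language model is a strong compressor.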

Submitted to arXiv on 19 Sep. 2023

Results of the summarizing process for the arXiv paper: 2309.10668v1

In this work, the authors investigate the lossless compression capabilities of large language models, specifically foundation models trained primarily on text. They argue that, because these models are strong predictors, they can also be viewed as powerful general-purpose compressors.

The authors make several contributions. First, they review how to compress with predictive models using arithmetic coding, highlighting the connection between current language modeling research and compression, and they empirically evaluate the compression abilities of foundation models. They demonstrate that large language models are effective general-purpose compressors because of their in-context learning abilities. For example, Chinchilla 70B, a large language model trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, outperforming the domain-specific compressors PNG (58.5%) and FLAC (30.3%), respectively.

Second, the authors provide a novel perspective on scaling laws in relation to compression performance. They show that dataset size imposes a limit on the model size that is optimal for compression, and they emphasize that scaling alone is not a solution.

Third, the authors leverage the equivalence between prediction and compression to use any compressor (such as gzip) as a conditional generative model.

The authors also discuss tokenization as a form of pre-compression and its impact on compression performance. They find that tokenization does not necessarily improve compression, but it lets models fit more information into their context, which can improve prediction performance. As background, the authors review concepts from information theory related to likelihood maximization and coding distributions.

To illustrate their findings, the authors present examples of compression-based generation for text, audio, and images. They compare gzip with Chinchilla (a large language model) in generating coherent samples from a conditioning context.
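
One way to turn a compressor into a conditional generative model is sketched below: for each candidate next byte, compress the context plus that byte, treat a shorter output as a more probable continuation (p proportional to 2 to the power of minus the code length in bits), and sample. This is a minimal sketch of the idea and not necessarily the authors' exact procedure; the function names are hypothetical, and gzip's byte-granular output makes the resulting distribution very coarse.

```python
import gzip
import random


def next_byte_distribution(context: bytes) -> list[float]:
    """Probability of each next byte, derived from gzip output lengths."""
    lengths = [len(gzip.compress(context + bytes([b]))) for b in range(256)]
    shortest = min(lengths)
    # Subtract the shortest length before exponentiating to avoid underflow;
    # lengths are in bytes, hence the factor of 8 to convert to bits.
    weights = [2.0 ** (-8 * (n - shortest)) for n in lengths]
    norm = sum(weights)
    return [w / norm for w in weights]


def generate(context: bytes, n: int, seed: int = 0) -> bytes:
    """Sample n bytes autoregressively, conditioned on `context`."""
    rng = random.Random(seed)
    out = bytearray(context)
    for _ in range(n):
        probs = next_byte_distribution(bytes(out))
        out.append(rng.choices(range(256), weights=probs, k=1)[0])
    return bytes(out[len(context):])


if __name__ == "__main__":
    prompt = b"abcabcabcabcabcabc"
    print(generate(prompt, 12))
```

Each step runs the compressor 256 times, so the sketch is only practical for short contexts. Replacing gzip with a stronger predictor would sharpen the distribution.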
Created on 21 Sep. 2023

