Scaling laws for post-training quantized large language models

AI-generated keywords: Post-training weight quantization, Large language models, Scaling laws, Predictability, Efficiency

AI-generated Key Points

  • Study title: "Scaling laws for post-training quantized large language models"
  • Investigates predictability of post-training weight quantization performance for large language models (LLMs)
  • Well-trained LLMs have predictable generalization abilities based on model size
  • Quality of compressed LLMs is often unpredictable and requires individual validation
  • Conducted systematic empirical study on multiple LLM families using popular weight quantization techniques and low-precision tensor data types
  • Identified key scaling factors related to the local loss landscape to predict performance of quantized LLMs
  • Developed a statistical model based on these factors
  • Explored complexities and trade-offs in post-training weight quantization, highlighting challenges in finding optimal quantization formats and model parameter counts within fixed constraints
  • Findings shed light on how properties such as pre-trained negative log-likelihood (NLL) loss scale with total parameter counts in transformer layers' weight tensors
  • Provided insights into local radial loss landscape mapping and illustrated trade-off between larger models quantized to lower bit formats versus smaller models quantized to higher bit formats
  • Accepted at the 4th NeurIPS Workshop on Efficient Natural Language and Speech Processing (ENLSP) in 2024
  • Aims to make post-training weight quantization of large language models more predictable and efficient

Authors: Zifei Xu, Alexander Lan, Wanzin Yazar, Tristan Webb, Sayeh Sharify, Xin Wang

License: CC BY-NC-SA 4.0

Abstract: Generalization abilities of well-trained large language models (LLMs) are known to scale predictably as a function of model size. In contrast to the existence of practical scaling laws governing pre-training, the quality of LLMs after post-training compression remains highly unpredictable, often requiring case-by-case validation in practice. In this work, we attempted to close this gap for post-training weight quantization of LLMs by conducting a systematic empirical study on multiple LLM families quantized to numerous low-precision tensor data types using popular weight quantization techniques. We identified key scaling factors pertaining to characteristics of the local loss landscape, based on which the performance of quantized LLMs can be reasonably well predicted by a statistical model.
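As a rough illustration of what "post-training weight quantization" to a low-precision format means, the following is a minimal sketch of symmetric round-to-nearest integer quantization with one scale per weight row. It is an assumption chosen for simplicity, not necessarily one of the techniques or data types evaluated in the paper; the function name `quantize_rtn` and the random test matrix are hypothetical.

```python
import numpy as np

def quantize_rtn(weights: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-row round-to-nearest quantization, returned in dequantized form."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit signed integers
    # One scale per row; all-zero rows get a dummy scale of 1 to avoid division by zero.
    max_abs = np.max(np.abs(weights), axis=1, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0)
    # Round to the nearest integer level and clip to the representable range.
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q * scale

# Hypothetical weight matrix, used only to show how the error changes with bit width.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 256))
errors = {bits: float(np.mean((w - quantize_rtn(w, bits)) ** 2)) for bits in (3, 4, 8)}
for bits, mse in errors.items():
    print(f"{bits}-bit mean squared error: {mse:.6f}")
```

The weight-space error shrinks as the bit width grows, but how that error translates into a change in NLL loss depends on the model, which is the gap the paper's statistical model is meant to address.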

Submitted to arXiv on 15 Oct. 2024

Results of the summarizing process for the arXiv paper: 2410.12119v1

In "Scaling laws for post-training quantized large language models," Xu et al. investigate how predictable the performance of large language models (LLMs) is after post-training weight quantization. The generalization ability of well-trained LLMs scales predictably with model size, but the quality of compressed LLMs is often unpredictable and has to be validated case by case.

To address this, the authors conduct a systematic empirical study on multiple LLM families, using popular weight quantization techniques and various low-precision tensor data types. They identify key scaling factors related to the local loss landscape and build a statistical model on these factors that predicts the performance of quantized LLMs reasonably well.

The study also examines the trade-offs involved in post-training weight quantization, in particular the difficulty of finding the best combination of quantization format and model parameter count within fixed constraints. The findings show how properties such as the pre-trained negative log-likelihood (NLL) loss scale with the total parameter count of the transformer layers' weight tensors. The authors also map the local loss landscape radially and illustrate the trade-off between a larger model quantized to a lower-bit format and a smaller model quantized to a higher-bit format.

The work was accepted at the 4th NeurIPS Workshop on Efficient Natural Language and Speech Processing (ENLSP) in 2024.
Created on 28 Jun. 2025
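
The trade-off between a larger model at fewer bits and a smaller model at more bits can be made concrete with simple memory arithmetic. The sketch below is only an illustration under stated assumptions: the model sizes, bit widths, and 8 GB budget are hypothetical, and the estimate counts weight storage alone (no quantization scales, activations, or cache). It lists which combinations fit the budget; it does not say which one has the lower loss, which is what the paper's statistical model is intended to predict.

```python
from itertools import product

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes), ignoring metadata such as scales."""
    return num_params * bits_per_weight / 8 / 1e9

def configs_within_budget(param_counts, bit_widths, budget_gb):
    """Return (params, bits, memory_gb) tuples that fit the budget, largest model first."""
    fits = [
        (n, b, weight_memory_gb(n, b))
        for n, b in product(param_counts, bit_widths)
        if weight_memory_gb(n, b) <= budget_gb
    ]
    return sorted(fits, key=lambda t: (-t[0], -t[1]))

# Hypothetical model sizes and bit widths, chosen only for illustration.
candidates = configs_within_budget([7e9, 13e9, 70e9], [3, 4, 8, 16], budget_gb=8.0)
for n, b, gb in candidates:
    print(f"{n / 1e9:.0f}B params at {b}-bit: {gb:.2f} GB")
```

Under these assumptions a 13B-parameter model at 4 bits and a 7B-parameter model at 8 bits both fit the same budget, which is exactly the kind of choice the study describes.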
