AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models

AI-generated keywords: AlphaPruning, Large Language Models, Layerwise Pruning, Heavy-Tailed Self-Regularization Theory, Sparsity

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Authors: Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang
  • Title: "AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models"
  • Objective: Reduce the size of large language models (LLMs) through pruning without compromising performance
  • Methodology:
      • Introduces AlphaPruning, a theoretically grounded method that allocates layerwise sparsity ratios using shape metrics
      • Uses Heavy-Tailed Self-Regularization (HT-SR) Theory, in particular the shape of the empirical spectral densities (ESDs) of weight matrices, to design improved layerwise pruning ratios
  • Findings:
      • Finds wide variability in how well-trained, and therefore how prunable, different layers of an LLM are
      • Prunes LLaMA-7B to 80% sparsity while maintaining reasonable perplexity
  • Contribution:
      • Can be used in conjunction with multiple existing LLM pruning methods
      • Code is openly available on GitHub (https://github.com/haiquanlu/AlphaPruning) for reproducibility and further research
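
The key points mention shape metrics of empirical spectral densities without showing what such a metric looks like. The sketch below is one possible illustration, not the authors' implementation: it treats the ESD of a weight matrix as the squared singular values and fits a power-law exponent to its tail with the Hill estimator. The function name `hill_alpha`, the `tail_fraction` parameter, and the choice of the Hill estimator itself are assumptions made for this example.

```python
import numpy as np


def hill_alpha(weight: np.ndarray, tail_fraction: float = 0.5) -> float:
    """Estimate a power-law exponent for the tail of a weight matrix's ESD.

    Hypothetical helper: the estimator and the default tail size are
    illustrative assumptions, not taken from the paper.
    """
    # ESD: eigenvalues of W^T W, i.e. the squared singular values of W.
    eigs = np.sort(np.linalg.svd(weight, compute_uv=False) ** 2)  # ascending
    n = eigs.size
    if n < 2:
        raise ValueError("need at least two eigenvalues")
    # Number of eigenvalues treated as the tail (assumed cutoff rule).
    k = min(max(1, int(n * tail_fraction)), n - 1)
    threshold = eigs[n - k - 1]  # largest eigenvalue outside the tail
    tail = eigs[n - k:]          # the k largest eigenvalues
    # Hill estimator: alpha = 1 + k / sum(log(lambda_i / lambda_threshold)).
    return 1.0 + k / float(np.sum(np.log(tail / threshold)))
```

Under the usual HT-SR reading, a smaller exponent means a heavier-tailed spectrum. Calling `hill_alpha` on each layer's weight matrix yields one number per layer that a layerwise allocation scheme could consume.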

Authors: Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang

NeurIPS 2024, first two authors contributed equally

Abstract: Recent work on pruning large language models (LLMs) has shown that one can eliminate a large number of parameters without compromising performance, making pruning a promising strategy to reduce LLM model size. Existing LLM pruning strategies typically assign uniform pruning ratios across layers, limiting overall pruning ability; and recent work on layerwise pruning of LLMs is often based on heuristics that can easily lead to suboptimal performance. In this paper, we leverage Heavy-Tailed Self-Regularization (HT-SR) Theory, in particular the shape of empirical spectral densities (ESDs) of weight matrices, to design improved layerwise pruning ratios for LLMs. Our analysis reveals a wide variability in how well-trained, and thus relatedly how prunable, different layers of an LLM are. Based on this, we propose AlphaPruning, which uses shape metrics to allocate layerwise sparsity ratios in a more theoretically principled manner. AlphaPruning can be used in conjunction with multiple existing LLM pruning methods. Our empirical results show that AlphaPruning prunes LLaMA-7B to 80% sparsity while maintaining reasonable perplexity, marking a first in the literature on LLMs. We have open-sourced our code at https://github.com/haiquanlu/AlphaPruning.
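
The abstract describes allocating layerwise sparsity ratios from a shape metric and combining the allocation with an existing pruning method. A minimal sketch of that pipeline follows, under stated assumptions: the per-layer metric values are given, a larger value is taken to mean a less well-trained (more prunable) layer, the mapping is linear, all layers have the same size, and plain magnitude pruning stands in for the "existing method". The names `allocate_sparsity`, `magnitude_prune`, `target`, and `spread` are hypothetical and do not come from the paper or its repository.

```python
import numpy as np


def allocate_sparsity(alphas, target=0.7, spread=0.2):
    """Map per-layer metric values linearly to sparsity ratios.

    The ratios average to `target` and span a range of width `spread`.
    Assumes equally sized layers and a `spread` small enough that every
    ratio stays inside [0, 1].
    """
    alphas = np.asarray(alphas, dtype=float)
    span = alphas.max() - alphas.min()
    if span == 0:
        # All layers look alike: fall back to uniform sparsity.
        return np.full_like(alphas, target)
    scaled = (alphas - alphas.min()) / span  # normalize to [0, 1]
    # Center on the mean so the average ratio equals `target`.
    return target + spread * (scaled - scaled.mean())


def magnitude_prune(weight, sparsity):
    """Return a copy with the smallest-magnitude entries set to zero."""
    k = int(round(sparsity * weight.size))  # number of entries to remove
    pruned = weight.flatten()               # flatten() always copies
    if k > 0:
        idx = np.argpartition(np.abs(pruned), k - 1)[:k]
        pruned[idx] = 0.0
    return pruned.reshape(weight.shape)
```

A caller would compute one metric value per layer, pass the list to `allocate_sparsity`, and then prune each layer's weights with its own ratio. Swapping `magnitude_prune` for another pruning criterion leaves the allocation step unchanged, which is the sense in which the abstract says the method composes with existing pruning methods.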

Submitted to arXiv on 14 Oct. 2024

Results of the summarizing process for the arXiv paper: 2410.10912v1

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

In "AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models," Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, and Yaoqing Yang study how to prune large language models (LLMs) to reduce model size without compromising performance. Existing LLM pruning strategies typically assign the same pruning ratio to every layer, which limits how far a model can be pruned, and recent layerwise approaches often rely on heuristics that can lead to suboptimal performance.

The authors address this limitation with Heavy-Tailed Self-Regularization (HT-SR) Theory, using the shape of the empirical spectral densities (ESDs) of weight matrices to design layerwise pruning ratios. Their analysis reveals wide variability in how well-trained, and therefore how prunable, different layers of an LLM are. Based on this observation they propose AlphaPruning, which uses shape metrics to allocate layerwise sparsity ratios in a more theoretically principled manner and can be combined with multiple existing LLM pruning methods.

Empirically, AlphaPruning prunes LLaMA-7B to 80% sparsity while maintaining reasonable perplexity, which the authors describe as a first in the literature on LLMs. The code is open-sourced at https://github.com/haiquanlu/AlphaPruning.
Created on 28 Feb. 2025
