LoRA: Low-Rank Adaptation of Large Language Models

AI-generated keywords: LoRA, Low-Rank Adaptation, Large Language Models, Fine-tuning, Transformer Architecture

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Introduction of Low-Rank Adaptation (LoRA) as a novel approach for adapting large language models
  • LoRA involves freezing pre-trained model weights and introducing trainable rank decomposition matrices into each layer of the Transformer architecture
  • Far fewer trainable parameters for downstream tasks than full fine-tuning: compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times
  • Model quality on par with or better than fine-tuning on RoBERTa, DeBERTa, GPT-2, and GPT-3
  • Higher training throughput and, unlike adapters, no additional inference latency
  • Empirical investigation into rank-deficiency in language model adaptation to support the effectiveness of LoRA
  • Availability of a package with implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 on GitHub
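
The parameter savings in the key points above can be illustrated with a little arithmetic. The sketch below is not taken from the paper's code. It assumes a single weight matrix of shape d x k that is adapted by two factors of shape d x r and r x k, so full fine-tuning trains d*k values while the low-rank update trains r*(d + k). The function names and the choice r = 4 are hypothetical; 12288 is the hidden size of GPT-3 175B. The per-matrix ratio shown here is not the 10,000 times figure quoted above, which depends on how many of the model's matrices are adapted at all.

```python
def full_params(d: int, k: int) -> int:
    """Trainable values when a d x k weight matrix is fully fine-tuned."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable values in a rank-r update B (d x r) times A (r x k)."""
    return r * (d + k)


# Illustrative sizes: one square projection matrix at GPT-3 175B's hidden size.
d = k = 12288
r = 4  # hypothetical rank, chosen only for the example

full = full_params(d, k)
lora = lora_params(d, k, r)
ratio = full // lora

print(f"full fine-tuning: {full:,} trainable values")
print(f"rank-{r} update:    {lora:,} trainable values")
print(f"reduction for this one matrix: {ratio}x")
```

Because the frozen matrix needs no gradients or optimizer state, the memory saved during training grows with the same ratio for every matrix that is adapted this way.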

Authors: Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

Draft V2 includes better baselines, experiments on GLUE, and more on adapter latency

Abstract: An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive. We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. LoRA performs on-par or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapters, no additional inference latency. We also provide an empirical investigation into rank-deficiency in language model adaptation, which sheds light on the efficacy of LoRA. We release a package that facilitates the integration of LoRA with PyTorch models and provide our implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2 at https://github.com/microsoft/LoRA.
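
The claim of no additional inference latency follows from the fact that a low-rank update can be folded into the frozen weight once training is done. The sketch below is a minimal pure-Python illustration under stated assumptions, not the released package: the helper names (`matmul`, `matadd`, `matvec`, `lora_forward`, `merge`) and the tiny integer matrices are made up for the example, and any scaling factor applied to the update is omitted. It checks that the unmerged path, h = W0 x + B(A x), gives the same output as a single multiplication by the merged matrix W0 + BA.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matadd(X, Y):
    """Add two matrices of the same shape element by element."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def matvec(M, v):
    """Multiply a matrix by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W0, A, B, x):
    """Unmerged path: frozen weight plus low-rank update, h = W0 x + B (A x)."""
    base = matvec(W0, x)
    update = matvec(B, matvec(A, x))
    return [b + u for b, u in zip(base, update)]


def merge(W0, A, B):
    """Fold the update into the weight once: W = W0 + B A."""
    return matadd(W0, matmul(B, A))


# Toy values: W0 is 3 x 3 (frozen), B is 3 x 1 and A is 1 x 3 (rank r = 1).
W0 = [[1, 0, 2], [0, 3, 1], [4, 1, 0]]
A = [[1, 2, 0]]
B = [[2], [0], [1]]
x = [1, 1, 1]

h_unmerged = lora_forward(W0, A, B, x)
h_merged = matvec(merge(W0, A, B), x)

print(h_unmerged)  # [9, 4, 8]
print(h_merged)    # [9, 4, 8]
```

After merging, inference uses one matrix of the original shape, so the layer costs the same as in the pre-trained model. An adapter layer inserted between existing layers cannot be folded away in this manner, which is the contrast the abstract draws.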

Submitted to arXiv on 17 Jun. 2021


Results of the summarizing process for the arXiv paper: 2106.09685v2

The paper "LoRA: Low-Rank Adaptation of Large Language Models" by Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen introduces an approach to adapting large language models to specific tasks or domains. Full fine-tuning, which retrains all model parameters, becomes less feasible as pre-trained models grow: with GPT-3 175B, deploying an independent fine-tuned instance for each task, each with 175B parameters, is prohibitively expensive. In response, the authors propose Low-Rank Adaptation (LoRA), which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture.

This greatly reduces the number of trainable parameters for downstream tasks. Compared to GPT-3 175B fine-tuned with Adam, LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times. Despite having fewer trainable parameters, LoRA performs on par with or better than fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3. It also has higher training throughput and, unlike adapters, adds no inference latency.

The paper also includes an empirical investigation into rank-deficiency in language model adaptation, which sheds light on why LoRA is effective. The authors have released a package that facilitates integrating LoRA with PyTorch models, along with their implementations and model checkpoints for RoBERTa, DeBERTa, and GPT-2, at https://github.com/microsoft/LoRA.

In summary, the paper presents a way to adapt large language models to specific tasks or domains efficiently, maintaining model quality while reducing the number of trainable parameters and the GPU memory required.
Created on 21 Apr. 2024
