Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning

AI-generated keywords: Parameter-efficient fine-tuning, Low-rank Attention Side-Tuning, Pretrained models, Downstream tasks, Visual adaptation

AI-generated Key Points

The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

  • Paper introduces Low-rank Attention Side-Tuning (LAST) for fine-tuning large pretrained models
  • LAST disentangles the trainable module from the pretrained model by freezing both the parameters and the outputs of the pretrained network
  • Trains a side-network composed only of low-rank self-attention modules, which focuses on learning task-specific knowledge
  • Can run highly in parallel across multiple optimization objectives, making downstream task adaptation and hyperparameter search efficient
  • Outperforms previous state-of-the-art methods on visual adaptation tasks such as VTAB-1K
  • Achieves higher accuracy while using roughly 30% of the GPU memory footprint and 60% of the training time of existing PEFT methods
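The efficiency figures are ratios, not reductions: the abstract states that LAST needs roughly 30% of the GPU memory footprint and 60% of the training time of existing PEFT methods. The short Python calculation below converts those ratios into savings. The 10 GB / 100 min baseline is a hypothetical number for illustration only, not a measurement from the paper.

```python
def reduction_from_ratio(ratio_percent: float) -> float:
    """Turn 'uses X% of the baseline' into 'uses (100 - X)% less than the baseline'."""
    return 100.0 - ratio_percent

# Ratios quoted in the abstract, relative to existing PEFT methods.
MEMORY_RATIO = 30.0  # percent of the baseline GPU memory footprint
TIME_RATIO = 60.0    # percent of the baseline training time

print(f"GPU memory: about {reduction_from_ratio(MEMORY_RATIO):.0f}% less")   # prints "GPU memory: about 70% less"
print(f"Training time: about {reduction_from_ratio(TIME_RATIO):.0f}% less")  # prints "Training time: about 40% less"

# Hypothetical baseline run (illustration only, NOT from the paper).
baseline_gb, baseline_min = 10.0, 100.0
print(f"{baseline_gb * MEMORY_RATIO / 100:.1f} GB, {baseline_min * TIME_RATIO / 100:.0f} min")  # prints "3.0 GB, 60 min"
```

In other words, "30% of the memory" corresponds to about 70% less memory, and "60% of the training time" to about 40% less time.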

Authors: Ningyuan Tang, Minghao Fu, Ke Zhu, Jianxin Wu

Abstract: In finetuning a large pretrained model to downstream tasks, parameter-efficient fine-tuning (PEFT) methods can effectively finetune pretrained models with few trainable parameters, but suffer from high GPU memory consumption and slow training speed. Because learnable parameters from these methods are entangled with the pretrained model, gradients related to the frozen pretrained model's parameters have to be computed and stored during finetuning. We propose Low-rank Attention Side-Tuning (LAST), which disentangles the trainable module from the pretrained model by freezing not only parameters but also outputs of the pretrained network. LAST trains a side-network composed of only low-rank self-attention modules. By viewing the pretrained model as a frozen feature extractor, the side-network takes intermediate output from the pretrained model and focuses on learning task-specific knowledge. We also show that LAST can be highly parallel across multiple optimization objectives, making it very efficient in downstream task adaptation, for example, in finding optimal hyperparameters. LAST outperforms previous state-of-the-art methods on VTAB-1K and other visual adaptation tasks with roughly only 30% of GPU memory footprint and 60% of training time compared to existing PEFT methods, but achieves significantly higher accuracy.

Submitted to arXiv on 06 Feb. 2024
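The core idea of the abstract, a side-network of low-rank self-attention modules reading the intermediate outputs of a frozen backbone, can be sketched in plain Python. This is not the authors' code. The rank-r attention form (projecting queries, keys and values from d to r dimensions, then back to d), the way each side block combines its running state with one backbone layer's feature, and every name below (`LowRankSelfAttention`, `side_network_forward`, the random stand-in features) are assumptions for illustration. The backbone features enter only as precomputed constants, so a training loop would update only the side-network weights (4*d*r per block here); in an autograd framework this corresponds to computing the backbone features without gradient tracking.

```python
import math
import random

Matrix = list[list[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(x - m) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]

class LowRankSelfAttention:
    """Hypothetical low-rank attention block: Q, K, V projected d -> r, output r -> d."""
    def __init__(self, d: int, r: int, rng: random.Random) -> None:
        self.r = r
        self.w_q, self.w_k, self.w_v = (rand_matrix(d, r, rng) for _ in range(3))
        self.w_o = rand_matrix(r, d, rng)

    def num_params(self) -> int:
        return sum(len(w) * len(w[0]) for w in (self.w_q, self.w_k, self.w_v, self.w_o))

    def attention_weights(self, x: Matrix) -> Matrix:
        scores = matmul(matmul(x, self.w_q), transpose(matmul(x, self.w_k)))  # n x n
        scale = 1.0 / math.sqrt(self.r)
        return softmax_rows([[s * scale for s in row] for row in scores])

    def __call__(self, x: Matrix) -> Matrix:
        v = matmul(x, self.w_v)                                       # n x r
        return matmul(matmul(self.attention_weights(x), v), self.w_o)  # n x d

def side_network_forward(frozen_features: list[Matrix], blocks: list[LowRankSelfAttention]) -> Matrix:
    """Run the side network over cached backbone features; the backbone is never touched."""
    n, d = len(frozen_features[0]), len(frozen_features[0][0])
    state = [[0.0] * d for _ in range(n)]
    for feats, block in zip(frozen_features, blocks):
        # Assumed wiring: each block attends over its state plus one frozen layer output.
        state = add(state, block(add(state, feats)))
    return state

rng = random.Random(0)
n_tokens, d, r, n_layers = 4, 8, 2, 3
# Random stand-ins for intermediate outputs of a frozen pretrained model, computed once.
frozen_features = [rand_matrix(n_tokens, d, rng) for _ in range(n_layers)]
blocks = [LowRankSelfAttention(d, r, rng) for _ in range(n_layers)]
out = side_network_forward(frozen_features, blocks)
print(len(out), len(out[0]))                 # prints "4 8"
print(sum(b.num_params() for b in blocks))   # prints "192" (3 blocks * 4 * d * r)
```

The trainable parameter count grows with the rank r rather than with d squared, which is one way a side-network of this kind can stay small.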

Results of the summarizing process for the arXiv paper: 2402.04009v1

The paper "Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning" by Ningyuan Tang, Minghao Fu, Ke Zhu, and Jianxin Wu introduces a new approach to fine-tuning large pretrained models for downstream tasks. Existing PEFT methods train few parameters, but because those parameters are entangled with the pretrained model, gradients related to the frozen model still have to be computed and stored, which costs GPU memory and training time. The authors propose Low-rank Attention Side-Tuning (LAST), which disentangles the trainable module from the pretrained model by freezing not only the parameters but also the outputs of the pretrained network. Treating the pretrained model as a frozen feature extractor, LAST trains a side-network composed only of low-rank self-attention modules, which takes the pretrained model's intermediate outputs and focuses on learning task-specific knowledge. The authors also show that LAST can run highly in parallel across multiple optimization objectives, which makes it efficient for downstream task adaptation, for example when searching for optimal hyperparameters. On VTAB-1K and other visual adaptation tasks, LAST outperforms previous state-of-the-art methods with significantly higher accuracy while using roughly 30% of the GPU memory footprint and 60% of the training time of existing PEFT methods. Overall, LAST is a promising advance in parameter-efficient fine-tuning, improving both accuracy and efficiency when adapting pretrained models to downstream tasks.
Created on 13 Jan. 2025
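One plausible reading of the "highly parallel" claim, sketched under assumptions: because the pretrained model is frozen in both its parameters and its outputs, its features can be computed once and shared by several independent side modules, each trained with a different hyperparameter. In the toy Python sketch below, a hypothetical `FrozenBackbone` and simple linear heads trained with different learning rates stand in for LAST's side-networks; the backbone, the data, and the learning-rate grid are all made up for illustration.

```python
import random

class FrozenBackbone:
    """Hypothetical stand-in for a frozen pretrained model (never updated)."""
    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, x: list[float]) -> list[float]:
        self.calls += 1
        return [x[0] + x[1], x[0] - x[1], x[0] * x[1]]

def dot(w: list[float], f: list[float]) -> float:
    return sum(a * b for a, b in zip(w, f))

def train_heads_in_parallel(backbone, data, learning_rates, epochs=200):
    """Train one linear head per learning rate on shared, cached backbone features."""
    # The backbone runs once per example; every head reuses the cached outputs.
    cache = [(backbone(x), y) for x, y in data]
    heads = {lr: [0.0, 0.0, 0.0] for lr in learning_rates}
    for _ in range(epochs):
        for feats, y in cache:
            for lr, w in heads.items():
                err = dot(w, feats) - y
                for i, f in enumerate(feats):
                    w[i] -= lr * 2 * err * f  # SGD step on squared error; only the head changes
    losses = {lr: sum((dot(w, f) - y) ** 2 for f, y in cache) / len(cache)
              for lr, w in heads.items()}
    return heads, losses

rng = random.Random(0)
xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(20)]
data = [(x, 3 * x[0] + x[1]) for x in xs]
backbone = FrozenBackbone()
lrs = [0.001, 0.01, 0.05]
heads, losses = train_heads_in_parallel(backbone, data, lrs)
best_lr = min(losses, key=losses.get)
print(backbone.calls)  # prints "20": one pass per example, shared by all three heads
print(best_lr, losses[best_lr])
```

The heads are updated one after another here for clarity; the point of the sketch is that the number of backbone passes stays fixed as the number of hyperparameter configurations grows, since no gradient ever has to flow back into the frozen model.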
