SIFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency

AI-generated keywords: SIFT, Sparsity, FLOPs, Representational Capacity, Accuracy

AI-generated Key Points

  • Weight sparsity has been explored to improve training efficiency of deep neural networks (DNNs) by reducing training FLOPs.
  • Training with sparse weights often leads to accuracy loss or requires longer training schedules, making the resulting training efficiency less clear.
  • SIFT (Sparse Iso-FLOP Transformations) is a new approach that aims to increase accuracy while using the same FLOPs as the dense model, so that training efficiency gains come from higher accuracy rather than from fewer FLOPs.
  • SIFT is a family of drop-in replacements for dense layers that improve their representational capacity and FLOP efficiency.
  • Each transformation is parameterized by a single hyperparameter (sparsity level) and provides a larger search space to find optimal sparse masks.
  • SIFT can be used without changing any training hyperparameters and has shown significant improvements across computer vision (CV) and natural language processing (NLP) tasks, including ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on WikiText-103 (-0.4 PPL).
  • The method is explained for fully connected neural networks but can be extended straightforwardly to convolutional layers.
  • SIFT uses unstructured sparsity in weight matrices and ensures that the FLOPs of the transformation are the same as that of a dense feedforward function.
  • Detailed metrics such as AP, AP50, AP75, MIO can be found in Appendix C.2 for further evaluation.
  • Code is available at https://github.com/CerebrasResearch/SIFT.
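
The iso-FLOP condition in the key points above can be written out for one illustrative case. As an assumption that this summary does not state, suppose a dense layer with a weight matrix of size D_in x D_out is replaced by a sparse layer whose two dimensions are both widened by a factor k and whose weights have sparsity level s (the fraction of zero weights). The symbols D_in, D_out, k, and s are introduced here for illustration only.

```latex
\begin{align*}
\text{FLOPs}_{\text{dense}}  &\propto D_{\text{in}} \, D_{\text{out}} \\
\text{FLOPs}_{\text{sparse}} &\propto (1 - s)\,(k D_{\text{in}})\,(k D_{\text{out}}) \\
\text{FLOPs}_{\text{sparse}} = \text{FLOPs}_{\text{dense}}
  &\iff k^{2}\,(1 - s) = 1
   \iff k = \frac{1}{\sqrt{1 - s}}
\end{align*}
```

Under this assumption, a sparsity level of 75% would allow a layer twice as wide in each dimension at the same FLOP count, which is one way to read the claim that a single hyperparameter controls the size of the search space for sparse masks.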

Authors: Shreyas Saxena, Vithursan Thangarasa, Abhay Gupta, Sean Lie

License: CC BY 4.0

Abstract: Recent works have explored the use of weight sparsity to improve the training efficiency (test accuracy w.r.t training FLOPs) of deep neural networks (DNNs). These works aim to reduce training FLOPs but training with sparse weights often leads to accuracy loss or requires longer train schedules, making the resulting training efficiency less clear. In contrast, we focus on using sparsity to increase accuracy while using the same FLOPS as the dense model and show training efficiency gains through higher accuracy. In this work, we introduce SIFT, a family of Sparse Iso-FLOP Transformations which are used as drop-in replacements for dense layers to improve their representational capacity and FLOP efficiency. Each transformation is parameterized by a single parameter (sparsity level) and provides a larger search space to find optimal sparse masks. Without changing any training hyperparameters, replacing dense layers with SIFT leads to significant improvements across computer vision (CV) and natural language processing (NLP) tasks, including ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on WikiText-103 (-0.4 PPL), both matching larger dense model variants with 2x or more FLOPs. To the best of our knowledge, this is the first work to demonstrate the use of sparsity for improving accuracy of dense models via a simple-to-use set of sparse transformations. Code is available at: https://github.com/CerebrasResearch/SIFT.

Submitted to arXiv on 21 Mar. 2023

Results of the summarizing process for the arXiv paper: 2303.11525v1

Recent works have explored weight sparsity as a way to improve the training efficiency of deep neural networks (DNNs) by reducing training FLOPs. However, training with sparse weights often leads to accuracy loss or requires longer training schedules, making the resulting training efficiency less clear. In contrast, SIFT (Sparse Iso-FLOP Transformations) aims to increase accuracy while using the same FLOPs as the dense model, so that training efficiency gains come from higher accuracy.

SIFT is a family of drop-in replacements for dense layers that improve their representational capacity and FLOP efficiency. Each transformation is parameterized by a single hyperparameter (sparsity level) and provides a larger search space to find optimal sparse masks. SIFT can be used without changing any training hyperparameters and has shown significant improvements across computer vision (CV) and natural language processing (NLP) tasks, including ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on WikiText-103 (-0.4 PPL). Both results match larger dense model variants with 2x or more FLOPs. According to the authors, this is the first work to demonstrate the use of sparsity for improving accuracy of dense models via a simple-to-use set of sparse transformations.

The method is explained for fully connected neural networks but can be extended straightforwardly to convolutional layers. The feedforward function f_{θ_l} of layer l computes output features as a linear transformation of the input features; in practice, most such transformations are expressed as dense matrix multiplications because these are widely supported on GPUs. SIFT instead uses unstructured sparsity in the weight matrices and ensures that the FLOPs of the transformation are the same as those of the dense feedforward function. Detailed metrics such as AP, AP50, AP75, MIO can be found in Appendix C.2 for further evaluation. Code is available at https://github.com/CerebrasResearch/SIFT.
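
To make the idea of an unstructured sparse layer with the same FLOPs as a dense layer concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes one possible transformation, in which both layer dimensions are widened by k = 1/sqrt(1 - s) for sparsity level s, and it counts multiply-accumulate operations (MACs) as a proxy for FLOPs. The function names `iso_flop_width`, `random_mask`, and `masked_forward` are hypothetical, and the random mask stands in for whatever mask a sparse training method would actually find.

```python
import math
import random


def iso_flop_width(d_in: int, d_out: int, sparsity: float) -> tuple[int, int]:
    """Widen both dimensions so the non-zero weight count stays near d_in * d_out.

    Assumption for this sketch: both dimensions grow by the same factor k,
    so k^2 * (1 - sparsity) = 1, i.e. k = 1 / sqrt(1 - sparsity).
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = 1.0 / math.sqrt(1.0 - sparsity)
    return round(k * d_in), round(k * d_out)


def random_mask(rows: int, cols: int, sparsity: float, seed: int = 0) -> list[list[int]]:
    """Unstructured binary mask keeping round((1 - sparsity) * rows * cols) weights."""
    rng = random.Random(seed)
    total = rows * cols
    keep = round((1.0 - sparsity) * total)
    kept = set(rng.sample(range(total), keep))
    return [[1 if r * cols + c in kept else 0 for c in range(cols)] for r in range(rows)]


def masked_forward(x, weights, mask):
    """Compute y = x @ (W * M), skipping masked weights, and count the MACs used."""
    rows, cols = len(weights), len(weights[0])
    y = [0.0] * cols
    macs = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                y[c] += x[r] * weights[r][c]
                macs += 1
    return y, macs


# Dense baseline: a 64 x 32 layer costs 64 * 32 MACs per input vector.
d_in, d_out, sparsity = 64, 32, 0.75
dense_macs = d_in * d_out

# Sparse replacement: twice as wide in each dimension at 75% sparsity.
wide_in, wide_out = iso_flop_width(d_in, d_out, sparsity)
mask = random_mask(wide_in, wide_out, sparsity)

rng = random.Random(1)
weights = [[rng.gauss(0.0, 1.0) for _ in range(wide_out)] for _ in range(wide_in)]
x = [rng.gauss(0.0, 1.0) for _ in range(wide_in)]
y, sparse_macs = masked_forward(x, weights, mask)

print(f"dense MACs: {dense_macs}, sparse MACs: {sparse_macs}")
```

Note that the widened layer in this sketch expects a wider input and produces a wider output, so in a real network the neighboring layers would have to be adjusted to match. The loop-based forward pass is only there to make the MAC count explicit; whether the count translates into wall-clock savings depends on hardware and kernel support for unstructured sparsity.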
Created on 22 Mar. 2023

