Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates

AI-generated keywords: Neural Networks, Super-Convergence, Generalization, Learning Rate Cycle, DAWNBench Challenge

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Super-convergence is a phenomenon in which neural networks can be trained an order of magnitude faster than with standard training methods.
  • Key element: training with a single learning rate cycle and a large maximum learning rate.
  • Large learning rates regularize the training, so all other forms of regularization must be reduced to preserve an optimal regularization balance.
  • A simplification of the Hessian Free optimization method is derived to estimate the optimal learning rate.
  • Experiments demonstrate super-convergence on the Cifar-10/100, MNIST and Imagenet datasets with resnet, wide-resnet, densenet, and inception architectures.
  • The boost in performance relative to standard training is greater when the amount of labeled training data is limited.
  • Architectures and code to replicate the paper's figures are available at github.com/lnsmith54/super-convergence.
  • Super-convergence was applied in a winning entry to the DAWNBench challenge.

Authors: Leslie N. Smith, Nicholay Topin

This paper was significantly revised to show super-convergence as a general fast training methodology

Abstract: In this paper, we describe a phenomenon, which we named "super-convergence", where neural networks can be trained an order of magnitude faster than with standard training methods. The existence of super-convergence is relevant to understanding why deep networks generalize well. One of the key elements of super-convergence is training with one learning rate cycle and a large maximum learning rate. A primary insight that allows super-convergence training is that large learning rates regularize the training, hence requiring a reduction of all other forms of regularization in order to preserve an optimal regularization balance. We also derive a simplification of the Hessian Free optimization method to compute an estimate of the optimal learning rate. Experiments demonstrate super-convergence for Cifar-10/100, MNIST and Imagenet datasets, and resnet, wide-resnet, densenet, and inception architectures. In addition, we show that super-convergence provides a greater boost in performance relative to standard training when the amount of labeled training data is limited. The architectures and code to replicate the figures in this paper are available at github.com/lnsmith54/super-convergence. See http://www.fast.ai/2018/04/30/dawnbench-fastai/ for an application of super-convergence to win the DAWNBench challenge (see https://dawn.cs.stanford.edu/benchmark/).
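
The "one learning rate cycle" mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the shape of the cycle or any values, so the triangular shape, the function name `one_cycle_lr`, and the bounds `min_lr=0.1` and `max_lr=3.0` below are assumptions chosen for illustration only.

```python
def one_cycle_lr(step, total_steps, min_lr=0.1, max_lr=3.0):
    """Triangular single-cycle learning rate (illustrative assumption).

    The rate rises linearly from min_lr to max_lr over the first half of
    training, then falls linearly back to min_lr over the second half.
    """
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    peak = (total_steps - 1) / 2        # step index at which max_lr is reached
    distance = abs(step - peak) / peak  # 0 at the peak, 1 at either end
    return max_lr - (max_lr - min_lr) * distance


# Hypothetical 11-step run: the rate peaks once, in the middle.
schedule = [one_cycle_lr(s, 11) for s in range(11)]
```

In a real training loop the returned value would be assigned to the optimizer's learning rate before each update; deep learning frameworks typically ship their own schedulers for this, which may differ in shape from the sketch above.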

Submitted to arXiv on 23 Aug. 2017

Results of the summarizing process for the arXiv paper: 1708.07120v3

Super-convergence is a phenomenon in which neural networks can be trained an order of magnitude faster than with standard training methods. The authors argue that its existence is relevant to understanding why deep networks generalize well. A key element is training with a single learning rate cycle and a large maximum learning rate. Because large learning rates themselves regularize the training, all other forms of regularization must be reduced to preserve an optimal regularization balance. The authors also derive a simplification of the Hessian Free optimization method to estimate the optimal learning rate. Experiments demonstrate super-convergence on the Cifar-10/100, MNIST and Imagenet datasets with resnet, wide-resnet, densenet, and inception architectures. The boost in performance relative to standard training is greater when the amount of labeled training data is limited. The architectures and code needed to replicate the paper's figures are available at github.com/lnsmith54/super-convergence. The authors also point to an application of super-convergence that won the DAWNBench challenge. Overall, the paper presents super-convergence as a general fast training methodology.
Created on 29 May. 2024
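
The summary mentions a simplified Hessian Free estimate of the optimal learning rate but gives no formula. The sketch below is not the paper's derivation. It only illustrates the general idea behind Hessian-free methods: a Hessian-vector product can be approximated from two gradient evaluations, and the curvature along the gradient direction then suggests a step size. The toy quadratic and the names `grad`, `dot`, and `estimate_step_size` are assumptions made for this example.

```python
def grad(theta):
    """Gradient of the toy quadratic f(x, y) = 0.5 * (4*x**2 + y**2)."""
    x, y = theta
    return [4.0 * x, 1.0 * y]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def estimate_step_size(grad_fn, theta, delta=1e-4):
    """Estimate the step size that minimizes f along the negative gradient.

    H @ g is approximated by a finite difference of gradients, so the
    Hessian is never formed. For a quadratic, the exact line-search step
    along -g is (g . g) / (g . H g).
    """
    g = grad_fn(theta)
    shifted = [t + delta * gi for t, gi in zip(theta, g)]
    g_shifted = grad_fn(shifted)
    hg = [(gs - gi) / delta for gs, gi in zip(g_shifted, g)]  # approx. H @ g
    curvature = dot(g, hg)
    if curvature <= 0:
        raise ValueError("non-positive curvature along the gradient")
    return dot(g, g) / curvature


theta = [1.0, 1.0]
lr = estimate_step_size(grad, theta)  # analytically 17/65 for this quadratic
```

For this quadratic the finite difference is exact up to rounding, so the estimate matches the analytic line-search step. On a real network the loss is not quadratic and gradients are noisy, so any such estimate would be approximate.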

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.