When Does Re-initialization Work?
AI-generated Key Points
- Re-initializing a neural network during training has been observed to improve generalization in recent works.
- This technique is not widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols.
- The authors conducted an extensive empirical comparison of standard training with a selection of re-initialization methods by training over 15,000 models on a variety of image classification benchmarks.
- Re-initialization methods are consistently beneficial for generalization in the absence of any other regularization.
- When deployed alongside other carefully tuned regularization techniques (data augmentation, weight decay, and learning rate schedules) in a setup that resembles state-of-the-art training protocols, re-initialization methods offer little to no added benefit for generalization.
- When re-initialization is combined with these regularization techniques, optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay hyperparameters.
- Under label noise, re-initialization significantly improves upon standard training, even in the presence of other carefully tuned regularization techniques.
- Fixed-budget Born-Again Networks (BANs) do not improve performance over standard training in most cases, but they can serve as an important baseline for more sophisticated re-initialization methods.
- A deeper understanding of why re-initialization does or does not work well is still missing. Future work could explore the implications for online learning and extend the study beyond the specific datasets and architectures considered.
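To make the idea in the key points concrete, here is a minimal sketch of periodic re-initialization during training. It is an illustration only, not one of the methods compared in the paper: the summary does not say which parameters are re-initialized or when, so re-drawing the last few layers at fixed epoch intervals is an assumption. All names (`init_layer`, `reinitialize_last_layers`, `train_with_reinit`, `train_one_epoch`) are hypothetical.

```python
import random

def init_layer(n_in, n_out, rng):
    """Draw a fresh n_in x n_out weight matrix from a scaled Gaussian (assumed scheme)."""
    scale = (1.0 / n_in) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]

def init_network(layer_sizes, rng):
    """Build one weight matrix per consecutive pair of layer sizes."""
    return [init_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

def reinitialize_last_layers(weights, num_layers, rng):
    """Return a copy of `weights` in which the last `num_layers` matrices are re-drawn."""
    keep = max(0, len(weights) - num_layers)
    # Earlier layers are copied unchanged, so what they learned is retained.
    new_weights = [[row[:] for row in w] for w in weights[:keep]]
    # Later layers get fresh random values with the same shapes.
    for w in weights[keep:]:
        new_weights.append(init_layer(len(w), len(w[0]), rng))
    return new_weights

def train_with_reinit(weights, num_epochs, reinit_every, num_layers, rng, train_one_epoch):
    """Train for `num_epochs`, re-initializing the last layers every `reinit_every` epochs.

    `train_one_epoch` is a caller-supplied function (hypothetical) that maps
    weights to updated weights. No re-initialization happens after the final
    epoch, so the returned weights are always trained ones.
    """
    reinit_epochs = []
    for epoch in range(1, num_epochs + 1):
        weights = train_one_epoch(weights)
        if epoch % reinit_every == 0 and epoch < num_epochs:
            weights = reinitialize_last_layers(weights, num_layers, rng)
            reinit_epochs.append(epoch)
    return weights, reinit_epochs
```

In a real experiment `train_one_epoch` would run the optimizer over the training set, and the re-initialization interval and the number of re-drawn layers would be tuned like any other hyperparameter.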
Authors: Sheheryar Zaidi, Tudor Berariu, Hyunjik Kim, Jörg Bornschein, Claudia Clopath, Yee Whye Teh, Razvan Pascanu
Abstract: Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay and learning rate schedules. In this work, we conduct an extensive empirical comparison of standard training with a selection of re-initialization methods to answer this question, training over 15,000 models on a variety of image classification benchmarks. We first establish that such methods are consistently beneficial for generalization in the absence of any other regularization. However, when deployed alongside other carefully tuned regularization techniques, re-initialization methods offer little to no added benefit for generalization, although optimal generalization performance becomes less sensitive to the choice of learning rate and weight decay hyperparameters. To investigate the impact of re-initialization methods on noisy data, we also consider learning under label noise. Surprisingly, in this case, re-initialization significantly improves upon standard training, even in the presence of other carefully tuned regularization techniques.
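As an illustration of the label-noise setting mentioned at the end of the abstract, the sketch below corrupts a fraction of training labels. The abstract does not specify the noise model, so symmetric (uniform) label flipping is an assumption here, and the function name `add_symmetric_label_noise` and its parameters are hypothetical.

```python
import random

def add_symmetric_label_noise(labels, num_classes, noise_rate, seed=0):
    """Replace each label, with probability `noise_rate`, by a different class.

    The replacement class is drawn uniformly from the other classes, so a
    corrupted label never equals the original one. Requires num_classes >= 2.
    """
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            other_classes = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(other_classes))
        else:
            noisy.append(y)
    return noisy
```

Only the training labels would be corrupted in this way; evaluation would still use clean test labels, so that generalization is measured against the true classes.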