Towards Sustainable Learning: Coresets for Data-efficient Deep Learning

AI-generated keywords: CREST, Deep Learning, Efficiency, Scalability, Performance

AI-generated Key Points

The paper's license does not allow us to build upon its content, so the key points are generated from the paper's metadata rather than the full article.

  • The authors propose CREST, a scalable framework for improving the efficiency and sustainability of learning deep models
  • CREST is described as the first scalable framework with rigorous theoretical guarantees for identifying the most valuable examples for training non-convex models, particularly deep networks
  • To guarantee convergence to a stationary point, CREST models the non-convex loss as a series of quadratic functions and extracts a coreset for each quadratic sub-region
  • To speed up convergence of stochastic gradient methods such as mini-batch SGD, CREST iteratively extracts multiple mini-batch coresets from larger random subsets of the training data, which keeps gradients nearly unbiased with small variance
  • CREST further improves scalability and efficiency by excluding already-learned examples from the coreset selection pipeline
  • Experiments on CIFAR-10, CIFAR-100, TinyImageNet, and SNLI confirm that CREST speeds up training by 1.7x to 2.5x with minimal loss in performance
  • An analysis of the learning difficulty of the selected subsets shows that deep models benefit most from learning from subsets of increasing difficulty
  • Overall, CREST combines theoretical guarantees, per-region coreset extraction, and iterative mini-batch coreset selection to make training more efficient
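The selection steps in the key points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the page only has access to the paper's metadata, so the function names (`drop_learned`, `greedy_minibatch_coreset`, `select_minibatch_coresets`), the loss-threshold rule for "learned" examples, and the greedy gradient-mean matching are all hypothetical stand-ins chosen for clarity.

```python
import numpy as np


def drop_learned(losses, threshold):
    """Return indices of examples whose loss is still above `threshold`.

    Assumption: an example counts as "learned" once its loss is at or
    below a fixed threshold. The paper may use a different criterion.
    """
    return np.flatnonzero(losses > threshold)


def greedy_minibatch_coreset(grads, k):
    """Greedily pick k rows of `grads` whose mean tracks the mean of all rows.

    This is a simple stand-in for coreset selection: at each step it adds
    the example that brings the selected mean closest to the subset mean.
    """
    target = grads.mean(axis=0)
    running_sum = np.zeros_like(target)
    available = np.ones(len(grads), dtype=bool)
    chosen = []
    for step in range(1, k + 1):
        # Mean of the selection if each candidate row were added next.
        candidate_means = (running_sum + grads) / step
        err = np.linalg.norm(candidate_means - target, axis=1)
        err[~available] = np.inf  # never pick the same row twice
        best = int(np.argmin(err))
        chosen.append(best)
        available[best] = False
        running_sum += grads[best]
    return np.array(chosen, dtype=int)


def select_minibatch_coresets(grads, losses, num_batches, subset_size,
                              batch_size, loss_threshold, rng):
    """Draw several random subsets and extract one mini-batch coreset from each."""
    active = drop_learned(losses, loss_threshold)
    batches = []
    for _ in range(num_batches):
        size = min(subset_size, len(active))
        subset = rng.choice(active, size=size, replace=False)
        local = greedy_minibatch_coreset(grads[subset], min(batch_size, size))
        batches.append(subset[local])  # map back to dataset indices
    return batches


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = rng.normal(size=(200, 5))   # stand-in per-example gradients
    losses = rng.uniform(size=200)      # stand-in per-example losses
    for batch in select_minibatch_coresets(grads, losses, 3, 50, 8, 0.2, rng):
        print(sorted(batch.tolist()))
```

Drawing a fresh random subset for each mini-batch keeps every selection cheap, since the greedy step only ever sees `subset_size` examples rather than the full dataset.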

Authors: Yu Yang, Hao Kang, Baharan Mirzasoleiman

Abstract: To improve the efficiency and sustainability of learning deep models, we propose CREST, the first scalable framework with rigorous theoretical guarantees to identify the most valuable examples for training non-convex models, particularly deep networks. To guarantee convergence to a stationary point of a non-convex function, CREST models the non-convex loss as a series of quadratic functions and extracts a coreset for each quadratic sub-region. In addition, to ensure faster convergence of stochastic gradient methods such as (mini-batch) SGD, CREST iteratively extracts multiple mini-batch coresets from larger random subsets of training data, to ensure nearly-unbiased gradients with small variances. Finally, to further improve scalability and efficiency, CREST identifies and excludes the examples that are learned from the coreset selection pipeline. Our extensive experiments on several deep networks trained on vision and NLP datasets, including CIFAR-10, CIFAR-100, TinyImageNet, and SNLI, confirm that CREST speeds up training deep networks on very large datasets, by 1.7x to 2.5x with minimum loss in the performance. By analyzing the learning difficulty of the subsets selected by CREST, we show that deep models benefit the most by learning from subsets of increasing difficulty levels.
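One plausible reading of "models the non-convex loss as a series of quadratic functions" is a local second-order Taylor expansion, paired with a gradient-matching condition on the coreset. The formulas below are a sketch under that assumption; the symbols $w_t$, $H_t$, $S_t$, $\gamma_j$, and $\epsilon$ are introduced here for illustration and are not taken from the paper.

```latex
% Local quadratic model of the loss around the current iterate w_t
% (H_t denotes the Hessian, or an approximation of it, at w_t):
\mathcal{L}(w_t + \delta) \approx \mathcal{L}(w_t)
  + \nabla \mathcal{L}(w_t)^{\top} \delta
  + \tfrac{1}{2}\, \delta^{\top} H_t\, \delta

% A weighted coreset S_t \subseteq V is kept for as long as the quadratic
% model remains accurate, and is chosen so that its gradient stays close
% to the full gradient over the training set V:
\Bigl\| \sum_{i \in V} \nabla \ell_i(w) - \sum_{j \in S_t} \gamma_j \nabla \ell_j(w) \Bigr\| \le \epsilon
```

Under this reading, a new coreset is extracted whenever training leaves the region where the current quadratic model is a good fit.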

Submitted to arXiv on 02 Jun. 2023


Results of the summarizing process for the arXiv paper: 2306.01244v1

In their paper titled "Towards Sustainable Learning: Coresets for Data-efficient Deep Learning," authors Yu Yang, Hao Kang, and Baharan Mirzasoleiman propose CREST, a scalable framework for improving the efficiency and sustainability of learning deep models. They describe CREST as the first scalable framework with rigorous theoretical guarantees for identifying the most valuable examples for training non-convex models, particularly deep networks.

To guarantee convergence to a stationary point of a non-convex function, CREST models the non-convex loss as a series of quadratic functions and extracts a coreset for each quadratic sub-region. Training can then focus on the selected examples rather than the full dataset.

To speed up convergence of stochastic gradient methods such as mini-batch SGD, CREST iteratively extracts multiple mini-batch coresets from larger random subsets of the training data. This keeps the gradients nearly unbiased with small variance.

CREST further improves scalability and efficiency by identifying examples that the model has already learned and excluding them from the coreset selection pipeline, so that no selection effort is spent on them.

The authors ran extensive experiments on several deep networks trained on vision and NLP datasets, including CIFAR-10, CIFAR-100, TinyImageNet, and SNLI. The results confirm that CREST speeds up training of deep networks on large datasets by 1.7x to 2.5x with minimal loss in performance.

By analyzing the learning difficulty of the subsets selected by CREST, the authors show that deep models benefit most from learning from subsets of increasing difficulty. Overall, the paper presents a framework that combines theoretical guarantees, per-region coreset extraction, and iterative mini-batch coreset selection to accelerate training, and validates it experimentally on several datasets.
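To make the reported 1.7x to 2.5x speedup concrete, the arithmetic can be sketched as follows. The helper name `training_time_with_speedup` and the 10-hour baseline are hypothetical and serve only as an illustration; they are not figures from the paper.

```python
def training_time_with_speedup(baseline_hours, speedup):
    """Return the training time after applying a speedup factor.

    A speedup of s means the same training run takes baseline / s.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


if __name__ == "__main__":
    baseline = 10.0  # hypothetical full-data training time in hours
    for factor in (1.7, 2.5):
        hours = training_time_with_speedup(baseline, factor)
        print(f"{factor}x speedup: {hours:.2f} hours")
```

For a hypothetical 10-hour baseline, the reported range corresponds to roughly 4 to 6 hours of training.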
Created on 30 Nov. 2023
