Switch EMA: A Free Lunch for Better Flatness and Sharpness

AI-generated keywords: SEMA, Exponential Moving Average, DNNs, deep learning optimization, GitHub

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Switch EMA (SEMA) builds on Exponential Moving Average (EMA), a widely used weight averaging (WA) regularization for deep neural networks (DNNs)
  • SEMA is a single-line modification: after each epoch, the EMA parameters are switched to the original model; it is presented as adding no extra computational cost
  • According to the abstract, SEMA helps DNNs reach generalization optima that trade off better between flatness and sharpness
  • The authors evaluate SEMA on image classification, self-supervised learning, object detection and segmentation, image generation, video prediction, attribute regression, and language modeling
  • Siyuan Li et al. describe SEMA as a "free lunch" for DNN training: they report improved performance and faster convergence across popular optimizers and networks
  • Source code and models are available on GitHub (https://github.com/Westlake-AI/SEMA)

Authors: Siyuan Li, Zicheng Liu, Juanxi Tian, Ge Wang, Zedong Wang, Weiyang Jin, Di Wu, Cheng Tan, Tao Lin, Yang Liu, Baigui Sun, Stan Z. Li

Preprint V2. Source code and models at https://github.com/Westlake-AI/SEMA

Abstract: Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization to learn flat optima for better generalizations without extra cost in deep neural network (DNN) optimization. Despite achieving better flatness, existing WA methods might fall into worse final performances or require extra test-time computations. This work unveils the full potential of EMA with a single line of modification, i.e., switching the EMA parameters to the original model after each epoch, dubbed as Switch EMA (SEMA). From both theoretical and empirical aspects, we demonstrate that SEMA can help DNNs to reach generalization optima that better trade-off between flatness and sharpness. To verify the effectiveness of SEMA, we conduct comparison experiments with discriminative, generative, and regression tasks on vision and language datasets, including image classification, self-supervised learning, object detection and segmentation, image generation, video prediction, attribute regression, and language modeling. Comprehensive results with popular optimizers and networks show that SEMA is a free lunch for DNN training by improving performances and boosting convergence speeds.
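As a reading aid, the mechanism described in the abstract can be written out as two update rules. The notation here is an assumption and is not taken from the paper: \theta_t denotes the weights of the model being trained at step t, \tilde{\theta}_t the EMA copy of those weights, \alpha a decay coefficient between 0 and 1, and T_e the last optimizer step of epoch e. Updating the EMA after every optimizer step is also an assumption (it is the common convention).

```latex
% Conventional EMA update, applied after each optimizer step t:
\tilde{\theta}_t = \alpha\,\tilde{\theta}_{t-1} + (1-\alpha)\,\theta_t

% Switch step, applied once at the end of each epoch e:
\theta_{T_e} \leftarrow \tilde{\theta}_{T_e}
```

Without the second rule this reduces to plain EMA, where the averaged weights are only read out for evaluation and never fed back into training.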

Submitted to arXiv on 14 Feb. 2024

Results of the summarizing process for the arXiv paper: 2402.09240v2

Switch EMA (SEMA) is a modification of Exponential Moving Average (EMA), a weight averaging regularization used when training deep neural networks (DNNs). The change is a single line: after each epoch, the EMA parameters are switched to the original model. It is presented as adding no computational cost, and according to the authors it lets DNNs reach generalization optima with a better trade-off between flatness and sharpness.

The authors validate SEMA with comparison experiments on discriminative, generative, and regression tasks across vision and language datasets: image classification, self-supervised learning, object detection and segmentation, image generation, video prediction, attribute regression, and language modeling. Siyuan Li et al. report that SEMA improves final performance and speeds up convergence across popular optimizers and network architectures, and on that basis describe it as a "free lunch" for DNN training. Source code and models are available on GitHub (https://github.com/Westlake-AI/SEMA).
Created on 25 Feb. 2025
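
To make the "switch after each epoch" idea concrete, here is a minimal pure-Python sketch on a toy one-parameter regression problem. It is not the authors' implementation (that lives in the GitHub repository above). The function names, the toy data, the learning rate, and the decay value are all hypothetical choices made for illustration.

```python
import random


def loss_grad(w, x, y):
    """Gradient of the squared error 0.5 * (w * x - y) ** 2 with respect to w."""
    return (w * x - y) * x


def train(switch, epochs=5, steps_per_epoch=50, lr=0.5, decay=0.9, seed=0):
    """Fit y = 3x with SGD while tracking an EMA of the weight.

    If `switch` is True, the EMA value is copied back into the trained
    weight at the end of every epoch (the SEMA idea as described in the
    abstract). If False, this is plain EMA. All hyperparameters are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    w = 0.0        # weight of the model being trained
    w_ema = w      # exponential moving average of w
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            x = rng.uniform(-1.0, 1.0)
            y = 3.0 * x + rng.gauss(0.0, 0.1)   # noisy target
            w -= lr * loss_grad(w, x, y)        # SGD step
            w_ema = decay * w_ema + (1.0 - decay) * w  # EMA update
        if switch:
            w = w_ema  # the single extra line: switch EMA into the model
    return w, w_ema


if __name__ == "__main__":
    print("plain EMA :", train(switch=False))
    print("switch EMA:", train(switch=True))
```

With `switch=True` the trained weight and its EMA coincide at every epoch boundary, so the next epoch starts from the averaged point. This toy problem is only meant to show where the switch sits in a training loop; it says nothing about the performance gains reported in the paper.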
