Feature Purification: How Adversarial Training Performs Robust Deep Learning

AI-generated keywords: Feature Purification

AI-generated Key Points

The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

  • Zeyuan Allen-Zhu and Yuanzhi Li explore the effectiveness of Adversarial Training in defending deep learning models against adversarial perturbations
  • Introduction of the Feature Purification principle: the accumulation of certain small dense mixtures in the hidden weights during training is one cause of adversarial examples, and one goal of adversarial training is to remove these mixtures and purify the hidden weights
  • Proof that training a neural network over the original data is non-robust to small adversarial perturbations of some radius
  • Proof that adversarial training, even with an empirical perturbation algorithm such as FGM, is robust against ANY perturbation of the same radius
  • A complexity lower bound showing that low-complexity models (linear classifiers, low-degree polynomials, or even the neural tangent kernel for this network) cannot defend against perturbations of this radius, no matter which training algorithm is used
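To make "small dense mixtures in the hidden weights" concrete, here is a hypothetical toy illustration in Python. It is not the construction used in the paper. A hidden weight is modeled as a sparse unit feature e_1 plus a dense component whose l2 norm is only 0.1, spread evenly over all d coordinates. The largest change an l_inf perturbation of radius eps can cause in <w, x> is eps * ||w||_1. That means the dense part's influence grows like sqrt(d) even though its l2 norm stays small. The names `toy_hidden_weight` and `worst_case_shift`, the l_inf threat model, and the numbers 0.1 and 0.05 are assumptions chosen for illustration.

```python
import math


def worst_case_shift(v, eps):
    """Largest change in <v, x> over perturbations with ||delta||_inf <= eps.

    The maximum of <v, delta> is eps * ||v||_1, reached at delta = eps * sign(v).
    """
    return eps * sum(abs(c) for c in v)


def toy_hidden_weight(d, mixture_norm):
    """Hypothetical hidden weight split into a sparse feature and a dense mixture.

    feature: the unit vector e_1 (l2 norm 1).
    mixture: every coordinate equal, scaled so its l2 norm is `mixture_norm`.
    """
    feature = [1.0] + [0.0] * (d - 1)
    mixture = [mixture_norm / math.sqrt(d)] * d
    return feature, mixture


for d in (100, 10_000):
    feature, mixture = toy_hidden_weight(d, mixture_norm=0.1)
    # Compare how far an eps-bounded l_inf attack can move each part's response.
    print(d, worst_case_shift(feature, eps=0.05), worst_case_shift(mixture, eps=0.05))
```

With these numbers, the dense mixture already matches the feature's worst-case shift at d = 100 and is ten times larger at d = 10,000. This is one way to see why removing such mixtures could matter for robustness.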

Authors: Zeyuan Allen-Zhu, Yuanzhi Li

V2 and V3 polish the writing and experiments; V4 adds experiments showing that adversarial training can be done through low-rank updates.

Abstract: Despite the empirical success of using Adversarial Training to defend deep learning models against adversarial perturbations, so far, it still remains rather unclear what the principles are behind the existence of adversarial perturbations, and what adversarial training does to the neural network to remove them. In this paper, we present a principle that we call Feature Purification, where we show one of the causes of the existence of adversarial examples is the accumulation of certain small dense mixtures in the hidden weights during the training process of a neural network; and more importantly, one of the goals of adversarial training is to remove such mixtures to purify hidden weights. We present both experiments on the CIFAR-10 dataset to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle. Technically, we give, to the best of our knowledge, the first result proving that the following two can hold simultaneously for training a neural network with ReLU activation. (1) Training over the original data is indeed non-robust to small adversarial perturbations of some radius. (2) Adversarial training, even with an empirical perturbation algorithm such as FGM, can in fact be provably robust against ANY perturbations of the same radius. Finally, we also prove a complexity lower bound, showing that low complexity models such as linear classifiers, low-degree polynomials, or even the neural tangent kernel for this network, CANNOT defend against perturbations of this same radius, no matter what algorithms are used to train them.
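The setting in the abstract can be sketched in plain Python: a two-layer ReLU network trained from random initialization, with adversarial examples produced by FGM. This is a minimal sketch under stated assumptions, not the authors' code. The following choices are all assumptions:

- The network is f(x) = sum_i a_i * ReLU(<w_i, x>), with fixed +1/-1 output weights a_i and no biases.
- The loss is the logistic loss for labels y in {-1, +1}.
- FGM is taken in its l2-normalized form, delta = eps * grad_x L / ||grad_x L||_2.
- Each step trains only the hidden weights W by SGD on the perturbed inputs.

All function names and the toy dataset are hypothetical.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def forward(W, a, x):
    """f(x) = sum_i a_i * relu(<w_i, x>); also returns the pre-activations."""
    pre = [sum(wj * xj for wj, xj in zip(w, x)) for w in W]
    return sum(ai * relu(z) for ai, z in zip(a, pre)), pre


def logistic_loss(f, y):
    """log(1 + exp(-y * f)) for y in {-1, +1}, computed without overflow."""
    m = -y * f
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))


def dloss_df(f, y):
    """d/df of the logistic loss: -y * sigmoid(-y * f)."""
    m = -y * f
    s = 1.0 / (1.0 + math.exp(-m)) if m >= 0 else math.exp(m) / (1.0 + math.exp(m))
    return -y * s


def input_grad(W, a, x, y):
    """Gradient of the loss w.r.t. the input x (ReLU gates evaluated at x)."""
    f, pre = forward(W, a, x)
    g = dloss_df(f, y)
    grad = [0.0] * len(x)
    for ai, z, w in zip(a, pre, W):
        if z > 0.0:  # only active neurons pass gradient
            for j, wj in enumerate(w):
                grad[j] += g * ai * wj
    return grad


def fgm(W, a, x, y, eps):
    """Assumed l2 form of FGM: delta = eps * g / ||g||_2 with g = grad_x loss."""
    g = input_grad(W, a, x, y)
    norm = math.sqrt(sum(v * v for v in g))
    if norm == 0.0:
        return [0.0] * len(x)  # no gradient signal, so no perturbation
    return [eps * v / norm for v in g]


def adversarial_training_step(W, a, data, eps, lr):
    """One SGD pass over FGM-perturbed inputs; updates W in place, a stays fixed."""
    for x, y in data:
        delta = fgm(W, a, x, y, eps)  # inner step: attack the current weights
        x_adv = [xj + dj for xj, dj in zip(x, delta)]
        f, pre = forward(W, a, x_adv)
        g = dloss_df(f, y)
        for i, z in enumerate(pre):  # outer step: gradient descent on W at x_adv
            if z > 0.0:
                for j, xj in enumerate(x_adv):
                    W[i][j] -= lr * g * a[i] * xj


def init_network(d, m, seed=0):
    """Gaussian hidden weights with std 1/sqrt(d); output weights alternate +1/-1."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(m)]
    a = [1.0 if i % 2 == 0 else -1.0 for i in range(m)]
    return W, a


# Usage on a hypothetical two-point dataset.
W, a = init_network(d=4, m=8, seed=0)
data = [([1.0, 0.0, 0.0, 0.0], 1), ([0.0, 1.0, 0.0, 0.0], -1)]
for _ in range(50):
    adversarial_training_step(W, a, data, eps=0.1, lr=0.1)
```

The structure follows the usual inner/outer pattern of adversarial training. The inner step (FGM) picks a perturbation from the current weights, and the outer step updates the weights on the perturbed input. Replacing `fgm` with a stronger attack, or the l2 normalization with a sign step (FGSM, for l_inf), changes only the inner step.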

Submitted to arXiv on 20 May 2020


Results of the summarizing process for the arXiv paper: 2005.10190v4


In their paper "Feature Purification: How Adversarial Training Performs Robust Deep Learning," Zeyuan Allen-Zhu and Yuanzhi Li study why Adversarial Training is effective in defending deep learning models against adversarial perturbations. They introduce the principle of Feature Purification, which identifies one of the causes of adversarial examples: the accumulation of certain small dense mixtures in the hidden weights during neural network training. In this view, one goal of adversarial training is to remove these mixtures and purify the hidden weights. Through experiments on the CIFAR-10 dataset and a theoretical analysis, they show that, for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent follows this principle. Technically, the work proves that training over the original data is non-robust to small adversarial perturbations of some radius. In contrast, adversarial training, even with an empirical perturbation algorithm such as FGM, is provably robust against ANY perturbation of the same radius. The authors also prove a complexity lower bound. Low-complexity models such as linear classifiers, low-degree polynomials, or even the neural tangent kernel for this network cannot defend against perturbations of this radius, no matter which training algorithm is used. Together, these results clarify where adversarial perturbations come from and how adversarial training removes them through feature purification.
Created on 17 Jun. 2026

