Understanding deep learning requires rethinking generalization

AI-generated keywords: Deep learning, Generalization, Neural networks, Model complexity, Regularization techniques

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Successful deep artificial neural networks exhibit a small difference between training and test performance despite their massive size.
  • Conventional explanations attributing this to model family properties or regularization techniques fall short.
  • State-of-the-art convolutional networks trained with stochastic gradient methods easily fit a random labeling of the training data; this is qualitatively unaffected by explicit regularization and occurs even when the true images are replaced by unstructured random noise.
  • Simple depth-two neural networks already have perfect finite-sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice.
  • The study challenges existing notions about model complexity and regularization techniques in deep learning, highlighting the need to rethink generalization.

Authors: Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals

Abstract: Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training. Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice. We interpret our experimental findings by comparison with traditional models.
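The randomization test described in the abstract can be sketched on toy data: draw inputs that are pure noise, assign labels at random, fit a model with enough capacity to memorize the training set, and compare training and test accuracy. The sketch below is only an illustration under stated assumptions. A 1-nearest-neighbor classifier stands in for the convolutional network, and the function names, sample sizes, and dimensions are hypothetical choices, not the paper's experimental setup.

```python
import random


def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def accuracy(train_x, train_y, xs, ys):
    """Fraction of points in (xs, ys) that the memorizing classifier labels correctly."""
    hits = sum(nearest_neighbor_predict(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def randomization_test(n_train=200, n_test=200, dim=16, n_classes=10, seed=0):
    """Fit random labels on random-noise inputs; return (train accuracy, test accuracy)."""
    rng = random.Random(seed)

    def noise_inputs(n):
        # Unstructured Gaussian noise in place of real images (illustrative assumption).
        return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

    def random_labels(n):
        # Labels drawn uniformly at random, independent of the inputs.
        return [rng.randrange(n_classes) for _ in range(n)]

    train_x, train_y = noise_inputs(n_train), random_labels(n_train)
    test_x, test_y = noise_inputs(n_test), random_labels(n_test)
    train_acc = accuracy(train_x, train_y, train_x, train_y)
    test_acc = accuracy(train_x, train_y, test_x, test_y)
    return train_acc, test_acc


train_acc, test_acc = randomization_test()
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
print(f"generalization gap: {train_acc - test_acc:.2f}")
```

Because each training point is its own nearest neighbor, the training accuracy is 1.0 regardless of the labels, while the test accuracy can only hover around chance. The gap between the two is the quantity the abstract calls the difference between training and test performance.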

Submitted to arXiv on 10 Nov. 2016


Results of the summarizing process for the arXiv paper: 1611.03530v1

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

In "Understanding deep learning requires rethinking generalization," Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals examine why successful deep artificial neural networks can show a remarkably small difference between training and test performance despite their massive size. Conventional wisdom attributes this small generalization error either to properties of the model family or to the regularization techniques used during training. Through extensive systematic experiments, the authors show that these explanations fail to account for why large neural networks generalize well in practice.

Specifically, state-of-the-art convolutional networks for image classification, trained with stochastic gradient methods, easily fit a random labeling of the training data. This behavior is qualitatively unaffected by explicit regularization, and it occurs even when the true images are replaced with completely unstructured random noise. A network that can memorize arbitrary labels has enough capacity to fit real data without learning anything that transfers, so capacity alone cannot explain the small gap observed on real data.

The authors corroborate these experiments with a theoretical construction showing that simple depth-two neural networks already have perfect finite-sample expressivity as soon as the number of parameters exceeds the number of data points, as it usually does in practice. They interpret their experimental findings by comparison with traditional models and conclude that existing notions of model complexity and regularization need to be rethought to explain generalization in deep learning.
Created on 10 Sep. 2025
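The finite-sample expressivity claim can be made concrete with a small sketch. The code below builds a depth-two ReLU network that reproduces arbitrary targets on n points in d dimensions using d + 2n parameters. It is one standard way to realize such a construction, shown under stated assumptions: the metadata available here does not confirm that it matches the paper's proof in detail, and the names `fit_depth_two_relu` and `predict` are hypothetical.

```python
import random


def fit_depth_two_relu(xs, ys, seed=0):
    """Return (a, b, w) with sum_j w[j] * relu(a.x - b[j]) == y on every sample.

    Assumes the projections a.x are distinct across samples, which holds with
    probability one for a random Gaussian direction a and distinct inputs.
    """
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]  # shared first-layer direction
    z = [sum(ai * xi for ai, xi in zip(a, x)) for x in xs]
    order = sorted(range(n), key=lambda i: z[i])
    zs = [z[i] for i in order]
    if any(zs[k] == zs[k + 1] for k in range(n - 1)):
        raise ValueError("projections must be distinct; try another seed")
    # Interleave biases with the sorted projections: b[0] < zs[0] < b[1] < zs[1] < ...
    b = [zs[0] - 1.0] + [(zs[k - 1] + zs[k]) / 2.0 for k in range(1, n)]
    # relu(zs[k] - b[j]) is zero for j > k, so the system is lower triangular
    # with a positive diagonal; solve it by forward substitution.
    w = []
    for k in range(n):
        partial = sum(w[j] * (zs[k] - b[j]) for j in range(k))
        w.append((ys[order[k]] - partial) / (zs[k] - b[k]))
    return a, b, w


def predict(params, x):
    """Evaluate the depth-two ReLU network at x."""
    a, b, w = params
    z = sum(ai * xi for ai, xi in zip(a, x))
    return sum(wj * max(z - bj, 0.0) for wj, bj in zip(w, b))


# Usage: fit random +/-1 targets on random inputs (n = 12 points, d = 5 dimensions).
data_rng = random.Random(1)
xs = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(12)]
ys = [data_rng.choice([-1.0, 1.0]) for _ in range(12)]
params = fit_depth_two_relu(xs, ys)
max_error = max(abs(predict(params, x) - y) for x, y in zip(xs, ys))
n_params = sum(len(p) for p in params)
print(f"max training error: {max_error:.2e}, parameters: {n_params}")
```

The fit is exact up to floating-point error for any choice of targets, including random ones, which is why expressivity on the training sample says nothing by itself about test performance.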

