Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context

AI-generated keywords: Transformers, Supervised Learning, Pre-training, Generalization, Noisy Data

AI-generated Key Points

  • Researchers investigate transformers as supervised learning algorithms that make predictions in-context
  • Prior work showed that linear transformers pre-trained on random linear regression tasks make predictions with an algorithm similar to ordinary least squares
  • This study focuses on linear transformers trained on random linear classification tasks, analyzed through the implicit regularization of gradient descent
  • The analysis characterizes how many pre-training tasks and in-context examples are needed for the trained transformer to generalize well at test time
  • In some settings the trained transformer exhibits "benign overfitting in-context": it memorizes in-context examples with flipped labels yet still generalizes near-optimally on clean test examples
  • The study sheds light on the generalization abilities of trained transformers in classification tasks and on their resilience to label noise

Authors: Spencer Frei, Gal Vardi

34 pages
License: CC BY 4.0

Abstract: Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the forward pass of the transformer can produce predictions for that unlabeled test example. A line of recent work has shown that when linear transformers are pre-trained on random instances for linear regression tasks, these trained transformers make predictions using an algorithm similar to that of ordinary least squares. In this work, we investigate the behavior of linear transformers trained on random linear classification tasks. Via an analysis of the implicit regularization of gradient descent, we characterize how many pre-training tasks and in-context examples are needed for the trained transformer to generalize well at test-time. We further show that in some settings, these trained transformers can exhibit "benign overfitting in-context": when in-context examples are corrupted by label flipping noise, the transformer memorizes all of its in-context examples (including those with noisy labels) yet still generalizes near-optimally for clean test examples.
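To make the in-context setup concrete, the Python sketch below encodes labeled examples and an unlabeled query as the columns of one input matrix and runs a single linear self-attention layer on it. It assumes a simplified parameterization that is common in this line of work but is not quoted from the paper: the value map keeps only the label row, and the key-query map acts on the features through a d x d matrix W (with zero label entries). Under that assumption the output at the query's label slot reduces to (1/N) * sum_k y_k * x_k^T W x_query, which the second function computes directly as a check. All function names are hypothetical.

```python
def linear_self_attention_predict(examples, x_query, W):
    """Output at the query's label slot of one linear self-attention layer.

    examples: list of (x_k, y_k) pairs, x_k a list of d floats, y_k in {-1, +1}.
    W: d x d list of lists used as the key-query block (assumed parameterization).
    """
    d, n = len(x_query), len(examples)
    # Embed each labeled example as the column z_k = [x_k; y_k] and the query as [x_query; 0].
    columns = [list(x) + [float(y)] for x, y in examples] + [list(x_query) + [0.0]]
    # Key-query map applied to the query column: [W x_query; 0].
    kq_query = [sum(W[r][c] * x_query[c] for c in range(d)) for r in range(d)] + [0.0]
    # Linear attention scores z_j^T (W^{KQ} z_query); no softmax is applied.
    scores = [sum(a * b for a, b in zip(col, kq_query)) for col in columns]
    # The value map keeps only the label row, so the update to the query's label
    # slot is (1/n) * sum_j y_j * score_j; the query's own label entry is 0 and drops out.
    update = sum(col[d] * s for col, s in zip(columns, scores)) / n
    # Residual connection: the query's label slot starts at 0.
    return 0.0 + update


def closed_form_predict(examples, x_query, W):
    """Same quantity computed directly as v^T W x_query with v = (1/n) * sum_k y_k x_k."""
    d, n = len(x_query), len(examples)
    v = [sum(y * x[j] for x, y in examples) / n for j in range(d)]
    return sum(v[r] * W[r][c] * x_query[c] for r in range(d) for c in range(d))


def predict_label(examples, x_query, W):
    # Classification: take the sign of the linear attention output.
    return 1 if linear_self_attention_predict(examples, x_query, W) >= 0 else -1
```

For example, with the in-context examples ([1.0, 0.0], +1), ([-1.0, 0.5], -1), ([0.8, -0.2], +1), W equal to the 2 x 2 identity and the query [0.9, 0.1], both functions return about 0.8167 (that is, 2.45 / 3), so the predicted label is +1.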

Submitted to arXiv on 02 Oct. 2024

Summary of arXiv paper 2410.01774v1

In this study, the researchers investigate transformers as supervised learning algorithms. When a set of labeled training examples and an unlabeled test example are encoded into the transformer's input sequence, its forward pass produces a prediction for the test example. Previous research has shown that linear transformers pre-trained on random instances of linear regression tasks make predictions using an algorithm similar to ordinary least squares. This study instead considers linear transformers trained on random linear classification tasks and analyzes the implicit regularization of gradient descent. The aim is to determine how many pre-training tasks and in-context examples are needed for the trained transformer to generalize well at test time. The authors further show that in some settings the trained transformer exhibits "benign overfitting in-context": when in-context examples are corrupted by label-flipping noise, the transformer memorizes all of them (including those with noisy labels) yet still generalizes near-optimally on clean test examples. The study sheds light on the behavior and capabilities of trained transformers in classification tasks, providing insight into their generalization abilities and their resilience to noisy labels.
Created on 17 Feb. 2025
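The benign-overfitting behavior described above can be illustrated with a toy simulation. This is a minimal sketch under several assumptions that are not taken from the paper: data follow a class-conditional Gaussian model x = y*mu + z with z ~ N(0, I_d); a fixed number of in-context labels are flipped; and the trained transformer is replaced by a hypothetical stand-in predictor sign(x^T (1/N) sum_k y_k x_k), i.e. a linear-attention score with the key-query matrix taken as the identity. The dimension d, context length N, flip count and ||mu||^2 are arbitrary values chosen so that the effect is visible with d much larger than N; all function names are illustrative.

```python
import math
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def sample_task_mean(rng, d, mean_norm_sq):
    # Random direction mu scaled so that ||mu||^2 == mean_norm_sq (assumed task model).
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    scale = math.sqrt(mean_norm_sq / dot(g, g))
    return [scale * v for v in g]


def sample_point(rng, mu, y):
    # Class-conditional Gaussian: x = y * mu + z with z ~ N(0, I_d) (assumption).
    return [y * m + rng.gauss(0.0, 1.0) for m in mu]


def make_in_context_classifier(xs, ys):
    # Hypothetical stand-in for the trained transformer (identity key-query matrix):
    # the score of a query x is x^T v with v = (1/N) * sum_k y_k x_k.
    n, d = len(xs), len(xs[0])
    v = [sum(ys[k] * xs[k][j] for k in range(n)) / n for j in range(d)]
    return lambda x: 1 if dot(x, v) >= 0 else -1


def run_demo(seed=0, d=10_000, n=20, n_flip=2, mean_norm_sq=200.0, n_test=50):
    rng = random.Random(seed)
    mu = sample_task_mean(rng, d, mean_norm_sq)
    clean = [rng.choice((-1, 1)) for _ in range(n)]
    xs = [sample_point(rng, mu, y) for y in clean]
    # Label-flipping noise: the classifier only ever sees the corrupted labels.
    observed = list(clean)
    flipped = rng.sample(range(n), n_flip)
    for i in flipped:
        observed[i] = -observed[i]
    classify = make_in_context_classifier(xs, observed)
    # Memorization: fraction of in-context examples whose observed label is reproduced.
    fit_acc = sum(classify(x) == y for x, y in zip(xs, observed)) / n
    noisy_fit = all(classify(xs[i]) == observed[i] for i in flipped)
    # Generalization: accuracy on fresh test examples with clean labels.
    test_labels = [rng.choice((-1, 1)) for _ in range(n_test)]
    test_acc = sum(classify(sample_point(rng, mu, y)) == y for y in test_labels) / n_test
    return fit_acc, noisy_fit, test_acc
```

Calling run_demo() returns the in-context fit accuracy, whether every flipped example was fit to its noisy label, and the clean test accuracy. Roughly speaking, when d is much larger than N, each in-context example's own squared norm dominates its score, so the predictor reproduces every observed label, flipped ones included, while the averaged term still points along mu and classifies fresh clean points correctly.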
