From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

AI-generated keywords: Large Language Models, Regression Tasks, Pre-trained Models, Performance, Limitations

AI-generated Key Points

  • Investigated capabilities of large language models (LLMs) in regression tasks without additional training
  • Pre-trained models like GPT-4 and Claude 3 can rival or outperform traditional supervised methods in regression tasks
  • Claude 3 outperformed many supervised methods (AdaBoost, SVM, Random Forest, KNN, Gradient Boosting) on the challenging Friedman #2 dataset
  • Performance of LLMs improves with the number of in-context exemplars; measured with regret, a notion borrowed from online learning, the models empirically achieve sub-linear regret
  • Addressed limitations including reliance on proprietary models, potential data contamination issues, and released results from open-weight models for comparison
  • Empirical evidence supports effectiveness of LLMs in regression tasks but lacks theoretical explanations
  • Study provides valuable insights into regression capabilities of LLMs without needing additional training or updates

Authors: Robert Vacareanu, Vlad-Andrei Negru, Vasile Suciu, Mihai Surdeanu

50 pages, 48 figures, preprint
License: CC BY 4.0

Abstract: We analyze how well pre-trained large language models (e.g., Llama2, GPT-4, Claude 3, etc.) can do linear and non-linear regression when given in-context examples, without any additional training or gradient updates. Our findings reveal that several large language models (e.g., GPT-4, Claude 3) are able to perform regression tasks with a performance rivaling (or even outperforming) that of traditional supervised methods such as Random Forest, Bagging, or Gradient Boosting. For example, on the challenging Friedman #2 regression dataset, Claude 3 outperforms many supervised methods such as AdaBoost, SVM, Random Forest, KNN, or Gradient Boosting. We then investigate how well the performance of large language models scales with the number of in-context exemplars. We borrow the notion of regret from online learning and empirically show that LLMs are capable of obtaining a sub-linear regret.
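To make "regression from in-context examples" concrete, here is a minimal sketch of how numeric (features, target) pairs might be serialized into a text prompt that ends with an unanswered query. The Friedman #2 generator follows the standard benchmark definition; the instruction wording, the `Feature i:` / `Output:` layout, the two-decimal rounding, and the helper names `friedman2_sample`, `format_features`, and `build_prompt` are illustrative assumptions, not the paper's confirmed prompt or code.

```python
import math
import random


def friedman2_sample(rng):
    """Draw one noise-free Friedman #2 point (standard benchmark definition)."""
    x1 = rng.uniform(0.0, 100.0)
    x2 = rng.uniform(40.0 * math.pi, 560.0 * math.pi)
    x3 = rng.uniform(0.0, 1.0)
    x4 = rng.uniform(1.0, 11.0)
    y = math.sqrt(x1 ** 2 + (x2 * x3 - 1.0 / (x2 * x4)) ** 2)
    return [x1, x2, x3, x4], y


def format_features(features):
    """Render one feature vector as 'Feature i: value' lines (assumed layout)."""
    return "\n".join(f"Feature {i}: {v:.2f}" for i, v in enumerate(features))


def build_prompt(examples, query):
    """Serialize labeled examples plus one unanswered query as plain text."""
    # Hypothetical instruction text; the exact wording is an assumption.
    header = 'Provide your best estimate for "Output" and nothing else.\n\n'
    blocks = [f"{format_features(x)}\nOutput: {y:.2f}" for x, y in examples]
    # The query block stops at "Output:" so the model completes the number.
    blocks.append(f"{format_features(query)}\nOutput:")
    return header + "\n\n".join(blocks)


rng = random.Random(0)
examples = [friedman2_sample(rng) for _ in range(3)]
query, _ = friedman2_sample(rng)
prompt = build_prompt(examples, query)
print(prompt)
```

The resulting string would be sent to the model as-is, and the number it returns parsed as the prediction; no weights are updated at any point, which is the setting the abstract describes.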

Submitted to arXiv on 11 Apr. 2024

Results of the summarizing process for the arXiv paper: 2404.07544v1

In this study, we investigated the capabilities of large language models (LLMs) in performing regression tasks without any additional training. Our analysis covered pre-trained models such as GPT-4 and Claude 3 and showed that they can rival or even outperform traditional supervised methods like Random Forest and Gradient Boosting. Notably, on the challenging Friedman #2 dataset, Claude 3 outperformed many supervised methods. We also explored how the performance of LLMs scales with the number of in-context exemplars and showed empirically that they can achieve sub-linear regret, a notion borrowed from online learning.

We addressed several limitations in our study, including the reliance on proprietary models whose performance may vary over time and potential data contamination due to the opaque training datasets of LLMs. To mitigate these issues, we included results from leading open-weight models alongside proprietary ones and conducted experiments with multiple random seeds on new datasets. Additionally, we released intermediate results and considered a publicly available model like Falcon 40B for comparison.

While our empirical evidence supports the effectiveness of LLMs in regression tasks, we acknowledge the lack of theoretical explanations to underpin these observations. Overall, our study shows that large language models presented with in-context examples can act as proficient regressors without the need for additional training or updates.
Created on 19 Dec. 2024
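
The sub-linear regret claim in the summary can be illustrated with a short sketch. In one common online-learning formulation, cumulative regret after T rounds is the learner's total loss minus the total loss of a comparator (for example, the best fixed predictor in hindsight), and regret is sub-linear when regret divided by T shrinks toward zero. The function name `cumulative_regret` and the loss sequences below are hypothetical; the numbers are synthetic and are not results from the paper.

```python
import math


def cumulative_regret(model_losses, comparator_losses):
    """Running sum of (model loss - comparator loss) after each round."""
    total, curve = 0.0, []
    for model_loss, comparator_loss in zip(model_losses, comparator_losses):
        total += model_loss - comparator_loss
        curve.append(total)
    return curve


# Synthetic losses, NOT results from the paper: the model's per-round error
# shrinks like 1/sqrt(t), while the comparator's error is a constant 0.
rounds = 1000
model_losses = [1.0 / math.sqrt(t) for t in range(1, rounds + 1)]
comparator_losses = [0.0] * rounds

curve = cumulative_regret(model_losses, comparator_losses)
# Average regret per round; a decreasing trend indicates sub-linear regret.
average_regret = [r / t for t, r in enumerate(curve, start=1)]
print(average_regret[9], average_regret[99], average_regret[999])
```

In this toy setup the cumulative regret keeps growing, but only at roughly the rate of the square root of T, so the per-round average falls as more exemplars are seen. With an LLM, each round's loss would come from comparing its prediction for the next point against the true target.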

