Chain of Thought Prompting Elicits Reasoning in Large Language Models

AI-generated keywords: Language Model, Reasoning Tasks, Prompting, Scaling Curve, NLEs

AI-generated Key Points

  • Language models struggle with certain reasoning tasks such as math word problems, symbolic manipulation, and commonsense reasoning.
  • Inducing a chain of thought via prompting can enable sufficiently large language models to perform better on reasoning tasks that otherwise have flat scaling curves.
  • Chain of thought prompting leads to dramatically increasing scaling curves for sufficiently large language models in six reasoning tasks where standard prompting has a flat scaling curve.
  • Standard prompting only provides a lower bound on the capabilities of large language models in principle.
  • This study leverages prompting by guiding the model to produce self-assisting outputs, unlike most techniques that focus on optimizing inputs/prompts for given tasks or improving interpretability using natural language explanations (NLEs).
  • Future work could explore how to induce reasoning at smaller model scales and other prompting methods that might expand the range of tasks that language models can solve.
  • The dependence on chain of thought prompting and on sufficiently large models is both a key component of the approach and a major limitation.
  • Although the cost of manually augmenting exemplars with chains of thought is minimal in the few-shot setting, such annotation costs could be prohibitive for fine-tuning.
  • Chain of thought prompting improves the scaling curve but does not necessarily reach human accuracy on every task.

Authors: Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, Denny Zhou

License: CC BY 4.0

Abstract: Although scaling up language model size has reliably improved performance on a range of NLP tasks, even the largest models currently struggle with certain reasoning tasks such as math word problems, symbolic manipulation, and commonsense reasoning. This paper explores the ability of language models to generate a coherent chain of thought -- a series of short sentences that mimic the reasoning process a person might have when responding to a question. Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks that otherwise have flat scaling curves.
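To make the difference between standard prompting and chain of thought prompting concrete, here is a minimal sketch of how a few-shot prompt might be assembled in each style. The `Exemplar` class, the `build_prompt` function, the "Q:/A:" layout, and the sample word problem are all assumptions made for illustration; they are not taken from the paper, and no model is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Exemplar:
    """One few-shot demonstration (hypothetical structure, for illustration)."""
    question: str
    chain_of_thought: str  # short sentences of intermediate reasoning
    answer: str


def build_prompt(exemplars: List[Exemplar], question: str,
                 use_chain_of_thought: bool = True) -> str:
    """Assemble a few-shot prompt ending with an unanswered question.

    With use_chain_of_thought=True each exemplar shows its reasoning before
    the final answer; with False only the final answer is shown, which
    corresponds to standard prompting.
    """
    blocks = []
    for ex in exemplars:
        if use_chain_of_thought:
            target = f"{ex.chain_of_thought} The answer is {ex.answer}."
        else:
            target = f"The answer is {ex.answer}."
        blocks.append(f"Q: {ex.question}\nA: {target}")
    # The model is expected to continue the text after the final "A:".
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Made-up exemplar; not one of the paper's prompts.
    demo = Exemplar(
        question=("A shelf holds 3 boxes with 4 books each. "
                  "5 more books are added. How many books are there?"),
        chain_of_thought=("3 boxes with 4 books each is 3 * 4 = 12 books. "
                          "Adding 5 more gives 12 + 5 = 17."),
        answer="17",
    )
    new_question = "A farm has 6 pens with 2 goats each. How many goats are there?"
    print(build_prompt([demo], new_question, use_chain_of_thought=True))
    print("-----")
    print(build_prompt([demo], new_question, use_chain_of_thought=False))
```

The only difference between the two prompts is whether each exemplar shows its reasoning before the answer. That is why the annotation cost stays small in the few-shot setting: only a handful of exemplars need a written chain of thought.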

Submitted to arXiv on 28 Jan. 2022

Results of the summarizing process for the arXiv paper: 2201.11903v1

This paper examines the ability of language models to generate a coherent chain of thought, a series of short sentences that mimics the reasoning process a person might follow when responding to a question. While scaling up language model size has improved performance on various NLP tasks, even the largest models currently struggle with certain reasoning tasks such as math word problems, symbolic manipulation, and commonsense reasoning. The experiments in this study show that inducing a chain of thought via prompting can enable sufficiently large language models to perform better on reasoning tasks that otherwise have flat scaling curves.

The emergence of chain of thought reasoning as a consequence of model scale is a prevailing theme in these experiments. For six reasoning tasks where standard prompting has a flat scaling curve, chain of thought prompting leads to dramatically increasing scaling curves for sufficiently large language models. This observation underscores that standard prompting, in principle, provides only a lower bound on the capabilities of large language models, and it raises the question of how much more reasoning ability might improve with further increases in model scale. The paper falls under general prompting approaches. However, unlike most techniques, which focus on optimizing inputs or prompts for given tasks or on improving interpretability using natural language explanations (NLEs), this study uses prompting to guide the model to produce self-assisting outputs.

Future work could explore how to induce reasoning at smaller model scales, as well as other prompting methods that might expand the range of tasks that language models can solve. The dependence on chain of thought prompting and on sufficiently large models is both a key component of the approach and a major limitation. Although the cost of manually augmenting exemplars with chains of thought is minimal in the few-shot setting, such annotation costs could be prohibitive for fine-tuning. Moreover, although chain of thought prompting improves the scaling curve, it does not necessarily reach human accuracy on every task. The paper provides valuable insights into the limitations and possibilities of large language models in performing reasoning tasks.
Created on 03 May. 2023
