Stream of Search (SoS): Learning to Search in Language

AI-generated keywords: Stream of Search; Framework; Language Models; Problem-Solving; Advantage-Induced Policy Alignment (APA); Self-Taught Reasoner (STaR)

AI-generated Key Points

  • The Stream of Search (SoS) framework enhances language models' problem-solving abilities through searching in language
  • SoS unifies various search strategies into a common format, enabling diverse streams of search to be represented and trained effectively
  • Training with SoS leads to superior performance compared to models solely trained on optimal trajectories
  • SoS models can self-improve through optimization for correctness using APA and STaR
  • SoS teaches models to backtrack and explore alternative paths, leading to more adaptable and generalizable search capabilities
  • SoS models simulate state transitions themselves, allowing for increased flexibility and learnability compared to symbolic search
  • Future research directions include exploring hierarchical planning, incorporating reflection and self-evaluation for discovering novel search strategies, and enhancing the SoS framework with formalizable operations such as limits and subgoal setting

Authors: Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah D. Goodman

License: CC BY 4.0

Abstract: Language models are rarely shown fruitful mistakes while training. They then struggle to look beyond the next token, suffering from a snowballing of errors and struggling to predict the consequence of their actions several steps ahead. In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (SoS). We propose a unified language for search that captures an array of different symbolic search strategies. We demonstrate our approach using the simple yet difficult game of Countdown, where the goal is to combine input numbers with arithmetic operations to reach a target number. We pretrain a transformer-based language model from scratch on a dataset of streams of search generated by heuristic solvers. We find that SoS pretraining increases search accuracy by 25% over models trained to predict only the optimal search trajectory. We further finetune this model with two policy improvement methods: Advantage-Induced Policy Alignment (APA) and Self-Taught Reasoner (STaR). The finetuned SoS models solve 36% of previously unsolved problems, including problems that cannot be solved by any of the heuristic solvers. Our results indicate that language models can learn to solve problems via search, self-improve to flexibly use different search strategies, and potentially discover new ones.
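To make the idea of a "flattened string" concrete, here is a minimal sketch of a depth-first Countdown solver that records every step it takes, dead ends included, and joins them into one line of text. The trace vocabulary (`Start`, `Try`, `Dead end, backtrack`, `Goal reached`), the `|` separator, and the function names are assumptions made for illustration; they are not the format or the heuristic solvers used in the paper. The sketch also assumes integer-only arithmetic and that all input numbers must be combined.

```python
from itertools import combinations


def legal_steps(a, b):
    """Return (result, text) pairs for the operations allowed on a and b.

    Assumption: results stay non-negative integers, so subtraction is
    larger-minus-smaller and division must be exact.
    """
    hi, lo = max(a, b), min(a, b)
    steps = [
        (hi + lo, f"{hi}+{lo}={hi + lo}"),
        (hi * lo, f"{hi}*{lo}={hi * lo}"),
        (hi - lo, f"{hi}-{lo}={hi - lo}"),
    ]
    if lo != 0 and hi % lo == 0:
        steps.append((hi // lo, f"{hi}/{lo}={hi // lo}"))
    return steps


def dfs(nums, target, trace):
    """Depth-first search that logs every move, including failed ones."""
    if len(nums) == 1:
        if nums[0] == target:
            trace.append("Goal reached")
            return True
        trace.append("Dead end, backtrack")
        return False
    for i, j in combinations(range(len(nums)), 2):
        rest = [n for k, n in enumerate(nums) if k not in (i, j)]
        for value, text in legal_steps(nums[i], nums[j]):
            new_state = rest + [value]
            trace.append(f"Try {text} -> {new_state}")
            if dfs(new_state, target, trace):
                return True
    return False


def stream_of_search(nums, target):
    """Return (solved, trace) where trace is the whole search as one string."""
    trace = [f"Start {list(nums)} target {target}"]
    solved = dfs(list(nums), target, trace)
    return solved, " | ".join(trace)


if __name__ == "__main__":
    print(stream_of_search([2, 3], 6)[1])
```

A trace like this contains the unsuccessful attempt as well as the successful one, which is the kind of "fruitful mistake" the abstract argues models rarely see. A dataset of only optimal trajectories would keep just the final winning step.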

Submitted to arXiv on 01 Apr. 2024


Results of the summarizing process for the arXiv paper: 2404.03683v1

This paper introduces the Stream of Search (SoS) framework, which enhances language models' problem-solving abilities by having them search in language. By unifying various search strategies into a common format, SoS lets diverse streams of search be represented and used for training. The authors stress the importance of exposing models to the messy process of problem solving: training with SoS leads to better performance than training solely on optimal trajectories. They also emphasize that SoS models can self-improve when optimized for correctness with APA and STaR.

The SoS framework addresses criticisms of language models for planning and problem solving by teaching them to backtrack and explore alternative paths. This lets them consider multiple possible outcomes before committing to a course of action, leading to more adaptable and generalizable search capabilities. Unlike symbolic search, which relies on an explicit environment model, SoS models simulate state transitions themselves, allowing for increased flexibility and learnability.

Empirical results were limited to the game of Countdown, which stands in for complex planning problems, but the authors are optimistic that SoS can extend to more challenging real-world tasks. Future research directions include exploring hierarchical planning, incorporating reflection and self-evaluation for discovering novel search strategies, and enhancing the SoS framework with formalizable operations such as limits and subgoal setting.

Overall, this study demonstrates that language models can achieve symbolic reasoning characteristics such as structured search with backtracking and heuristic state evaluation within a sequence modeling paradigm. By exposing models to productive mistakes and embracing diverse search strategies while iteratively refining them, language models have the potential to tackle complex problems effectively and discover new problem-solving approaches.
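Because an SoS model simulates state transitions itself, its arithmetic can be wrong, so "optimizing for correctness" needs some external check of each proposed solution. The sketch below is one hypothetical way to do that for Countdown, followed by a STaR-style filtering step that keeps only verified samples for further finetuning. The step syntax (`"3*2=6"`), the function names, and the rule that every input number must be used are assumptions for illustration, not the paper's implementation.

```python
import re

# Hypothetical step format: "<int><op><int>=<int>", e.g. "3*2=6".
STEP = re.compile(r"^(\d+)([+\-*/])(\d+)=(\d+)$")


def is_correct(nums, target, steps):
    """Check that steps are valid Countdown moves ending exactly at target."""
    pool = list(nums)
    for step in steps:
        match = STEP.match(step)
        if match is None:
            return False
        a, op, b = int(match[1]), match[2], int(match[3])
        claimed = int(match[4])
        # Both operands must still be available in the current state.
        for operand in (a, b):
            if operand not in pool:
                return False
            pool.remove(operand)
        if op == "+":
            value = a + b
        elif op == "-":
            value = a - b
        elif op == "*":
            value = a * b
        else:
            if b == 0 or a % b != 0:
                return False
            value = a // b
        # Reject steps where the model's simulated transition is wrong.
        if value != claimed:
            return False
        pool.append(value)
    # Assumption: all inputs are consumed and only the target remains.
    return pool == [target]


def keep_correct(samples):
    """STaR-style filter: keep (nums, target, steps) samples that verify."""
    return [s for s in samples if is_correct(*s)]


if __name__ == "__main__":
    samples = [
        ([2, 3, 4], 10, ["3*2=6", "6+4=10"]),  # valid
        ([2, 3, 4], 10, ["3*2=7", "7+4=11"]),  # wrong arithmetic
    ]
    print(keep_correct(samples))
```

In a STaR-style loop, the surviving samples would be added to the finetuning set and the process repeated; a policy-gradient method such as APA would instead use the same check as a reward signal.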
Created on 22 Apr. 2024

