Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking
AI-generated Key Points
⚠ The license of this paper does not allow us to build upon its content, so the key points below are generated from the paper's metadata rather than the full article.
- The paper builds on recent large language models (LLMs) such as OpenAI-o1 and DeepSeek-R1, focusing on test-time scaling: improving performance by extending the reasoning process at inference time.
- Current models remain limited in handling long texts and in the efficiency of reinforcement learning (RL) training.
- The authors propose a straightforward yet powerful approach called Multi-round Thinking to address these limitations.
- Extensive experiments involving models such as QwQ-32B and DeepSeek-R1 consistently showed performance improvements across benchmarks like AIME 2024, MATH-500, GPQA-diamond, and LiveCodeBench.
- On AIME 2024, accuracy rose from Round 1 to Round 2 for both QwQ-32B (80.3% to 82.1%) and DeepSeek-R1 (79.7% to 82.0%).
- The gains hold across several models, tasks, and benchmarks, which suggests the method is broadly applicable.
- By feeding each round's answer back as a prompt for the next round, Multi-round Thinking offers a simple route to stable performance gains in LLMs.
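The AIME 2024 figures above can be turned into absolute gains with a few lines of Python. This is only arithmetic on the accuracies quoted in this summary; `AIME_2024` and `gain_pp` are illustrative names, not anything from the paper.

```python
# Round-1 and Round-2 AIME 2024 accuracies (%) as quoted in the summary.
AIME_2024 = {
    "QwQ-32B": (80.3, 82.1),
    "DeepSeek-R1": (79.7, 82.0),
}


def gain_pp(round1: float, round2: float) -> float:
    """Absolute improvement in percentage points, rounded to one decimal
    to hide binary floating-point noise (82.1 - 80.3 is not exactly 1.8)."""
    return round(round2 - round1, 1)


gains = {model: gain_pp(r1, r2) for model, (r1, r2) in AIME_2024.items()}
print(gains)  # {'QwQ-32B': 1.8, 'DeepSeek-R1': 2.3}
```

So a second round adds roughly two percentage points for both models on this benchmark.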
Authors: Xiaoyu Tian, Sitong Zhao, Haotian Wang, Shuaiting Chen, Yunjie Ji, Yiping Peng, Han Zhao, Xiangang Li
Abstract: Recent advances in large language models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have demonstrated the effectiveness of test-time scaling, where extended reasoning processes substantially enhance model performance. Despite this, current models are constrained by limitations in handling long texts and reinforcement learning (RL) training efficiency. To address these issues, we propose a simple yet effective test-time scaling approach, Multi-round Thinking. This method iteratively refines model reasoning by leveraging previous answers as prompts for subsequent rounds. Extensive experiments across multiple models, including QwQ-32B and DeepSeek-R1, consistently show performance improvements on various benchmarks such as AIME 2024, MATH-500, GPQA-diamond, and LiveCodeBench. For instance, the accuracy of QwQ-32B improved from 80.3% (Round 1) to 82.1% (Round 2) on the AIME 2024 dataset, while DeepSeek-R1 showed a similar increase from 79.7% to 82.0%. These results confirm that Multi-round Thinking is a broadly applicable, straightforward approach to achieving stable enhancements in model performance, underscoring its potential for future developments in test-time scaling techniques.
The key prompt: {Original question prompt} The assistant's previous answer is: <answer> {last round answer} </answer>, and please re-answer.
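The iterative loop described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: `generate` is a hypothetical stand-in for any model call that maps a prompt string to raw output, the exact whitespace of the template is assumed from the key prompt quoted above, and stripping a `<think>...</think>` block so that only the final answer is carried into the next round is an assumption about what counts as the "previous answer".

```python
import re
from typing import Callable, List, Optional

# Template modeled on the key prompt quoted in the abstract (whitespace assumed).
TEMPLATE = (
    "{question} The assistant's previous answer is: "
    "<answer> {previous} </answer>, and please re-answer."
)

# Assumption: the model wraps its reasoning in <think>...</think>;
# only the text outside that block is treated as the round's answer.
_THINK = re.compile(r"<think>.*?</think>", re.DOTALL)


def final_answer(output: str) -> str:
    """Remove the (assumed) <think> block and surrounding whitespace."""
    return _THINK.sub("", output).strip()


def build_prompt(question: str, previous: Optional[str]) -> str:
    """Round 1 uses the bare question; later rounds append the last answer."""
    if previous is None:
        return question
    return TEMPLATE.format(question=question, previous=previous)


def multi_round_thinking(
    question: str,
    generate: Callable[[str], str],  # hypothetical model call: prompt -> raw output
    rounds: int = 2,
) -> List[str]:
    """Run the refinement loop and return each round's answer, in order."""
    answers: List[str] = []
    previous: Optional[str] = None
    for _ in range(rounds):
        raw = generate(build_prompt(question, previous))
        previous = final_answer(raw)
        answers.append(previous)
    return answers


def fake_model(prompt: str) -> str:
    """Toy stand-in: answers 41 first, then 42 once it sees a previous answer."""
    if "previous answer" in prompt:
        return "<think>recheck</think> 42"
    return "<think>first try</think> 41"


print(multi_round_thinking("What is 6 * 7?", fake_model, rounds=2))  # ['41', '42']
```

Keeping `generate` as an injected callable leaves the sketch independent of any particular inference API; the only method-specific parts are the prompt template and the choice of what text to carry forward between rounds.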