DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

AI-generated keywords: Language models, Mathematical reasoning, DeepSeekMath 7B, GRPO, Artificial intelligence

AI-generated Key Points

The license of the paper does not allow us to build upon its content, so the key points are generated from the paper's metadata rather than the full article.

  • DeepSeekMath 7B was developed by Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu and Daya Guo
  • Continues pre-training DeepSeek-Coder-Base-v1.5 7B on 120 billion math-related tokens from Common Crawl, together with natural language and code data
  • Scores 51.7% on the competition-level MATH benchmark
  • Reaches this score without relying on external toolkits or voting techniques
  • Approaches the performance of Gemini-Ultra and GPT-4
  • Reaches 60.9% on MATH with self-consistency over 64 samples
  • Attributes its mathematical reasoning capability to two factors: a meticulously engineered data selection pipeline that harnesses publicly available web data, and Group Relative Policy Optimization (GRPO)
  • GRPO, a variant of Proximal Policy Optimization (PPO), enhances mathematical reasoning abilities while also optimizing the memory usage of PPO
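Self-consistency, as used for the 60.9% figure, is commonly implemented as a majority vote over the final answers of independently sampled solutions. The sketch below assumes that form; the function name `self_consistency` and the answer-normalization step it relies on are hypothetical and not taken from the paper.

```python
from collections import Counter


def self_consistency(final_answers):
    """Return the most frequent final answer among sampled solutions.

    Assumes each sampled solution has already been reduced to a normalized
    final-answer string (a hypothetical preprocessing step). Ties go to the
    answer encountered first, which is how Counter.most_common orders them.
    """
    if not final_answers:
        raise ValueError("need at least one sampled answer")
    answer, _count = Counter(final_answers).most_common(1)[0]
    return answer


# Toy example with 5 samples instead of the 64 behind the reported score.
samples = ["12", "15", "12", "12", "7"]
print(self_consistency(samples))
```

Voting needs no extra model: the only cost is drawing more samples per problem, which is why it is reported separately from the single-sample 51.7% score.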

Authors: Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, Daya Guo

Abstract: Mathematical reasoning poses a significant challenge for language models due to its complex and structured nature. In this paper, we introduce DeepSeekMath 7B, which continues pre-training DeepSeek-Coder-Base-v1.5 7B with 120B math-related tokens sourced from Common Crawl, together with natural language and code data. DeepSeekMath 7B has achieved an impressive score of 51.7% on the competition-level MATH benchmark without relying on external toolkits and voting techniques, approaching the performance level of Gemini-Ultra and GPT-4. Self-consistency over 64 samples from DeepSeekMath 7B achieves 60.9% on MATH. The mathematical reasoning capability of DeepSeekMath is attributed to two key factors: First, we harness the significant potential of publicly available web data through a meticulously engineered data selection pipeline. Second, we introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO), that enhances mathematical reasoning abilities while concurrently optimizing the memory usage of PPO.
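GRPO is usually described as replacing PPO's learned value (critic) model with a baseline computed from a group of outputs sampled for the same question, which is where its memory saving comes from. A minimal sketch of that group-relative advantage, assuming one scalar reward per output and normalization by the group mean and population standard deviation (the use of `pstdev` and the `eps` stabilizer are assumptions of this sketch):

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group of sampled outputs.

    `rewards` holds one scalar reward per output sampled for the same
    question. The group mean serves as the baseline, so no separate value
    network has to be trained or kept in memory. `eps` (an assumption)
    avoids division by zero when every reward in the group is equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled solutions to one question: 1.0 = correct, 0.0 = incorrect.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct answers receive a positive advantage and incorrect ones a negative advantage of the same size; a group in which every answer scores the same yields zero advantage and contributes no learning signal.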

Submitted to arXiv on 05 Feb. 2024


Results of the summarizing process for the arXiv paper: 2402.03300v3

Mathematical reasoning is a formidable challenge for language models because of its complex and structured nature. To address it, Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu and Daya Guo developed DeepSeekMath 7B. The model continues pre-training DeepSeek-Coder-Base-v1.5 7B on 120 billion math-related tokens sourced from Common Crawl, together with natural language and code data.

On the competition-level MATH benchmark, DeepSeekMath 7B scores 51.7% without relying on external toolkits or voting techniques, approaching the performance of Gemini-Ultra and GPT-4. With self-consistency over 64 samples, its score on MATH rises to 60.9%.

The authors attribute DeepSeekMath's mathematical reasoning capability to two key factors. First, they harness publicly available web data through a meticulously engineered data selection pipeline. Second, they introduce Group Relative Policy Optimization (GRPO), a variant of Proximal Policy Optimization (PPO) that enhances mathematical reasoning abilities while also optimizing the memory usage of PPO.

In summary, DeepSeekMath marks a significant advance in the ability of open language models to tackle complex mathematical reasoning. Its combination of careful data selection and optimization techniques such as GRPO points the way toward further progress in artificial intelligence and natural language processing.
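Because GRPO is a variant of PPO, it keeps PPO's clipped probability-ratio objective but feeds it group-relative advantages instead of critic-based ones. The following sketch evaluates that clipped term for a single sampled output. The function name, the default `clip_eps=0.2` and the per-token averaging are assumptions here, and the KL penalty toward a reference model that GRPO also applies is deliberately left out, so this is not the paper's full objective.

```python
import math


def clipped_surrogate(new_logprobs, old_logprobs, advantage, clip_eps=0.2):
    """PPO-style clipped objective for one sampled output, averaged over tokens.

    new_logprobs / old_logprobs: per-token log-probabilities of the output
    under the policy being optimized and the policy that sampled it.
    advantage: the output's group-relative advantage, shared by every token
    (assumes one outcome reward per output). The KL penalty is omitted.
    """
    if len(new_logprobs) != len(old_logprobs) or not new_logprobs:
        raise ValueError("need equal-length, non-empty log-prob sequences")
    total = 0.0
    for new_lp, old_lp in zip(new_logprobs, old_logprobs):
        ratio = math.exp(new_lp - old_lp)  # pi_theta / pi_theta_old per token
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * advantage, clipped * advantage)
    return total / len(new_logprobs)


# Identical policies: every ratio is 1, so the objective equals the advantage.
print(clipped_surrogate([-1.2, -0.5], [-1.2, -0.5], advantage=1.0))
```

If a token's probability doubles (ratio 2) and the advantage is positive, its contribution is capped at 1.2 times the advantage; with a negative advantage the unclipped, more pessimistic value is kept. This is the standard PPO clipping behavior, unchanged by the group-relative baseline.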
Created on 28 Jan. 2025
