Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

AI-generated keywords: VibeThinker-1.5B, groundbreaking model, reasoning capabilities, diversity-driven optimization, scaling paradigms

AI-generated Key Points

  • VibeThinker-1.5B is a 1.5B-parameter dense model that challenges the belief that small models inherently lack robust reasoning
  • Developed via the Spectrum-to-Signal Principle (SSP), in contrast to the prevailing approach of scaling parameter count, as seen in DeepSeek R1 (671B) and Kimi k2 (>1T)
  • SSP first applies Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions, then MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal
  • Surpasses the 400x larger DeepSeek R1 on three math benchmarks, AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7), at a total training cost of $7,800
  • Scores 51.1 on LiveCodeBench V6, ahead of Magistral Medium (50.3) and its own base model (0.0)
  • Compared against state-of-the-art models across scales and categories, including reasoning models with Long-CoT capabilities and top-tier non-reasoning models
  • Evaluations use vLLM as the inference backend
  • The title, "Tiny Model, Big Logic", states the central claim: diversity-driven optimization elicits large-model reasoning ability in VibeThinker-1.5B
  • Challenges existing Scaling Law assumptions by showing that small models can reach reasoning capabilities comparable to larger counterparts while reducing training and inference costs

Authors: Sen Xu, Yi Zhou, Wei Wang, Jixin Min, Zhibin Yin, Yingwei Dai, Shixi Liu, Lianyu Pang, Yirong Chen, Junlin Zhang

License: CC ZERO 1.0

Abstract: Challenging the prevailing consensus that small models inherently lack robust reasoning, this report introduces VibeThinker-1.5B, a 1.5B-parameter dense model developed via our Spectrum-to-Signal Principle (SSP). This challenges the prevailing approach of scaling model parameters to enhance capabilities, as seen in models like DeepSeek R1 (671B) and Kimi k2 (>1T). The SSP framework first employs a Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions, followed by MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal. With a total training cost of only $7,800, VibeThinker-1.5B demonstrates superior reasoning capabilities compared to closed-source models like Magistral Medium and Claude Opus 4, and performs on par with open-source models like GPT OSS-20B Medium. Remarkably, it surpasses the 400x larger DeepSeek R1 on three math benchmarks: AIME24 (80.3 vs. 79.8), AIME25 (74.4 vs. 70.0), and HMMT25 (50.4 vs. 41.7). This is a substantial improvement over its base model (6.7, 4.3, and 0.6, respectively). On LiveCodeBench V6, it scores 51.1, outperforming Magistral Medium's 50.3 and its base model's 0.0. These findings demonstrate that small models can achieve reasoning capabilities comparable to large models, drastically reducing training and inference costs and thereby democratizing advanced AI research.
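
The headline comparisons in the abstract reduce to simple arithmetic. The following is a minimal sketch that recomputes the margins and the parameter ratio from the figures quoted above; the names `SCORES`, `PARAMS_B`, `margins`, and `size_ratio` are illustrative and do not come from the paper.

```python
# Scores copied from the abstract, in the order:
# (VibeThinker-1.5B, DeepSeek R1, VibeThinker's base model).
SCORES = {
    "AIME24": (80.3, 79.8, 6.7),
    "AIME25": (74.4, 70.0, 4.3),
    "HMMT25": (50.4, 41.7, 0.6),
}

# Parameter counts in billions, as stated in the abstract.
PARAMS_B = {"VibeThinker-1.5B": 1.5, "DeepSeek R1": 671.0}


def margins(scores):
    """Map each benchmark to (margin over DeepSeek R1, gain over base model)."""
    return {
        name: (round(vt - r1, 1), round(vt - base, 1))
        for name, (vt, r1, base) in scores.items()
    }


def size_ratio(params):
    """How many times larger DeepSeek R1 is than VibeThinker-1.5B."""
    return params["DeepSeek R1"] / params["VibeThinker-1.5B"]


if __name__ == "__main__":
    for name, (over_r1, over_base) in margins(SCORES).items():
        print(f"{name}: +{over_r1} vs. DeepSeek R1, +{over_base} vs. base model")
    print(f"DeepSeek R1 is about {size_ratio(PARAMS_B):.0f}x larger")
```

By these figures the margin over DeepSeek R1 ranges from 0.5 points on AIME24 to 8.7 points on HMMT25, and the parameter ratio is roughly 447, consistent with the abstract's "400x larger" description.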

Submitted to arXiv on 09 Nov. 2025


Results of the summarizing process for the arXiv paper: 2511.06221v1

Introducing VibeThinker-1.5B: A Revolutionary Model with Exceptional Reasoning Capabilities

VibeThinker-1.5B is a 1.5B-parameter dense model that challenges the conventional belief that small models are limited in their reasoning capabilities. It was developed through our Spectrum-to-Signal Principle (SSP) rather than by scaling parameter count, the approach taken by much larger models such as DeepSeek R1 (671B) and Kimi k2 (>1T). SSP applies a Two-Stage Diversity-Exploring Distillation (SFT) to generate a broad spectrum of solutions, followed by MaxEnt-Guided Policy Optimization (RL) to amplify the correct signal.

On the math benchmarks AIME24, AIME25, and HMMT25, VibeThinker-1.5B surpasses the 400x larger DeepSeek R1 (80.3 vs. 79.8, 74.4 vs. 70.0, and 50.4 vs. 41.7, respectively), with a total training cost of only $7,800. On LiveCodeBench V6 it scores 51.1, ahead of Magistral Medium (50.3) and far above its own base model (0.0).

We compare VibeThinker-1.5B against a wide range of state-of-the-art models across different scales and categories, including advanced reasoning models with Long-CoT capabilities and top-tier non-reasoning models. All evaluations use vLLM as the inference backend.

Overall, our findings challenge existing Scaling Law assumptions by showing that small models like VibeThinker-1.5B can achieve reasoning capabilities comparable to those of much larger counterparts while significantly reducing training and inference costs. This not only democratizes advanced AI research but also prompts a re-evaluation of traditional scaling paradigms in the field of artificial intelligence.
Created on 12 Nov. 2025
