Qwen3 Technical Report

AI-generated keywords: Qwen3

AI-generated Key Points

⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

Qwen3 is a significant advancement in large language models (LLMs) with a focus on performance, efficiency, and multilingual capabilities.
Qwen3 includes models with dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 billion to 235 billion.
Qwen3 integrates thinking mode for complex reasoning tasks and non-thinking mode for rapid context-driven responses within a unified framework, enabling dynamic mode switching based on user queries or chat templates.
Qwen3 introduces a thinking budget mechanism that allows adaptive allocation of computational resources during inference to balance latency and performance based on task complexity.
Qwen3 achieves state-of-the-art results across diverse benchmarks, including code generation, mathematical reasoning, and agent tasks, and is competitive against larger MoE models and proprietary models.
One key enhancement in Qwen3 is the expansion of multilingual support from 29 to 119 languages and dialects, enhancing global accessibility through improved cross-lingual understanding and generation capabilities.
All models within the Qwen3 series are publicly accessible under Apache 2.0 licensing terms to promote reproducibility and foster community-driven research efforts.

Authors: An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, Jingren Zhou, Junyang Lin, Kai Dang, Keqin Bao, Kexin Yang, Le Yu, Lianghao Deng, Mei Li, Mingfeng Xue, Mingze Li, Pei Zhang, Peng Wang, Qin Zhu, Rui Men, Ruize Gao, Shixuan Liu, Shuang Luo, Tianhao Li, Tianyi Tang, Wenbiao Yin, Xingzhang Ren, Xinyu Wang, Xinyu Zhang, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yinger Zhang, Yu Wan, Yuqiong Liu, Zekun Wang, Zeyu Cui, Zhenru Zhang, Zhipeng Zhou, Zihan Qiu

arXiv: 2505.09388v1 (cs.CL)

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration of thinking mode (for complex, multi-step reasoning) and non-thinking mode (for rapid, context-driven responses) into a unified framework. This eliminates the need to switch between different models--such as chat-optimized models (e.g., GPT-4o) and dedicated reasoning models (e.g., QwQ-32B)--and enables dynamic mode switching based on user queries or chat templates. Meanwhile, Qwen3 introduces a thinking budget mechanism, allowing users to allocate computational resources adaptively during inference, thereby balancing latency and performance based on task complexity. Moreover, by leveraging the knowledge from the flagship models, we significantly reduce the computational resources required to build smaller-scale models, while ensuring their highly competitive performance. Empirical evaluations demonstrate that Qwen3 achieves state-of-the-art results across diverse benchmarks, including tasks in code generation, mathematical reasoning, agent tasks, etc., competitive against larger MoE models and proprietary models. Compared to its predecessor Qwen2.5, Qwen3 expands multilingual support from 29 to 119 languages and dialects, enhancing global accessibility through improved cross-lingual understanding and generation capabilities. To facilitate reproducibility and community-driven research and development, all Qwen3 models are publicly accessible under Apache 2.0.

Submitted to arXiv on 14 May 2025

Results of the summarizing process for the arXiv paper: 2505.09388v1

Comprehensive Summary

In the Qwen3 Technical Report, An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, and their co-authors introduce the latest version of the Qwen model family. Qwen3 is a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The series includes models with both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 billion to 235 billion. One notable innovation in Qwen3 is the integration of a thinking mode for complex, multi-step reasoning and a non-thinking mode for rapid, context-driven responses within a unified framework. This eliminates the need to switch between separate models, such as chat-optimized models and dedicated reasoning models, and enables dynamic mode switching based on user queries or chat templates. Additionally, Qwen3 introduces a thinking budget mechanism that allows users to allocate computational resources adaptively during inference, balancing latency and performance based on task complexity. By leveraging knowledge from the flagship models, the team significantly reduced the computational resources required to build the smaller-scale models while keeping their performance highly competitive. Empirical evaluations show that Qwen3 achieves state-of-the-art results across diverse benchmarks, including code generation, mathematical reasoning, and agent tasks, and is competitive against larger MoE models and proprietary models. Compared to its predecessor Qwen2.5, Qwen3 expands multilingual support from 29 to 119 languages and dialects, which enhances global accessibility through improved cross-lingual understanding and generation capabilities.
To promote reproducibility and foster community-driven research and development efforts, all models within the Qwen3 series are made publicly accessible under Apache 2.0 licensing terms. The collaborative effort showcased in this technical report underscores a commitment to advancing language modeling capabilities while ensuring broad accessibility for researchers worldwide.

Summary

Qwen3 is a new and improved type of computer program that can understand and use many different languages. It can think deeply for hard problems and quickly for simple questions. Qwen3 can switch between these modes based on what it needs to do. It also manages its resources smartly to work efficiently. Qwen3 does very well in tests compared to other similar programs.

Definitions

- Advancement: A step forward or improvement in something.
- Language models: Programs that help computers understand and generate human language.
- Efficiency: Doing things well without wasting time or resources.
- Multilingual: Able to work with multiple languages.
- Framework: A structure or system that helps organize and guide something.
- Latency: The time delay between a request and a response in computing.
- State-of-the-art: The most advanced or current level of development in a particular field.
- Accessibility: How easy it is for people to use or access something.
- Reproducibility: The ability to repeat an experiment or process and get the same results.

Introduction

In recent years, large language models (LLMs) have gained significant attention in the field of natural language processing (NLP). These models are trained on massive amounts of text data and can generate human-like responses to a wide range of prompts. One such model family is Qwen, which has continued to evolve since its initial release. In the Qwen3 Technical Report, An Yang and co-authors introduce the latest version of this model series, which targets advances in performance, efficiency, and multilingual capabilities.

The Qwen Model Family

The Qwen3 series includes both dense and Mixture-of-Expert (MoE) models, with parameter scales ranging from 0.6 billion to 235 billion. Building on its predecessor, Qwen2.5, Qwen3 adds features such as a unified thinking mode and a thinking budget mechanism.
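To make the dense versus MoE distinction concrete, here is a minimal sketch of generic top-k expert routing. It is an illustration only: the summary does not describe how Qwen3 routes tokens, so the function name `moe_forward`, the scalar toy experts, and the choice of `top_k` are all assumptions. A dense layer would run every parameter on every input, while an MoE layer runs only the few experts the router selects.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[float], float]  # toy expert: maps one number to one number


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, experts: Sequence[Expert],
                router_scores: Sequence[float], top_k: int = 2) -> float:
    """Run only the top_k highest-scoring experts and mix their outputs.

    Hypothetical illustration of sparse routing, not Qwen3's actual router.
    """
    # Indices of the top_k experts by router score (ties keep original order).
    ranked = sorted(range(len(experts)),
                    key=lambda i: router_scores[i], reverse=True)[:top_k]
    # Renormalize the selected scores so the mixing weights sum to 1.
    weights = softmax([router_scores[i] for i in ranked])
    return sum(w * experts[i](x) for w, i in zip(weights, ranked))


if __name__ == "__main__":
    toy_experts = [lambda v: v, lambda v: 2 * v, lambda v: 10 * v]
    # The third expert has a low score, so it is never evaluated.
    print(moe_forward(1.0, toy_experts, router_scores=[0.0, 0.0, -5.0]))
```

Because only `top_k` experts run per input, total parameter count can grow much faster than the compute spent on each input.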

Thinking Mode

One of the key innovations in Qwen3 is the integration of thinking mode for complex reasoning tasks and non-thinking mode for rapid context-driven responses within a unified framework. This eliminates the need to switch between different models optimized for specific tasks like chat interactions or dedicated reasoning processes. With thinking mode enabled, users can input multi-step queries that require more comprehensive understanding and reasoning abilities from the model. On the other hand, non-thinking mode is suitable for simpler tasks where quick responses based on contextual information are sufficient.
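The idea of switching modes "based on user queries or chat templates" can be sketched as follows. This is a minimal illustration under stated assumptions, not the Qwen3 implementation: the flag strings `/think` and `/no_think`, the `ModeDecision` type, and the `resolve_mode` helper are hypothetical names chosen for this example. The sketch assumes an explicit flag in the user query overrides the default configured in the chat template.

```python
from dataclasses import dataclass

# Hypothetical soft-switch flags a user might embed in a query (assumed names).
THINK_FLAG = "/think"
NO_THINK_FLAG = "/no_think"


@dataclass
class ModeDecision:
    thinking: bool  # True: reason step by step before answering
    query: str      # the user query with any mode flags stripped


def resolve_mode(query: str, template_default: bool = True) -> ModeDecision:
    """Choose a mode: a flag in the query wins, otherwise use the template default."""
    words = query.split()
    thinking = template_default
    for word in words:
        # If several flags appear, the last one takes effect.
        if word == THINK_FLAG:
            thinking = True
        elif word == NO_THINK_FLAG:
            thinking = False
    cleaned = " ".join(w for w in words if w not in (THINK_FLAG, NO_THINK_FLAG))
    return ModeDecision(thinking=thinking, query=cleaned)


if __name__ == "__main__":
    print(resolve_mode("What is the capital of France? /no_think"))
    print(resolve_mode("Prove that the square root of 2 is irrational."))
```

The point of the design is that one model serves both kinds of request, so the caller changes a flag or a template setting rather than routing traffic to a different model.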

Thinking Budget Mechanism

Another notable feature introduced in Qwen3 is the thinking budget mechanism, which lets users allocate computational resources adaptively during inference based on task complexity. For simpler queries or tasks that need faster responses, a smaller budget reduces latency; harder tasks can be given a larger budget in exchange for better performance. This adjustable trade-off makes large language models more practical for real-world applications, where latency and compute cost both matter.
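One plausible way to picture a thinking budget is as a cap on the number of reasoning tokens generated before the model must answer. The sketch below assumes exactly that; the summary only says that compute is allocated adaptively, so the token-count interpretation, the `</think>` marker, and the `cap_thinking` helper are assumptions made for illustration. A plain list of strings stands in for a model's token stream.

```python
from typing import Iterable, List, Tuple

END_OF_THINKING = "</think>"  # hypothetical marker that closes the reasoning phase


def cap_thinking(reasoning_tokens: Iterable[str], budget: int) -> Tuple[List[str], bool]:
    """Keep reasoning tokens until the stream ends on its own or the budget is spent.

    Returns (tokens kept, whether reasoning was cut short by the budget).
    """
    kept: List[str] = []
    for token in reasoning_tokens:
        if token == END_OF_THINKING:
            return kept, False  # reasoning finished within budget
        if len(kept) >= budget:
            return kept, True   # budget exhausted: stop and move on to the answer
        kept.append(token)
    return kept, False


if __name__ == "__main__":
    stream = ["step1", "step2", "step3", END_OF_THINKING]
    print(cap_thinking(stream, budget=2))  # small budget: lower latency
    print(cap_thinking(stream, budget=8))  # large budget: full reasoning
```

A caller could pick `budget` per request, using a small value for simple lookups and a larger one for multi-step problems.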

Multilingual Support

Qwen3 significantly expands multilingual coverage, supporting 119 languages and dialects compared with 29 in Qwen2.5. This expansion enhances global accessibility through improved cross-lingual understanding and generation capabilities. With growing demand for NLP solutions in many languages, this makes Qwen3 a valuable tool for researchers worldwide.

Evaluation Results

Empirical evaluations show that Qwen3 achieves state-of-the-art results across diverse benchmarks, including code generation, mathematical reasoning, and agent tasks. The models are competitive against larger MoE models and proprietary models. In addition, by leveraging knowledge from the flagship models, the team significantly reduced the computational resources required to build the smaller-scale models while keeping their performance highly competitive.

Open-source Availability

To promote reproducibility and foster community-driven research and development efforts, all models within the Qwen3 series are made publicly accessible under Apache 2.0 licensing terms. This open-source approach allows other researchers to build upon the work done by the team behind Qwen3 and further advance language modeling capabilities. This collaborative effort underscores a commitment to driving progress in NLP through shared knowledge and resources rather than individual achievements.

Conclusion

In conclusion, the Qwen3 Technical Report introduces a new iteration of the Qwen model family with advances in performance, efficiency, and multilingual capabilities. The combination of thinking and non-thinking modes, the thinking budget mechanism, and expanded multilingual support makes Qwen3 a versatile and practical tool for a range of NLP tasks. Releasing all models under Apache 2.0 keeps them broadly accessible to the research community. With Qwen3's release, the field of NLP takes another step toward more advanced and accessible language models.

Created on 13 Oct. 2025

