Qwen2.5-1M Technical Report

AI-generated keywords: Qwen2.5-1M series, long-context modeling, open-source technology, innovative techniques, inference framework

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Introduction of the Qwen2.5-1M series, which extends context length from the previous 128K version to 1 million tokens
  • Utilization of innovative techniques such as long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning to enhance model capabilities
  • Open-sourcing of an inference framework with a length extrapolation method that expands context length by at least four times without additional training
  • Implementation of a sparse attention method with chunked prefill optimization to reduce inference costs, plus a sparsity refinement method to improve precision
  • Optimizations in the inference engine including kernel optimization, pipeline parallelism, and scheduling optimization leading to improved performance
  • Achieving 3x to 7x prefill speedup with the Qwen2.5-1M models in scenarios with 1 million tokens of context
  • Qwen2.5-14B-Instruct-1M significantly outperforms GPT-4o-mini on long-context tasks while supporting contexts eight times longer

Authors: An Yang, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, Jianhong Tu, Jianwei Zhang, Jingren Zhou, Junyang Lin, Kai Dang, Kexin Yang, Le Yu, Mei Li, Minmin Sun, Qin Zhu, Rui Men, Tao He, Weijia Xu, Wenbiao Yin, Wenyuan Yu, Xiafei Qiu, Xingzhang Ren, Xinlong Yang, Yong Li, Zhiying Xu, Zipeng Zhang

Abstract: We introduce Qwen2.5-1M, a series of models that extend the context length to 1 million tokens. Compared to the previous 128K version, the Qwen2.5-1M series have significantly enhanced long-context capabilities through long-context pre-training and post-training. Key techniques such as long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning are employed to effectively enhance long-context performance while reducing training costs. To promote the use of long-context models among a broader user base, we present and open-source our inference framework. This framework includes a length extrapolation method that can expand the model context lengths by at least four times, or even more, without additional training. To reduce inference costs, we implement a sparse attention method along with chunked prefill optimization for deployment scenarios and a sparsity refinement method to improve precision. Additionally, we detail our optimizations in the inference engine, including kernel optimization, pipeline parallelism, and scheduling optimization, which significantly enhance overall inference performance. By leveraging our inference framework, the Qwen2.5-1M models achieve a remarkable 3x to 7x prefill speedup in scenarios with 1 million tokens of context. This framework provides an efficient and powerful solution for developing applications that require long-context processing using open-source models. The Qwen2.5-1M series currently includes the open-source models Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, as well as the API-accessed model Qwen2.5-Turbo. Evaluations show that Qwen2.5-1M models have been greatly improved in long-context tasks without compromising performance in short-context scenarios. Specifically, the Qwen2.5-14B-Instruct-1M model significantly outperforms GPT-4o-mini in long-context tasks and supports contexts eight times longer.
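The abstract names chunked prefill without describing it. As a rough illustration only, the generic idea can be sketched as follows: the prompt is processed in fixed-size chunks, each chunk's keys and values are appended to a cache, and each token attends causally to everything cached so far. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation; every function name here (`attend`, `full_prefill`, `chunked_prefill`, `toy_vectors`, `max_abs_diff`) is hypothetical, and the sparse attention and sparsity refinement steps are not modeled.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def full_prefill(queries, keys, values):
    """Causal attention over the whole prompt in a single pass."""
    return [attend(queries[i], keys[: i + 1], values[: i + 1])
            for i in range(len(queries))]


def chunked_prefill(queries, keys, values, chunk_size):
    """Causal attention computed chunk by chunk with a growing KV cache."""
    cached_keys, cached_values, outputs = [], [], []
    for start in range(0, len(queries), chunk_size):
        end = min(start + chunk_size, len(queries))
        # Append this chunk's keys and values to the cache before attending.
        cached_keys.extend(keys[start:end])
        cached_values.extend(values[start:end])
        for i in range(start, end):
            # Token i sees cached positions 0..i only (causal mask).
            outputs.append(
                attend(queries[i], cached_keys[: i + 1], cached_values[: i + 1])
            )
    return outputs


def toy_vectors(n_tokens, dim, seed):
    """Deterministic pseudo-random vectors so the demo needs no dependencies."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_tokens)]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two lists of vectors."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Toy prompt: 10 tokens, 4-dimensional heads (arbitrary illustrative sizes).
queries = toy_vectors(10, 4, seed=0)
keys = toy_vectors(10, 4, seed=1)
values = toy_vectors(10, 4, seed=2)

reference = full_prefill(queries, keys, values)
chunked = chunked_prefill(queries, keys, values, chunk_size=3)
print("max difference:", max_abs_diff(reference, chunked))
```

In this toy version the chunked and single-pass results agree, which is the point of the technique: chunking changes how much of the prompt is in flight at once, not the attention output. In a real inference engine the benefit would come from bounding per-step activation memory by the chunk size, which this sketch does not measure.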

Submitted to arXiv on 26 Jan. 2025

Results of the summarizing process for the arXiv paper: 2501.15383v1

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

We introduce the Qwen2.5-1M series, which extends context length from the previous 128K version to 1 million tokens. The series includes the open-source models Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, as well as the API-accessed model Qwen2.5-Turbo. Long data synthesis, progressive pre-training, and multi-stage supervised fine-tuning enhance long-context performance while reducing training costs.

To promote wider adoption of long-context models, we have open-sourced our inference framework. It includes a length extrapolation method that expands model context lengths by at least four times without additional training. To reduce inference costs, it implements a sparse attention method with chunked prefill optimization for deployment scenarios, together with a sparsity refinement method to improve precision. Optimizations in the inference engine, including kernel optimization, pipeline parallelism, and scheduling optimization, significantly improve overall inference performance. With this framework, the Qwen2.5-1M models achieve a 3x to 7x prefill speedup in scenarios with 1 million tokens of context.

Evaluations show that these models are greatly improved on long-context tasks without compromising performance in short-context scenarios. Notably, Qwen2.5-14B-Instruct-1M significantly outperforms GPT-4o-mini on long-context tasks and supports contexts eight times longer. The work is a collaboration among An Yang, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoyan Huang, Jiandong Jiang, Jianhong Tu, Jianwei Zhang, Jingren Zhou, Junyang Lin, Kai Dang, Kexin Yang, Le Yu, Mei Li, Minmin Sun, Qin Zhu, Rui Men, Tao He, Weijia Xu, Wenbiao Yin, Wenyuan Yu, Xiafei Qiu, Xingzhang Ren, Xinlong Yang, Yong Li, Zhiying Xu, and Zipeng Zhang.
Created on 07 Apr. 2025
