Qwen2.5 Technical Report

Authors: Qwen, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, Tingyu Xia, Xingzhang Ren, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yu Wan, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zihan Qiu

arXiv: 2412.15115v1 (cs.CL)

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: In this report, we introduce Qwen2.5, a comprehensive series of large language models (LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen2.5 has been significantly improved during both the pre-training and post-training stages. In terms of pre-training, we have scaled the high-quality pre-training datasets from the previous 7 trillion tokens to 18 trillion tokens. This provides a strong foundation for common sense, expert knowledge, and reasoning capabilities. In terms of post-training, we implement intricate supervised finetuning with over 1 million samples, as well as multistage reinforcement learning. Post-training techniques enhance human preference, and notably improve long text generation, structural data analysis, and instruction following. To handle diverse and varied use cases effectively, we present the Qwen2.5 LLM series in rich sizes. Open-weight offerings include base and instruction-tuned models, with quantized versions available. In addition, for hosted solutions, the proprietary models currently include two mixture-of-experts (MoE) variants: Qwen2.5-Turbo and Qwen2.5-Plus, both available from Alibaba Cloud Model Studio. Qwen2.5 has demonstrated top-tier performance on a wide range of benchmarks evaluating language understanding, reasoning, mathematics, coding, human preference alignment, etc. Specifically, the open-weight flagship Qwen2.5-72B-Instruct outperforms a number of open and proprietary models and demonstrates competitive performance to the state-of-the-art open-weight model, Llama-3-405B-Instruct, which is around 5 times larger. Qwen2.5-Turbo and Qwen2.5-Plus offer superior cost-effectiveness while performing competitively against GPT-4o-mini and GPT-4o respectively. Additionally, as the foundation, Qwen2.5 models have been instrumental in training specialized models such as Qwen2.5-Math, Qwen2.5-Coder, QwQ, and multimodal models.

Submitted to arXiv on 19 Dec. 2024

Created on 23 Dec. 2024
