Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

AI-generated keywords: Phi-3 Technical Report, phi-3-mini, language model, training dataset

Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Olatunji Ruwase, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yunan Zhang, Xiren Zhou

arXiv: 2404.14219v1 (cs.CL)

12 pages

License: CC BY 4.0

Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench).
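
The claim that phi-3-mini is small enough to run on a phone can be sanity-checked with back-of-the-envelope arithmetic: weight storage is roughly the parameter count times the number of bits stored per parameter. This is an illustrative sketch, not the report's deployment method. It assumes the weights dominate memory use (ignoring activations, the KV cache, and runtime overhead) and takes 1 GB as 10^9 bytes. The helper name `weight_memory_gb` is hypothetical.

```python
# Hypothetical helper: estimates weight storage only. It ignores activations,
# the KV cache, and any runtime overhead, so real memory use will be higher.
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_param / 8  # 8 bits per byte
    return bytes_total / 1e9


PHI3_MINI_PARAMS = 3.8e9  # 3.8 billion parameters, as stated in the abstract

# Compare common storage precisions, from full 32-bit floats down to 4-bit quantization.
for bits in (32, 16, 8, 4):
    size = weight_memory_gb(PHI3_MINI_PARAMS, bits)
    print(f"{bits:>2}-bit weights: ~{size:.1f} GB")
```

At 16 bits per weight the estimate is about 7.6 GB, while 4-bit quantization brings it to about 1.9 GB, which is far more plausible for a phone's memory. This page does not say which precision the authors used on-device.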

Submitted to arXiv on 22 Apr. 2024

Results of the summarizing process for the arXiv paper: 2404.14219v1

Comprehensive Summary
Key points
Layman's Summary
Blog article

The Phi-3 Technical Report introduces phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens. Its performance, measured on both academic benchmarks and internal tests, rivals that of leading models such as Mixtral 8x7B and GPT-3.5: phi-3-mini scores 69% on MMLU and 8.38 on MT-bench while remaining small enough to be deployed on a phone. According to the authors, the innovation lies entirely in the training dataset, a scaled-up version of the one used for its predecessor, phi-2, composed of heavily filtered web data and synthetic data. The model is then further aligned for robustness, safety, and chat format. The report also presents initial parameter-scaling results with two larger models, phi-3-small (7B parameters) and phi-3-medium (14B parameters), each trained on 4.8 trillion tokens. Both are significantly more capable than phi-3-mini, scoring 75% and 78% on MMLU and 8.7 and 8.9 on MT-bench, respectively. The work comes from a large team of authors, including Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, and Hany Awadalla. In conclusion, the report shows that a carefully curated training dataset can let a small model compete with much larger ones, bringing highly capable language models to devices as small as a phone.
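
To put the training-data figures in perspective, one can divide the number of training tokens by the parameter count. This is a minimal sketch that assumes the nominal sizes given in the abstract. Because 7B and 14B are rounded figures, the resulting ratios are approximate. The dictionary and function names are illustrative only.

```python
# Nominal figures from the abstract: (parameters, training tokens).
# 7B and 14B are rounded sizes, so the ratios below are approximate.
MODELS = {
    "phi-3-mini": (3.8e9, 3.3e12),
    "phi-3-small": (7e9, 4.8e12),
    "phi-3-medium": (14e9, 4.8e12),
}


def tokens_per_parameter(params: float, tokens: float) -> float:
    """Ratio of training tokens to model parameters."""
    return tokens / params


for name, (params, tokens) in MODELS.items():
    ratio = tokens_per_parameter(params, tokens)
    print(f"{name:<13} ~{ratio:,.0f} tokens per parameter")
```

With these figures, phi-3-mini sees roughly 868 tokens per parameter, phi-3-small about 686, and phi-3-medium about 343. Relative to its size, the smallest model is therefore trained on the most data.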

- Phi-3-mini is a 3.8 billion parameter language model trained on a dataset of 3.3 trillion tokens
- Phi-3-mini competes with leading models like Mixtral 8x7B and GPT-3.5, achieving scores of 69% on MMLU and 8.38 on MT-bench
- The key innovation is the training dataset, a scaled-up version of phi-2's built from heavily filtered web data and synthetic data; the model is further aligned for robustness, safety, and chat format
- Larger models phi-3-small (7B) and phi-3-medium (14B), each trained on 4.8 trillion tokens, score 75% and 78% on MMLU and 8.7 and 8.9 on MT-bench, respectively
- The author team includes Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, and many others

Summary
- Phi-3-mini is a small but capable model that was trained on a very large amount of text.
- It does about as well as some much bigger top models on standard tests.
- What makes it special is the carefully chosen text it learned from: filtered web pages plus synthetic (computer-generated) examples. It was then tuned further to be robust, safe, and good at chatting.
- Two bigger versions, phi-3-small and phi-3-medium, do even better on the same tests.
- It was built by a large team of researchers, including Marah Abdin and Sam Ade Jacobs.

Definitions
1. Parameter: A number inside the model that is adjusted during training and shapes how the model makes predictions.
2. Dataset: A collection of information or data used to train the model.
3. Tokens: Small pieces of text, such as words or parts of words, that the model reads during training.
4. Robustness: The ability to perform well in various situations and handle challenges effectively.
5. Capabilities: Skills or abilities that the model possesses to perform tasks effectively.

The Phi-3 Technical Report: A New Era in Language Modeling Technology

Language modeling has been a crucial area of research in natural language processing (NLP) for decades. It involves training algorithms to understand and generate human language, with the goal of creating more intelligent and efficient communication systems. In recent years there have been significant advances in this field, with models such as GPT-3 and Mixtral 8x7B achieving impressive results. Now a new player has entered the game: phi-3-mini.

In their technical report, "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone," Marah Abdin and a large team of co-authors introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens. Its performance, validated by both academic benchmarks and internal evaluations, rivals that of leading models such as Mixtral 8x7B and GPT-3.5. Phi-3-mini scores 69% on MMLU (Massive Multitask Language Understanding) and 8.38 on MT-bench (a multi-turn conversation benchmark) while remaining compact enough to be deployed on a phone. This makes it not only powerful but also practical for real-world applications.

The key innovation of phi-3-mini lies in its training dataset, a scaled-up version of the one used for its predecessor, phi-2. The team combined heavily filtered web data with synthetic data, and then further aligned the model for robustness, safety, and chat format.

But that's not all: the report also presents two larger models, phi-3-small (7B parameters) and phi-3-medium (14B parameters), each trained on 4.8 trillion tokens. With size comes power, and both models are significantly more capable than phi-3-mini. Phi-3-small scores 75% on MMLU and 8.7 on MT-bench, while phi-3-medium goes further with 78% on MMLU and 8.9 on MT-bench. These initial parameter-scaling results point to even more capable models in the future.

The team behind this research includes Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, and many others. In conclusion, the Phi-3 Technical Report shows how a carefully curated training dataset can let a small model compete with much larger ones. With its strong performance and compact size, phi-3-mini is set to make waves in NLP research and real-world applications alike. We can't wait to see what this team has in store next!

Created on 07 Sep. 2024

Similar papers summarized with our AI tools

- Textbooks Are All You Need II: phi-1.5 technical report (cs.CL), 67.4% similarity
- Effective Long-Context Scaling of Foundation Models (cs.CL), 58.6% similarity
- Retrieval meets Long Context Large Language Models (cs.CL), 58.1% similarity
- LLM-powered Data Augmentation for Enhanced Crosslingual Performance (cs.CL), 57.8% similarity
- Large Language Models: A Survey (cs.CL), 57.5% similarity
- GLM-130B: An Open Bilingual Pre-trained Model (cs.CL), 57.2% similarity
- Document-Level Machine Translation with Large Language Models (cs.CL), 56.9% similarity

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.