Gemma 3 Technical Report
Authors: Gemma Team, Aishwarya Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, Ramona Merhej, Sarah Perrin, Tatiana Matejovicova, Alexandre Ramé, Morgane Rivière, Louis Rouillard, Thomas Mesnard, Geoffrey Cideron, Jean-bastien Grill, Sabela Ramos, Edouard Yvinec, Michelle Casbon, Etienne Pot, Ivo Penchev, Gaël Liu, Francesco Visin, Kathleen Kenealy, Lucas Beyer, Xiaohai Zhai, Anton Tsitsulin, Robert Busa-Fekete, Alex Feng, Noveen Sachdeva, Benjamin Coleman, Yi Gao, Basil Mustafa, Iain Barr, Emilio Parisotto, David Tian, Matan Eyal, Colin Cherry, Jan-Thorsten Peter, Danila Sinopalnikov, Surya Bhupatiraju, Rishabh Agarwal, Mehran Kazemi, Dan Malkin, Ravin Kumar, David Vilar, Idan Brusilovsky, Jiaming Luo, Andreas Steiner, Abe Friesen, Abhanshu Sharma, Abheesht Sharma, Adi Mayrav Gilady, Adrian Goedeckemeyer, Alaa Saade
Abstract: We introduce Gemma 3, a multimodal addition to the Gemma family of lightweight open models, ranging in scale from 1 to 27 billion parameters. This version introduces vision understanding abilities, wider language coverage, and a longer context of at least 128K tokens. We also change the architecture of the model to reduce the KV-cache memory, which tends to explode with long context. This is achieved by increasing the ratio of local to global attention layers and keeping the span of local attention short. The Gemma 3 models are trained with distillation and achieve superior performance to Gemma 2 for both pre-trained and instruction-finetuned versions. In particular, our novel post-training recipe significantly improves the math, chat, instruction-following, and multilingual abilities, making Gemma3-4B-IT competitive with Gemma2-27B-IT and Gemma3-27B-IT comparable to Gemini-1.5-Pro across benchmarks. We release all our models to the community.
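The KV-cache argument in the abstract can be illustrated with a little arithmetic. The sketch below is not the authors' code. It assumes a hypothetical layer layout in which every (local_per_global + 1)-th layer is global and all other layers use sliding-window attention. The function names, layer counts, window size, head counts, and head dimension are illustrative assumptions, not values taken from the abstract. A global layer has to cache keys and values for the whole context, while a local layer only needs the last `window` positions.

```python
def kv_cache_positions(num_layers, context_len, local_per_global, window):
    """Count cached token positions summed over all layers.

    Hypothetical layout: every (local_per_global + 1)-th layer is global and
    caches the full context; every other layer is local and caches at most
    `window` positions. local_per_global=0 means all layers are global.
    """
    total = 0
    for layer in range(num_layers):
        is_global = (layer + 1) % (local_per_global + 1) == 0
        total += context_len if is_global else min(window, context_len)
    return total


def kv_cache_bytes(positions, num_kv_heads, head_dim, bytes_per_value=2):
    """Convert cached positions to bytes: keys and values (factor 2),
    one vector of size head_dim per KV head, bytes_per_value bytes each."""
    return 2 * positions * num_kv_heads * head_dim * bytes_per_value


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not taken from the abstract).
    layers, context = 12, 8192
    all_global = kv_cache_positions(layers, context, local_per_global=0, window=1024)
    interleaved = kv_cache_positions(layers, context, local_per_global=5, window=1024)
    print("all-global positions:", all_global)
    print("interleaved positions:", interleaved)
    print("fraction of all-global cache:", interleaved / all_global)
    print("interleaved bytes:", kv_cache_bytes(interleaved, num_kv_heads=4, head_dim=64))
```

Under these assumed numbers, the cost of local layers stays fixed as the context grows, so the saving relative to an all-global stack increases with context length. That is the effect the abstract attributes to raising the local-to-global ratio and keeping the local span short.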