Extending Llama-3's Context Ten-Fold Overnight

AI-generated keywords: LLMAs

AI-generated Key Points

  • Successfully extended context length of Llama-3-8B-Instruct from 8K to 80K through QLoRA fine-tuning
  • Explored Multi-Detail QA tasks with homogeneous and heterogeneous contexts, as well as Biography Summarization tasks with context lengths between 64K to 80K
  • Training dataset included question-answer pairs from multi-turn conversations and instances from RedPajama and LongAlpaca datasets
  • Model fine-tuned using QLoRA with LoRA rank set to 32 and alpha to 16, achieving remarkable results on downstream long-context tasks
  • Released resources, including model, training data, and code, are publicly available for further research in training long-context LLMs
Also access our AI generated: Comprehensive summary, Lay summary, Blog-like article; or ask questions about this paper to our AI assistant.

Authors: Peitian Zhang, Ninglu Shao, Zheng Liu, Shitao Xiao, Hongjin Qian, Qiwei Ye, Zhicheng Dou

License: CC BY 4.0

Abstract: We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is super efficient, which takes 8 hours on one 8xA800 (80G) GPU machine. The resulted model exhibits superior performances across a broad range of evaluation tasks, such as NIHS, topic retrieval, and long-context language understanding; meanwhile, it also well preserves the original capability over short contexts. The dramatic context extension is mainly attributed to merely 3.5K synthetic training samples generated by GPT-4 , which indicates the LLMs' inherent (yet largely underestimated) potential to extend its original context length. In fact, the context length could be extended far beyond 80K with more computation resources. Therefore, the team will publicly release the entire resources (including data, model, data generation pipeline, training code) so as to facilitate the future research from the community: \url{https://github.com/FlagOpen/FlagEmbedding}.

Submitted to arXiv on 30 Apr. 2024

Ask questions about this paper to our AI assistant

You can also chat with multiple papers at once here.

AI assistant instructions?

Results of the summarizing process for the arXiv paper: 2404.19553v1

, , , , The team successfully extended the context length of Llama-3-8B-Instruct from 8K to 80K through QLoRA fine-tuning, resulting in superior performance across various evaluation tasks. They explored Multi-Detail QA tasks involving homogeneous and heterogeneous contexts, as well as Biography Summarization tasks with context lengths between 64K to 80K. The training dataset included question-answer pairs organized in multi-turn conversations and instances from RedPajama and LongAlpaca datasets. The model was fine-tuned using QLoRA with LoRA rank set to 32 and alpha to 16, achieving remarkable results on downstream long-context tasks. The released resources, including the model, training data, and code, are publicly available for further research in training long-context LLMs. Additionally, the model was evaluated on LongBench and InfiniteBench benchmarks, showcasing consistent outperformance compared to baselines except for code completion tasks. Further improvements may involve mixing more code data during training.
Created on 30 May. 2024

Assess the quality of the AI-generated content by voting

Score: -1

Why do we need votes?

Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.

Similar papers summarized with our AI tools

Navigate through even more similar papers through a

tree representation

Look for similar papers (in beta version)

By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.