LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

AI-generated keywords: LongRoPE, Large Language Models, Context Window Extension, Fine-tuning, Performance Levels

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • LongRoPE extends the context window of pre-trained large language models (LLMs) to 2048k tokens, where previous extended context windows were limited to around 128k tokens.
  • The extension needs at most 1k fine-tuning steps at training lengths within 256k, while maintaining performance at the original short context window.
  • The three key innovations are: identifying and exploiting two forms of non-uniformity in positional interpolation through an efficient search, introducing a progressive extension strategy, and readjusting on 8k length to recover short-context performance.
  • Extensive experiments on LLaMA2 and Mistral across various tasks demonstrate the effectiveness of the method.

Authors: Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang

License: CC BY-NC-ND 4.0

Abstract: A large context window is a desirable feature in large language models (LLMs). However, due to high fine-tuning costs, scarcity of long texts, and catastrophic values introduced by new token positions, current extended context windows are limited to around 128k tokens. This paper introduces LongRoPE that, for the first time, extends the context window of pre-trained LLMs to an impressive 2048k tokens, with at most 1k fine-tuning steps at training lengths within 256k, while maintaining performance at the original short context window. This is achieved by three key innovations: (i) we identify and exploit two forms of non-uniformities in positional interpolation through an efficient search, providing a better initialization for fine-tuning and enabling an 8x extension in non-fine-tuning scenarios; (ii) we introduce a progressive extension strategy that first fine-tunes a 256k length LLM and then conducts a second positional interpolation on the fine-tuned extended LLM to achieve a 2048k context window; (iii) we readjust LongRoPE on 8k length to recover the short context window performance. Extensive experiments on LLaMA2 and Mistral across various tasks demonstrate the effectiveness of our method. Models extended via LongRoPE retain the original architecture with minor modifications to the positional embedding, and can reuse most pre-existing optimizations.
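
To make the idea of non-uniform positional interpolation more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: this page works from the paper metadata only, so the exact form of the two non-uniformities is not stated here. The sketch assumes they can be modeled as (a) a separate rescale factor per rotary frequency and (b) an initial span of token positions left un-interpolated. The function name `rope_angles` and the parameters `rescale` and `keep_first` are hypothetical, and the factor values are hand-picked for illustration rather than found by any search.

```python
import numpy as np

def rope_angles(positions, head_dim, base=10000.0, rescale=None, keep_first=0):
    """Rotation angles for rotary position embedding (RoPE).

    Returns an array of shape (len(positions), head_dim // 2).
    rescale    : per-frequency divisors (hypothetical parameter);
                 None means no interpolation at all.
    keep_first : positions below this index keep their original angles
                 (an assumption made for this sketch).
    """
    positions = np.asarray(positions, dtype=np.float64)
    # Standard RoPE inverse frequencies: base^(-2i/d) for i = 0 .. d/2 - 1.
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    if rescale is None:
        rescale = np.ones_like(inv_freq)
    rescale = np.asarray(rescale, dtype=np.float64)

    plain = np.outer(positions, inv_freq)             # no interpolation
    scaled = np.outer(positions, inv_freq / rescale)  # interpolated angles
    keep = (positions < keep_first)[:, None]          # leading tokens untouched
    return np.where(keep, plain, scaled)

# Uniform (linear) interpolation: every frequency is divided by the same factor.
uniform = rope_angles(range(32), head_dim=8, rescale=np.full(4, 8.0))

# Non-uniform interpolation: each frequency gets its own factor, and the
# first 4 positions are not interpolated. Values are illustrative only.
non_uniform = rope_angles(range(32), head_dim=8,
                          rescale=np.linspace(1.0, 8.0, 4), keep_first=4)
```

With a uniform factor of 8, position 16 receives the same angles that position 2 had before interpolation, which is how interpolation keeps new positions inside the range seen during pre-training. The non-uniform variant relaxes this to a per-frequency choice, which is the kind of space a search procedure could explore.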

Submitted to arXiv on 21 Feb. 2024

Results of the summarizing process for the arXiv paper: 2402.13753v1

LongRoPE addresses the limits of current context window extension in large language models (LLMs). A large context window is a desirable feature in LLMs, but existing extensions are constrained by high fine-tuning costs, the scarcity of long texts, and catastrophic values introduced by new token positions.

In this paper, Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, and Mao Yang introduce LongRoPE. Whereas previous extended context windows are limited to around 128k tokens, LongRoPE extends the context window to 2048k tokens. It does so with at most 1k fine-tuning steps at training lengths within 256k, while maintaining performance at the original short context window.

The method rests on three key innovations. First, the authors identify and exploit two forms of non-uniformity in positional interpolation through an efficient search. This provides a better initialization for fine-tuning and enables an 8x extension in scenarios without fine-tuning. Second, they introduce a progressive extension strategy: a 256k-length LLM is fine-tuned first, and a second positional interpolation is then applied to the fine-tuned model to reach the 2048k context window. Third, they readjust LongRoPE on 8k length to recover the short context window performance.

Extensive experiments on LLaMA2 and Mistral across various tasks demonstrate the effectiveness of the method. Models extended with LongRoPE retain their original architecture with minor modifications to the positional embedding, and can reuse most pre-existing optimizations.

In conclusion, LongRoPE extends LLM context windows well beyond the previous limit of around 128k tokens. Models can process up to about 2 million tokens while keeping short-context performance and requiring only a small number of fine-tuning steps.
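
The arithmetic behind the progressive extension can be checked in a few lines of Python. This is only a sketch of the numbers quoted in the abstract, under one assumption that the page does not state: that "k" means 1024 tokens. The names `K` and `extension_plan` are hypothetical.

```python
K = 1024  # assumption: "k" denotes 1024 tokens

def extension_plan(finetuned_len_k=256, second_factor=8):
    """Return (fine-tuned length, final length) in tokens.

    finetuned_len_k : context length reached by fine-tuning, in units of k
    second_factor   : ratio applied by the second positional interpolation
    """
    finetuned = finetuned_len_k * K
    final = finetuned * second_factor
    return finetuned, final

finetuned, final = extension_plan()
print(finetuned, final)  # 262144 2097152
```

Under this assumption, the second interpolation step corresponds to a factor of 2048k / 256k = 8, which matches the 8x extension the abstract reports for non-fine-tuning scenarios, and the final window of 2,097,152 tokens is what the title rounds to "beyond 2 million tokens".
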
Created on 23 Feb. 2024
