Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing

AI-generated keywords: Infinite Retrieval

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Authors address limitations of the context window size of Large Language Models (LLMs) for tasks whose input tokens exceed that limit
  • These constraints create challenges in tasks ranging from simple direct retrieval to complex multi-hop reasoning
  • Proposed method called InfiniRetri leverages LLMs' attention information for accurate retrieval across inputs of infinite length
  • Achieved 100% accuracy in Needle-In-a-Haystack test over 1 million tokens using a 0.5 billion parameter model, surpassing other methods and larger models
  • Significant performance improvements on real-world benchmarks, with a maximum enhancement of 288%
  • Method applicable to any Transformer-based LLM without additional training, reducing inference latency and compute overhead

Authors: Xiaoju Ye, Zhichun Wang, Jingyuan Wang

21 pages
License: CC BY-NC-ND 4.0

Abstract: Limited by the context window size of Large Language Models (LLMs), handling tasks whose input tokens exceed the upper limit has been challenging, whether the task is simple direct retrieval or complex multi-hop reasoning. Although various methods have been proposed to enhance the long-context processing capabilities of LLMs, they either incur substantial post-training costs, require additional tool modules (e.g., RAG), or have not shown significant improvement in realistic tasks. Our work observes the correlation between the attention distribution and the generated answers at each layer, and establishes through experiments that attention allocation aligns with retrieval-augmented capabilities. Drawing on these insights, we propose a novel method, InfiniRetri, that leverages LLMs' own attention information to enable accurate retrieval across inputs of infinite length. Our evaluations indicate that InfiniRetri achieves 100% accuracy in the Needle-In-a-Haystack (NIH) test over 1M tokens using a 0.5B parameter model, surpassing other methods and larger models and setting a new state of the art (SOTA). Moreover, our method achieves significant performance improvements on real-world benchmarks, with a maximum improvement of 288%. In addition, InfiniRetri can be applied to any Transformer-based LLM without additional training and substantially reduces inference latency and compute overhead in long texts. In summary, our comprehensive studies show InfiniRetri's potential for practical applications and create a paradigm for retrieving information using LLMs' own capabilities over infinite-length inputs. Code will be released at the provided link.
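The abstract describes the mechanism only at a high level. As a rough illustration of the general idea of attention-guided retrieval over an over-long input (not the authors' InfiniRetri algorithm, whose details are not available on this page), the sketch processes the input chunk by chunk and carries forward only the positions that receive the most attention from the question. It assumes random vectors in place of a real model's query and key projections; `attention_scores`, `chunked_retrieve`, `chunk_size` and `keep` are hypothetical names chosen for this example.

```python
import numpy as np


def attention_scores(query: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Softmax attention from each query row to every key row, summed over
    query rows, giving one relevance score per key (hypothetical helper)."""
    d = query.shape[-1]
    logits = query @ keys.T / np.sqrt(d)            # shape (m, n)
    logits -= logits.max(axis=1, keepdims=True)     # stabilise the exponent
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)   # each row sums to 1
    return weights.sum(axis=0)                      # shape (n,)


def chunked_retrieve(query: np.ndarray, keys: np.ndarray,
                     chunk_size: int, keep: int) -> np.ndarray:
    """Scan `keys` chunk by chunk; after each chunk keep only the `keep`
    global positions with the highest attention (hypothetical helper)."""
    cache = np.empty(0, dtype=np.int64)             # retained positions
    for start in range(0, len(keys), chunk_size):
        stop = min(start + chunk_size, len(keys))
        # Window = positions retained so far + the new chunk.
        window = np.concatenate([cache, np.arange(start, stop)])
        scores = attention_scores(query, keys[window])
        best = np.argsort(scores)[::-1][:keep]      # highest scores first
        cache = np.sort(window[best])
    return cache


# Toy "haystack": 10,000 random keys with one planted "needle" aligned with
# the question vectors (random stand-ins for a model's projections).
rng = np.random.default_rng(0)
dim, n_tokens, needle_pos = 64, 10_000, 7_321
question = rng.normal(size=(4, dim))
keys = rng.normal(scale=0.1, size=(n_tokens, dim))
keys[needle_pos] = question.sum(axis=0)

retained = chunked_retrieve(question, keys, chunk_size=1_024, keep=8)
print(needle_pos in retained, len(retained))
```

Because the retained set never exceeds `keep` positions, each step attends over at most `chunk_size + keep` keys regardless of the total input length, which is the kind of property that lets a chunked scheme handle inputs longer than the context window. Which layers, heads, or retention units (tokens, sentences) the real method uses is left open here as an assumption.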

Submitted to arXiv on 18 Feb. 2025

Results of the summarizing process for the arXiv paper: 2502.12962v1

In the paper titled "Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing," authors Xiaoju Ye, Zhichun Wang, and Jingyuan Wang address the limitations that the context window size of Large Language Models (LLMs) imposes on tasks whose input exceeds that limit. They highlight that these constraints affect tasks ranging from simple direct retrieval to complex multi-hop reasoning. Previous methods for enhancing long-context processing either incur post-training costs, require additional tool modules such as RAG, or have not shown significant improvements in realistic tasks. To address this, the authors conduct experiments that reveal a correlation between the attention distribution and the generated answers at each layer of an LLM. Having established that attention allocation aligns with retrieval-augmented capabilities, they propose a novel method called InfiniRetri, which leverages an LLM's own attention information to enable accurate retrieval across inputs of infinite length.

Evaluations show that InfiniRetri achieves 100% accuracy in the Needle-In-a-Haystack (NIH) test over 1 million tokens using a 0.5 billion parameter model, surpassing other methods and larger models and setting a new state of the art. InfiniRetri also yields significant performance improvements on real-world benchmarks, with a maximum improvement of 288%. The method can be applied to any Transformer-based LLM without additional training and reduces inference latency and compute overhead on long texts. The authors' comprehensive studies showcase InfiniRetri's potential for practical applications and establish a paradigm for retrieving information with an LLM's own capabilities over inputs of unbounded length. The authors also state that the code for their method will be released through a provided link.
Constraints on long-context processing in LLMs can be overcome with innovative approaches, as this work demonstrates. The proposed method shows promising results for enhancing retrieval in tasks such as simple direct retrieval and complex multi-hop reasoning. It leverages the LLM's own attention information to enable accurate retrieval across inputs of infinite length, without requiring additional training or tool modules.
Created on 03 Mar. 2025
