Uncertainty-Aware Hybrid Retrieval for Long-Document RAG

AI-generated keywords: Retrieval augmented generation; Quality; Granularity; Uncertainty-aware Multi-Granularity RAG (UMG-RAG); UMGP-RAG

AI-generated Key Points

Quality and granularity of retrieved evidence are crucial in retrieval augmented generation (RAG)
Large retrieval units provide contextual richness but may include irrelevant content
Fine-grained units are concise but can pose challenges in reliable retrieval
A novel training-free hybrid retrieval framework leverages chunk granularity for query-specific reliability estimation
The framework utilizes existing dense and sparse retrievers as complementary experts across various chunk granularities
It transforms expert-granularity score lists into an evidence distribution, assesses reliability based on distribution entropy, and merges candidates considering query-specific factors
An extension employs fine-grained hits to pinpoint relevant evidence while returning broader non-redundant parent chunks for enhanced local coherence during generation
Experiments show improved generation quality with uncertainty-aware fusion and parent promotion techniques in long-document RAG settings involving multiple retrievers and generators
The framework formalizes a tradeoff in retrieval granularity for long-document RAG scenarios and provides a solution that estimates query-specific reliability without extensive training
Evaluation against existing benchmarks demonstrates efficacy in enhancing generation quality through uncertainty-aware fusion and parent promotion strategies
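
The entropy-based reliability step in the key points above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the softmax normalization, the `temperature` parameter, and the "one minus normalized entropy" formula are all assumptions, since the summary only says that each score list becomes an evidence distribution whose entropy indicates reliability.

```python
import math


def evidence_distribution(scores, temperature=1.0):
    """Turn one retriever's top-k scores into a probability distribution.

    Softmax is an assumed choice of normalization (hypothetical).
    """
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_reliability(scores, temperature=1.0):
    """Return 1 - H(p) / log(k), clamped to [0, 1].

    A sharply peaked score list gives a value near 1; a flat list gives
    a value near 0. This formula is an assumption for illustration.
    """
    if len(scores) < 2:
        raise ValueError("need at least two scores to measure entropy")
    probs = evidence_distribution(scores, temperature)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    value = 1.0 - entropy / math.log(len(probs))
    return max(0.0, min(1.0, value))


peaked = [12.0, 3.0, 2.5, 2.0]  # one candidate clearly stands out
flat = [3.0, 3.0, 3.0, 3.0]     # the retriever cannot separate candidates

print(entropy_reliability(peaked))  # close to 1
print(entropy_reliability(flat))    # approximately 0
```

Under these assumptions, a retriever that ranks one chunk far above the rest is trusted more for that query than a retriever whose scores are nearly uniform.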

Authors: Hoin Jung, Xiaoqian Wang

arXiv: 2606.13550v1 (cs.AI)

License: CC BY 4.0

Abstract: Retrieval augmented generation (RAG) depends critically on the quality and granularity of retrieved evidence. Large retrieval units preserve context but often introduce irrelevant content, which can dilute answer bearing evidence and worsen long context utilization. Fine-grained units are more compact, but they may be difficult to retrieve reliably because short chunks can lack semantic, lexical, or bridging cues needed to match the query. We propose Uncertainty-aware Multi-Granularity RAG (UMG-RAG), a training-free hybrid retrieval framework that treats chunk granularity as query-specific reliability estimation. Instead of training a new retriever or modifying the generator, UMG-RAG uses existing dense and sparse retrievers as complementary experts across multiple chunk granularities. For each query, it converts each expert-granularity score list into an evidence distribution, estimates reliability from distribution entropy, and fuses candidates according to query-specific semantic, lexical, and granularity confidence. We further introduce UMGP-RAG, a parent promotion variant that uses fine-grained hits to locate relevant evidence while returning broader non-redundant parent chunks for local coherence. Experiments on question answering benchmarks show that uncertainty-aware fusion and parent promotion improve generation quality while maintaining a lightweight, plug-and-play retrieval pipeline.

Submitted to arXiv on 11 Jun. 2026

Results of the summarizing process for the arXiv paper: 2606.13550v1

Comprehensive Summary
Key points
Layman's Summary
Blog article

In retrieval augmented generation (RAG), the quality and granularity of retrieved evidence play a pivotal role. Large retrieval units offer contextual richness but can bring in irrelevant content that dilutes answer-bearing evidence and hinders effective use of long contexts. Fine-grained units are more compact but may be hard to retrieve reliably, because short chunks can lack the semantic, lexical, or bridging cues needed to match the query.

To address these challenges, we introduce Uncertainty-aware Multi-Granularity RAG (UMG-RAG), a training-free hybrid retrieval framework that treats the choice of chunk granularity as a query-specific reliability estimation problem. Instead of developing new retrievers or modifying generators, UMG-RAG uses existing dense and sparse retrievers as complementary experts across multiple chunk granularities. For each query, it converts each expert-granularity score list into an evidence distribution, estimates reliability from the entropy of that distribution, and fuses candidates according to query-specific semantic, lexical, and granularity confidence. We also present UMGP-RAG, a parent promotion variant that uses fine-grained hits to locate relevant evidence while returning broader, non-redundant parent chunks for local coherence during generation.

In experiments on question answering benchmarks in long-document RAG settings with multiple dense retrievers and generators, uncertainty-aware fusion and parent promotion improve generation quality while keeping the retrieval pipeline lightweight and plug-and-play. Our contributions also include formalizing the retrieval granularity tradeoff for long-document RAG and proposing UMG-RAG as a solution that estimates query-specific reliability for each expert-granularity pair without training a new retriever. We further discuss related work on interventions for the "lost in the middle" problem of language models with long prompts, and on the role of retrieval granularity in hybrid RAG approaches.
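
The fusion step described in this summary can be illustrated with a short sketch. It is not the authors' algorithm: the softmax normalization, the reliability formula, the weighted-sum fusion rule, and all names (`fuse`, `score_lists`, the chunk ids) are hypothetical choices made for illustration. Raw dense and sparse scores live on different scales, so a real pipeline would likely need score calibration that this sketch omits.

```python
import math
from collections import defaultdict


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reliability(probs):
    """1 - normalized entropy, clamped at 0 (assumed confidence signal)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return max(0.0, 1.0 - entropy / math.log(len(probs)))


def fuse(score_lists, top_n=3):
    """Fuse candidates from several (expert, granularity) pairs.

    score_lists maps (expert, granularity) -> {chunk_id: raw_score}.
    Each pair votes with its evidence distribution, weighted by its
    query-specific reliability (hypothetical fusion rule).
    """
    dists, weights = {}, {}
    for pair, hits in score_lists.items():
        if len(hits) < 2:
            continue  # entropy needs at least two candidates
        ids = list(hits)
        probs = softmax([hits[c] for c in ids])
        dists[pair] = dict(zip(ids, probs))
        weights[pair] = reliability(probs)
    total = sum(weights.values()) or 1.0  # avoid division by zero
    fused = defaultdict(float)
    for pair, dist in dists.items():
        for chunk_id, p in dist.items():
            fused[chunk_id] += (weights[pair] / total) * p
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Toy query: the dense expert is nearly undecided, the sparse one is confident.
score_lists = {
    ("dense", 256): {"doc1:256:0": 0.82, "doc1:256:1": 0.80, "doc2:256:0": 0.79},
    ("sparse", 256): {"doc1:256:1": 14.0, "doc2:256:0": 4.0, "doc3:256:2": 3.5},
}
ranked = fuse(score_lists)
print(ranked[0][0])  # doc1:256:1
```

In this toy query the sparse expert dominates because its score list is sharply peaked, while the nearly flat dense list contributes almost nothing. That is the kind of query-specific weighting the summary describes.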

Summary

- It's important to have good quality and detailed evidence when creating something using retrieved information.
- Using big pieces of information can give a lot of context, but it might also include things that are not needed.
- Smaller pieces of information are shorter, but they can be difficult to find reliably.
- A new way of finding information combines different sizes of chunks to estimate how reliable the information is for a specific question.
- This method uses both dense and sparse retrievers to help with different levels of detail in the information.

Definitions

- Quality: How good something is or how well it is done.
- Granularity: The level of detail or size of something.
- Retrieval: Finding and getting back information that was stored somewhere.
- Framework: A structure or plan used to help organize and solve problems.
- Reliability: How trustworthy or accurate something is.

In recent years, there has been growing interest in retrieval augmented generation (RAG), a framework that combines retrieval and generation models to improve performance on natural language processing tasks. One key challenge in RAG is choosing the granularity of retrieved evidence. Large retrieval units provide contextual richness but may introduce irrelevant content that hinders effective use of long contexts. Fine-grained units are more concise but may be harder to retrieve reliably, because short chunks can lack the semantic, lexical, or bridging cues needed to match the query.

To address this issue, Hoin Jung and Xiaoqian Wang introduce Uncertainty-aware Multi-Granularity RAG (UMG-RAG), a training-free hybrid retrieval framework that treats chunk granularity as a matter of query-specific reliability estimation. The goal is to merge evidence from multiple retrievers at varying granularities while limiting the impact of irrelevant content.

UMG-RAG uses existing dense and sparse retrievers as complementary experts across several chunk granularities. For each query, it converts each expert-granularity score list into an evidence distribution and estimates reliability from the entropy of that distribution. A sharply peaked distribution suggests that the expert-granularity pair has found clear evidence for the query, whereas a flat one suggests its ranking is uncertain. Candidates are then fused according to query-specific semantic, lexical, and granularity confidence, so the effective granularity adapts to each query. Rather than training new retrievers or modifying generators, UMG-RAG reuses existing components in a lightweight, plug-and-play manner.

The authors also propose UMGP-RAG, a parent promotion variant that uses fine-grained hits to pinpoint relevant evidence while returning broader, non-redundant parent chunks for better local coherence during generation. This approach aims to provide enough context for accurate generation without flooding the prompt with irrelevant information.

To evaluate their methods, the authors ran experiments on question answering benchmarks in long-document RAG settings involving multiple dense retrievers and generators. According to the paper, uncertainty-aware fusion and parent promotion improved generation quality while maintaining a lightweight retrieval pipeline.

Beyond the fusion approach and the parent promotion technique, the paper formalizes the retrieval granularity tradeoff in long-document RAG and positions UMG-RAG as a solution that estimates query-specific reliability for each expert-granularity pair without training a new retriever. It also discusses related work on interventions for "lost in the middle" issues in language models with long prompts, and highlights the role of retrieval granularity in hybrid RAG approaches.

Overall, the paper presents a framework for one of the key challenges in RAG: choosing the retrieval granularity. By reusing existing components and adding uncertainty-aware fusion and parent promotion, UMG-RAG and UMGP-RAG report improved generation quality with a lightweight and adaptable retrieval pipeline. This research opens up avenues for future studies on handling granularity in hybrid RAG approaches.
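
The parent promotion idea described in this article can be sketched in a few lines. This is a hedged illustration rather than the UMGP-RAG implementation: scoring a parent by its best child, the `max_parents` budget, and every name and data structure here (`promote_to_parents`, `parent_of`, `parent_text`) are assumptions.

```python
def promote_to_parents(fine_hits, parent_of, parent_text, max_parents=2):
    """Map fine-grained hits to deduplicated parent chunks.

    fine_hits:   list of (child_id, score) from fine-grained retrieval
    parent_of:   dict child_id -> parent_id
    parent_text: dict parent_id -> text of the broader chunk
    Each parent is scored by its best child (assumed aggregation), so a
    parent with several matching children is still returned only once.
    """
    best_score = {}
    for child_id, score in fine_hits:
        parent_id = parent_of[child_id]
        if parent_id not in best_score or score > best_score[parent_id]:
            best_score[parent_id] = score
    ranked = sorted(best_score, key=best_score.get, reverse=True)
    return [(pid, parent_text[pid]) for pid in ranked[:max_parents]]


# Toy index: four short child chunks that belong to three parent chunks.
parent_of = {"c1": "P1", "c2": "P1", "c3": "P2", "c4": "P3"}
parent_text = {
    "P1": "Section 2, paragraphs 1-3 ...",
    "P2": "Section 4, paragraphs 1-2 ...",
    "P3": "Appendix A ...",
}
fine_hits = [("c1", 0.9), ("c3", 0.6), ("c2", 0.5), ("c4", 0.1)]

context = promote_to_parents(fine_hits, parent_of, parent_text)
print([pid for pid, _ in context])  # ['P1', 'P2']
```

Here `c1` and `c2` both point to `P1`, so the generator receives that parent once, followed by `P2`. The fine-grained hits decide what is relevant, and the parent chunks supply the surrounding context.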

Created on 13 Jun. 2026
