Inference Scaled GraphRAG: Improving Multi Hop Question Answering on Knowledge Graphs

AI-generated keywords: Large Language Models, Retrieval-Augmented Generation, GraphRAG, Inference-Time Scaling, Knowledge Graphs

AI-generated Key Points

  • Large Language Models (LLMs) excel in language understanding and generation but struggle with knowledge-intensive reasoning tasks requiring structured context and multi-hop information.
  • Retrieval-Augmented Generation (RAG) addresses this limitation by integrating retrieved context into the generation process.
  • Traditional RAG and GraphRAG methods have limitations in capturing relational structures across nodes in knowledge graphs.
  • Inference-Scaled GraphRAG enhances LLM-based graph reasoning by applying inference-time compute scaling, combining sequential and parallel scaling for deeper insights and improved robustness.
  • Experimental results on the GRBench benchmark show a significant improvement in multi-hop question answering performance over traditional GraphRAG and prior graph traversal baselines, highlighting the effectiveness of inference-time scaling.
  • Knowledge graphs consist of nodes representing entities connected by edges denoting relations, providing a structured framework for reasoning.
  • RAG integrates external information retrieval to enhance reasoning over factual knowledge within the generation pipeline.
  • Inference-time scaling involves allocating additional compute resources at test time without changing model architecture, leading to enhanced performance on complex reasoning tasks.
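
To make the knowledge-graph and multi-hop points above concrete, here is a minimal sketch in Python. It is not taken from the paper: the triples, entity names, and the `follow` helper are all hypothetical, and the only assumption carried over from the text is that a graph is a set of entity nodes joined by labeled relation edges.

```python
from collections import defaultdict

# Toy (head, relation, tail) triples. Every entity name here is made up
# purely for illustration; none of them come from GRBench.
TRIPLES = [
    ("paper_A", "written_by", "author_X"),
    ("author_X", "affiliated_with", "lab_Y"),
    ("paper_A", "cites", "paper_B"),
    ("paper_B", "written_by", "author_Z"),
]


def build_graph(triples):
    """Index triples as head -> list of (relation, tail) edges."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def follow(graph, start, relations):
    """Follow a fixed chain of relations from `start`, one hop per relation."""
    frontier = {start}
    for relation in relations:
        frontier = {
            tail
            for node in frontier
            for rel, tail in graph.get(node, [])
            if rel == relation
        }
    return frontier


graph = build_graph(TRIPLES)
# Two-hop question: "Which lab is the author of paper_A affiliated with?"
print(follow(graph, "paper_A", ["written_by", "affiliated_with"]))  # {'lab_Y'}
```

Answering the question requires chaining two edges, which is the kind of relational structure a single flat retrieval step can miss.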

Authors: Travis Thompson, Seung-Hwan Lim, Paul Liu, Ruoying He, Dongkuan Xu

License: CC BY 4.0

Abstract: Large Language Models (LLMs) have achieved impressive capabilities in language understanding and generation, yet they continue to underperform on knowledge-intensive reasoning tasks due to limited access to structured context and multi-hop information. Retrieval-Augmented Generation (RAG) partially mitigates this by grounding generation in retrieved context, but conventional RAG and GraphRAG methods often fail to capture relational structure across nodes in knowledge graphs. We introduce Inference-Scaled GraphRAG, a novel framework that enhances LLM-based graph reasoning by applying inference-time compute scaling. Our method combines sequential scaling with deep chain-of-thought graph traversal, and parallel scaling with majority voting over sampled trajectories within an interleaved reasoning-execution loop. Experiments on the GRBench benchmark demonstrate that our approach significantly improves multi-hop question answering performance, achieving substantial gains over both traditional GraphRAG and prior graph traversal baselines. These findings suggest that inference-time scaling is a practical and architecture-agnostic solution for structured knowledge reasoning with LLMs.
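
The parallel-scaling idea in the abstract, majority voting over sampled trajectories, can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: `sample_trajectory` is a hypothetical stand-in for one full LLM reasoning run, and the 60% accuracy figure and answer strings are invented for the demo.

```python
import random
from collections import Counter


def majority_vote(answers):
    """Return the most frequent non-empty answer, or None if there is none."""
    counts = Counter(a for a in answers if a is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def sample_trajectory(rng):
    """Hypothetical noisy reasoner (an assumption, not the paper's model).

    Returns the right answer about 60% of the time, otherwise a wrong
    answer or None to represent a run that failed to produce an answer.
    """
    if rng.random() < 0.6:
        return "lab_Y"
    return rng.choice(["lab_Q", "lab_R", None])


rng = random.Random(0)  # seeded so the demo is repeatable
answers = [sample_trajectory(rng) for _ in range(15)]
print(answers)
print(majority_vote(answers))
```

The intuition is that independent runs rarely agree on the same wrong answer, so aggregating more samples tends to make the final answer more robust at the cost of extra inference-time compute.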

Submitted to arXiv on 24 Jun. 2025

Results of the summarizing process for the arXiv paper: 2506.19967v1

Large Language Models (LLMs) have shown remarkable proficiency in language understanding and generation. However, they struggle with knowledge-intensive reasoning tasks that require access to structured context and multi-hop information. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating retrieved context into the generation process. While traditional RAG and GraphRAG methods have made progress in this area, they often fall short in capturing relational structures across nodes in knowledge graphs.

To overcome this challenge, we propose a novel framework called Inference-Scaled GraphRAG, which enhances LLM-based graph reasoning by applying inference-time compute scaling. Our method combines two forms of scaling. In sequential scaling, the model reasons step by step, with each step building on previous outputs, so that it can traverse the graph more deeply. In parallel scaling, multiple responses are generated independently and aggregated with strategies such as majority voting, which improves robustness. In experiments on the GRBench benchmark, our approach significantly improves multi-hop question answering performance compared to traditional GraphRAG and prior graph traversal baselines. These results indicate that inference-time scaling is a practical and architecture-agnostic solution for enhancing structured knowledge reasoning with LLMs.

Our study also provides background on knowledge graphs, RAG, and inference-time scaling. Knowledge graphs are defined as sets of nodes representing entities, connected by edges denoting relations. RAG integrates external information retrieval into the generation pipeline to enhance reasoning over factual knowledge. Inference-time scaling allocates additional compute at test time without modifying the model architecture, enabling improved performance on complex reasoning tasks.

In conclusion, by incorporating both sequential and parallel scaling into the GraphRAG framework, our proposed method enables large language models to conduct multi-hop reasoning over structured knowledge graphs more effectively.
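
Sequential scaling inside an interleaved reasoning-execution loop can be pictured with the sketch below. It rests on several assumptions that are not from the paper: `scripted_policy` is a hypothetical stand-in for the LLM's next-step decision, the action tuples and the toy `GRAPH` are invented, and `max_steps` plays the role of the sequential compute budget.

```python
# Toy graph: node -> relation -> list of neighbor nodes (all names hypothetical).
GRAPH = {
    "paper_A": {"written_by": ["author_X"]},
    "author_X": {"affiliated_with": ["lab_Y"]},
}


def execute(action):
    """Execution step: run one graph lookup and return the observation."""
    _, node, relation = action
    return GRAPH.get(node, {}).get(relation, [])


def scripted_policy(question, history):
    """Reasoning step: hypothetical stand-in for an LLM choosing the next action."""
    if not history:
        return ("lookup", "paper_A", "written_by")
    last_action, observation = history[-1]
    if last_action[2] == "written_by" and observation:
        return ("lookup", observation[0], "affiliated_with")
    return ("finish", observation[0] if observation else None)


def reason(question, policy, max_steps):
    """Alternate reasoning and execution until an answer or the step budget."""
    history = []
    for _ in range(max_steps):
        action = policy(question, history)
        if action[0] == "finish":
            return action[1]
        history.append((action, execute(action)))
    return None  # budget exhausted before the policy produced an answer


question = "Which lab is the author of paper_A affiliated with?"
print(reason(question, scripted_policy, max_steps=4))  # lab_Y
print(reason(question, scripted_policy, max_steps=2))  # None
```

With a budget of two steps the loop stops before the second hop can be turned into an answer, which illustrates why a larger sequential budget can help on multi-hop questions. Running several such loops independently and voting over their answers would add the parallel dimension.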
Created on 08 Jul. 2025

