SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation

AI-generated keywords: Large Language Models

AI-generated Key Points

  • Recent advancements in large language models (LLMs) have shown versatility across tasks
  • Retrieval-augmented generation (RAG) addresses hallucinations in LLMs by using external knowledge sources like knowledge graphs (KGs)
  • Similar Graph Enhanced Retrieval-Augmented Generation (SimGRAG) aligns query texts with KG structures through a two-stage process
  • Emphasis on plug-and-play usability and scalability in SimGRAG
  • GSD metric and optimized subgraph retrieval algorithm enhance efficiency and scalability
  • SimGRAG outperforms existing methods in question answering and fact verification tasks
  • Performance of SimGRAG depends on the capabilities of underlying LLMs, which may affect performance for complex queries

Authors: Yuzheng Cai, Zhenyue Guo, Yiwen Pei, Wanrui Bian, Weiguo Zheng

License: CC BY-NC-SA 4.0

Abstract: Recent advancements in large language models (LLMs) have shown impressive versatility across various tasks. To eliminate their hallucinations, retrieval-augmented generation (RAG) has emerged as a powerful approach, leveraging external knowledge sources like knowledge graphs (KGs). In this paper, we study the task of KG-driven RAG and propose a novel Similar Graph Enhanced Retrieval-Augmented Generation (SimGRAG) method. It effectively addresses the challenge of aligning query texts and KG structures through a two-stage process: (1) query-to-pattern, which uses an LLM to transform queries into a desired graph pattern, and (2) pattern-to-subgraph, which quantifies the alignment between the pattern and candidate subgraphs using a graph semantic distance (GSD) metric. We also develop an optimized retrieval algorithm that efficiently identifies the top-$k$ subgraphs within 1-second latency on a 10-million-scale KG. Extensive experiments show that SimGRAG outperforms state-of-the-art KG-driven RAG methods in both question answering and fact verification, offering superior plug-and-play usability and scalability.

Submitted to arXiv on 17 Dec. 2024


Results of the summarizing process for the arXiv paper: 2412.15272v1

Recent advancements in large language models (LLMs) have demonstrated remarkable versatility across various tasks. To address the issue of hallucinations in LLMs, retrieval-augmented generation (RAG) has emerged as a powerful approach, utilizing external knowledge sources like knowledge graphs (KGs). In this study, we delve into the realm of KG-driven RAG and introduce a novel method known as Similar Graph Enhanced Retrieval-Augmented Generation (SimGRAG).

The core focus of SimGRAG lies in effectively aligning query texts with KG structures through a two-stage process. Firstly, the query-to-pattern alignment involves using an LLM to transform queries into a desired graph pattern. Secondly, the pattern-to-subgraph alignment quantifies the alignment between the pattern and candidate subgraphs using a graph semantic distance (GSD) metric. Additionally, an optimized retrieval algorithm has been developed to efficiently identify the top-k subgraphs within 1-second latency on a 10-million-scale KG.

One key aspect of SimGRAG is its emphasis on plug-and-play usability and scalability. By leveraging the query-to-pattern and pattern-to-subgraph alignment paradigm, SimGRAG ensures context conciseness and prevents entity leaks. The introduction of the GSD metric and the optimized subgraph retrieval algorithm further enhance retrieval efficiency and scalability.

Extensive experiments conducted across different KG-driven RAG tasks have consistently shown that SimGRAG outperforms existing methods in question answering and fact verification. The method's ability to align query texts with KG structures effectively contributes to its superior performance compared to state-of-the-art approaches. However, it is important to note that the performance of SimGRAG heavily relies on the underlying capabilities of LLMs. Lower-quality or less capable LLMs may lead to degraded performance, especially in scenarios requiring advanced reasoning skills for complex queries.

In conclusion, our study introduces SimGRAG as an innovative approach for KG-driven RAG that offers improved usability, context conciseness, and scalability. By combining large language models with knowledge graphs effectively, SimGRAG sets a new standard for addressing challenges in information retrieval tasks.
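
To make the pattern-to-subgraph stage more concrete, here is a minimal Python sketch of ranking candidate subgraphs by a semantic distance and keeping the top-k. It is an illustration under several assumptions that do not come from this summary: the distance is taken to be the sum of Euclidean distances between embeddings of matched nodes and relations, unknown pattern nodes (written with a leading "?") are assumed to cost nothing, each candidate is assumed to be already aligned triple-by-triple with the pattern, and the `EMBED` table is a hand-made stand-in for a real embedding model. The query-to-pattern stage, which the paper performs with an LLM, is replaced here by a hand-written pattern, and the brute-force ranking is not the optimized retrieval algorithm the paper describes.

```python
import heapq
import math
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical 2-d embeddings; a real system would call an embedding model.
EMBED: Dict[str, Tuple[float, float]] = {
    "Inception": (1.0, 0.0),
    "Interstellar": (0.9, 0.1),
    "directed_by": (0.0, 1.0),
    "director": (0.1, 0.9),
    "starring": (0.7, 0.7),
}


def dist(pattern_item: str, graph_item: str) -> float:
    """Embedding distance; unknown pattern nodes ("?x") are assumed free."""
    if pattern_item.startswith("?"):
        return 0.0
    return math.dist(EMBED[pattern_item], EMBED[graph_item])


def gsd(pattern: List[Triple], subgraph: List[Triple]) -> float:
    """Assumed distance: sum over matched relations and distinct matched nodes.

    pattern[i] is assumed to be aligned with subgraph[i].
    """
    node_map: Dict[str, str] = {}
    relation_cost = 0.0
    for (ph, pr, pt), (sh, sr, st) in zip(pattern, subgraph):
        node_map[ph] = sh
        node_map[pt] = st
        relation_cost += dist(pr, sr)
    node_cost = sum(dist(p, s) for p, s in node_map.items())
    return relation_cost + node_cost


def top_k(pattern: List[Triple], candidates: List[List[Triple]], k: int):
    """Brute-force top-k: the k candidates with the smallest distance."""
    return heapq.nsmallest(k, candidates, key=lambda sg: gsd(pattern, sg))


# Hand-written stand-in for the LLM's query-to-pattern output for
# the question "Who directed Inception?".
pattern = [("Inception", "directed_by", "?director")]

cand_a = [("Inception", "director", "Christopher Nolan")]
cand_b = [("Interstellar", "director", "Christopher Nolan")]
cand_c = [("Inception", "starring", "Leonardo DiCaprio")]

best = top_k(pattern, [cand_c, cand_b, cand_a], k=2)
for subgraph in best:
    print(round(gsd(pattern, subgraph), 4), subgraph)
```

In this toy setup the candidate that shares the query entity and a near-synonymous relation ranks first, the one with a different film but the same relation ranks second, and the one with an unrelated relation is dropped. The retrieved triples would then be passed to the LLM as context for the final answer.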
Created on 20 Jan. 2025


