PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models

AI-generated keywords: Large language models

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Large language models (LLMs) have impressive generative capabilities but also limitations such as outdated knowledge and potential hallucination.
  • Retrieval-Augmented Generation (RAG) leverages external knowledge from databases to enhance answer generation.
  • The knowledge database of a RAG system is a new and practical attack surface. PoisonedRAG is a knowledge corruption attack that exploits it: an attacker injects a few malicious texts into the knowledge database so that the LLM generates an attacker-chosen target answer for an attacker-chosen target question.
  • The authors formulate knowledge corruption attacks as an optimization problem whose solution is a set of malicious texts, and propose two solutions, one for the black-box setting and one for the white-box setting.
  • PoisonedRAG can achieve a 90% attack success rate by injecting five malicious texts per target question into a knowledge database with millions of texts.
  • The defenses the authors evaluate are insufficient against PoisonedRAG, highlighting the need for new defenses for RAG systems.
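
The optimization view in the key points above can be written down roughly as follows. This is a sketch only: the notation is an assumption made here for illustration and is not taken from the paper. Let $Q_1, \dots, Q_M$ be the target questions, $R_i$ the attacker-chosen target answer for $Q_i$, $\mathcal{D}$ the clean knowledge database, $\Gamma$ the set of injected malicious texts with at most $N$ texts per target question, $\mathrm{Retrieve}_k$ the retriever returning the top-$k$ texts, and $\mathbb{I}(\cdot)$ the indicator function.

```latex
\max_{\Gamma} \; \frac{1}{M} \sum_{i=1}^{M}
  \mathbb{I}\Big( \mathrm{LLM}\big(Q_i;\ \mathrm{Retrieve}_k(Q_i;\ \mathcal{D} \cup \Gamma)\big) = R_i \Big)
\quad \text{s.t.} \quad |\Gamma_i| \le N, \quad i = 1, \dots, M
```

Under this reading, the objective is the attack success rate, and the reported setting corresponds to $N = 5$ with $\mathcal{D}$ containing millions of texts.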

Authors: Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia

To appear in USENIX Security Symposium 2025. The code is available at https://github.com/sleeepeer/PoisonedRAG

Abstract: Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Despite their success, they also have inherent limitations such as a lack of up-to-date knowledge and hallucination. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate these limitations. The key idea of RAG is to ground the answer generation of an LLM on external knowledge retrieved from a knowledge database. Existing studies mainly focus on improving the accuracy or efficiency of RAG, leaving its security largely unexplored. We aim to bridge the gap in this work. We find that the knowledge database in a RAG system introduces a new and practical attack surface. Based on this attack surface, we propose PoisonedRAG, the first knowledge corruption attack to RAG, where an attacker could inject a few malicious texts into the knowledge database of a RAG system to induce an LLM to generate an attacker-chosen target answer for an attacker-chosen target question. We formulate knowledge corruption attacks as an optimization problem, whose solution is a set of malicious texts. Depending on the background knowledge (e.g., black-box and white-box settings) of an attacker on a RAG system, we propose two solutions to solve the optimization problem, respectively. Our results show PoisonedRAG could achieve a 90% attack success rate when injecting five malicious texts for each target question into a knowledge database with millions of texts. We also evaluate several defenses and our results show they are insufficient to defend against PoisonedRAG, highlighting the need for new defenses.

Submitted to arXiv on 12 Feb. 2024

Results of the summarizing process for the arXiv paper: 2402.07867v3

The license of the paper does not allow us to build upon its content, so this summary was generated from the paper's metadata rather than the full article.

Large language models (LLMs) have strong generative capabilities, but they also have limitations such as outdated knowledge and hallucination. Retrieval-Augmented Generation (RAG) addresses these limitations by grounding answer generation in external knowledge retrieved from a knowledge database.

In this work, Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia study the security of RAG systems. They identify the knowledge database as a new attack surface and propose PoisonedRAG, the first knowledge corruption attack on RAG. In PoisonedRAG, an attacker injects a few malicious texts into the knowledge database so that the LLM generates an attacker-chosen target answer for an attacker-chosen target question.

The authors formulate knowledge corruption attacks as an optimization problem whose solution is a set of malicious texts. They consider attackers with different background knowledge of the RAG system, namely black-box and white-box settings, and propose one solution for each.

In their experiments, PoisonedRAG achieves a 90% attack success rate when five malicious texts per target question are injected into a knowledge database with millions of texts. The authors also evaluate several defenses and find them insufficient against PoisonedRAG, which points to the need for new defenses.

The study shows that the security of RAG systems needs attention alongside accuracy and efficiency, and it motivates future work on hardening RAG systems against malicious manipulation of their knowledge databases.
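
The retrieval step that the attack relies on can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: the bag-of-words retriever, the tiny knowledge database, the example question, and the heuristic of prepending the target question to each malicious text are all assumptions made here for illustration. It only shows that a few injected texts can crowd out clean texts in the top-k results that would be passed to an LLM as context.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a text encoder: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, knowledge_db, k):
    """Return the k texts most similar to the question."""
    q = embed(question)
    ranked = sorted(knowledge_db, key=lambda t: cosine(q, embed(t)), reverse=True)
    return ranked[:k]


def poisoned_fraction(retrieved, malicious):
    """Fraction of retrieved texts that were injected by the attacker."""
    return sum(text in malicious for text in retrieved) / len(retrieved)


# Hypothetical clean knowledge database (illustrative only).
clean_db = [
    "Mount Everest is the highest mountain above sea level.",
    "The Pacific Ocean is the largest ocean on Earth.",
    "Paris is the capital of France.",
    "The highest mountain in Africa is Kilimanjaro.",
]

# Attacker-chosen target question and target answer (illustrative only).
target_question = "What is the highest mountain in the world?"
target_answer = "Mount Fuji"

# Assumed heuristic: start each malicious text with the target question so it
# scores highly under the retriever, then assert the target answer.
malicious_texts = [
    f"{target_question} Source {i} states that the answer is {target_answer}."
    for i in range(3)
]

before = retrieve(target_question, clean_db, k=3)
after = retrieve(target_question, clean_db + malicious_texts, k=3)

print("poisoned fraction before:", poisoned_fraction(before, malicious_texts))
print("poisoned fraction after: ", poisoned_fraction(after, malicious_texts))
```

In this toy setup the retrieved context contains no malicious text before injection and consists entirely of malicious texts afterwards. A real RAG system would use a learned dense retriever and a far larger database, so these numbers say nothing about the paper's reported results.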
Created on 12 Nov. 2024
