DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent

AI-generated keywords: Large Language Models, LLM-based agents, backdoor attacks, Dynamically Encrypted Multi-Backdoor Implantation Attack, AgentBackdoorEval dataset

AI-generated Key Points

  • Widespread adoption of Large Language Models (LLMs) brings advanced capabilities along with new safety challenges
  • LLM-based agents demonstrate remarkable performance across diverse domains
  • Safety vulnerabilities, such as backdoor attacks, require urgent attention
  • Traditional backdoor attacks on individual LLMs are well-studied
  • Recent research explores embedding covert triggers for high attack success rates
  • Existing methods lack stealth and can be detected through safety audits
  • Dynamically Encrypted Multi-Backdoor Implantation Attack conceals backdoors effectively
  • Multi-Backdoor Tiered Implantation enhances stealth by fragmenting the backdoor
  • Experimental results show near-perfect attack success rate without detection alarms
  • AgentBackdoorEval dataset improves evaluation capabilities for agent backdoor attacks

Authors: Pengyu Zhu, Zhenhong Zhou, Yuanhe Zhang, Shilinlu Yan, Kun Wang, Sen Su

License: CC BY 4.0

Abstract: As LLM-based agents become increasingly prevalent, backdoors can be implanted into agents through user queries or environment feedback, raising critical concerns regarding safety vulnerabilities. However, backdoor attacks are typically detectable by safety audits that analyze the reasoning process of agents. To this end, we propose a novel backdoor implantation strategy called Dynamically Encrypted Multi-Backdoor Implantation Attack. Specifically, we introduce dynamic encryption, which maps the backdoor into benign content, effectively circumventing safety audits. To enhance stealthiness, we further decompose the backdoor into multiple sub-backdoor fragments. Based on these advancements, backdoors are allowed to bypass safety audits significantly. Additionally, we present AgentBackdoorEval, a dataset designed for the comprehensive evaluation of agent backdoor attacks. Experimental results across multiple datasets demonstrate that our method achieves an attack success rate nearing 100% while maintaining a detection rate of 0%, illustrating its effectiveness in evading safety audits. Our findings highlight the limitations of existing safety mechanisms in detecting advanced attacks, underscoring the urgent need for more robust defenses against backdoor threats. Code and data are available at https://github.com/whfeLingYu/DemonAgent.

Submitted to arXiv on 18 Feb. 2025

Results of the summarizing process for the arXiv paper: 2502.12575v1

The widespread adoption of Large Language Models (LLMs) has brought both advanced capabilities and new challenges. Because they can comprehend complex tasks and access historical context, LLM-based agents perform well across diverse domains. That functionality comes with significant safety vulnerabilities. One critical concern is the backdoor attack, in which a malicious actor implants hidden triggers within the agent's framework to induce harmful behavior under specific conditions.

Traditional backdoor attacks targeting individual LLMs have been studied extensively, but attacks on agent-based scenarios raise distinct challenges and opportunities for exploitation. Recent research has explored embedding covert triggers in user interactions or environment feedback to achieve high attack success rates while preserving normal task performance. These existing methods often lack stealth: safety audits that analyze the agent's reasoning process can detect them.

To address this limitation, the authors propose a backdoor implantation strategy called Dynamically Encrypted Multi-Backdoor Implantation Attack. It uses dynamic encryption to map the backdoor content into benign material, so that the backdoor evades detection during safety audits. Multi-Backdoor Tiered Implantation further enhances stealth by fragmenting the backdoor into multiple sub-backdoors that are encrypted and implanted through tiered processes within the agent's workflow. Across multiple datasets, the method achieves an attack success rate nearing 100% with a detection rate of 0%.

The authors also introduce the AgentBackdoorEval dataset, designed for comprehensive evaluation of agent backdoor attacks. In conclusion, the Dynamically Encrypted Multi-Backdoor Implantation Attack shows that backdoors in LLM-based agents can be made considerably harder to detect than with existing attack techniques. The findings expose the limitations of current safety mechanisms and underscore the urgent need for more robust defenses against backdoor threats.
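The two headline figures in the summary, attack success rate and detection rate, are proportions taken over evaluation trials. The sketch below shows one way such figures could be tallied. The `TrialRecord` schema, its field names, and the `summarize` function are hypothetical and are not taken from the paper or its repository, whose exact metric definitions may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TrialRecord:
    """One evaluation trial (hypothetical schema, not from the paper)."""
    attack_succeeded: bool   # agent carried out the attacker's target behavior
    flagged_by_audit: bool   # safety audit raised an alarm on this trial


def summarize(trials: Iterable[TrialRecord]) -> dict[str, float]:
    """Return attack success rate and detection rate as fractions in [0, 1]."""
    trials = list(trials)
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials)
    return {
        # Share of trials in which the backdoor behavior was observed.
        "attack_success_rate": sum(t.attack_succeeded for t in trials) / n,
        # Share of trials that the audit flagged.
        "detection_rate": sum(t.flagged_by_audit for t in trials) / n,
    }
```

Under these assumed definitions, a result of "nearing 100%" attack success with "0%" detection would correspond to `attack_success_rate` close to 1.0 and `detection_rate` equal to 0.0.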
Created on 08 Aug. 2025

