ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?

AI-generated keywords: Cybersecurity

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • AI agents are rapidly gaining capabilities that could reshape cybersecurity, which makes rigorous evaluation urgent.
  • Exploitation means turning a vulnerability into a concrete security impact, such as unauthorized file access or code execution.
  • Exploitation requires low-level program reasoning (for example, about memory layout), runtime adaptation, and sustained progress over long horizons.
  • Exploitation is inherently dual-use: it supports defensive workflows while lowering the barrier for offense.
  • ExploitGym is a benchmark of 898 instances drawn from real-world vulnerabilities in userspace programs, Google's V8 JavaScript engine, and the Linux kernel.
  • The strongest configurations, Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5, produce working exploits for 157 and 120 instances, respectively, and models retain non-trivial success rates even with widely used defenses enabled.
  • The results establish ExploitGym as a testbed for exploitation and highlight the growing cybersecurity risks posed by increasingly capable AI agents.

Authors: Zhun Wang, Nico Schiller, Hongwei Li, Srijiith Sesha Narayana, Milad Nasr, Nicholas Carlini, Xiangyu Qi, Eric Wallace, Elie Bursztein, Luca Invernizzi, Kurt Thomas, Yan Shoshitaishvili, Wenbo Guo, Jingxuan He, Thorsten Holz, Dawn Song

Abstract: AI agents are rapidly gaining capabilities that could significantly reshape cybersecurity, making rigorous evaluation urgent. A critical capability is exploitation: turning a vulnerability, which is not yet an attack, into a concrete security impact, such as unauthorized file access or code execution. Exploitation is a particularly challenging task because it requires low-level program reasoning (e.g., about memory layout), runtime adaptation, and sustained progress over long horizons. Meanwhile, it is inherently dual-use, supporting defensive workflows while lowering the barrier for offense. Despite its importance and diagnostic value, exploitation remains under-evaluated. To address this gap, we introduce ExploitGym, a large-scale, diverse, realistic benchmark on the exploitation capabilities of AI agents. Given a program input that triggers a vulnerability, ExploitGym tasks agents with progressively extending it into a working exploit. The benchmark comprises 898 instances sourced from real-world vulnerabilities across three domains, including userspace programs, Google's V8 JavaScript engine, and the Linux kernel. We vary the security protections applied to each instance, isolating their impact on agent performance. All configurations are packaged in reproducible containerized environments. Our evaluation shows that while exploitation remains challenging, frontier models can successfully exploit a non-trivial fraction of vulnerabilities. For example, the strongest configurations are Anthropic's latest model Claude Mythos Preview and OpenAI's GPT-5.5, which produce working exploits for 157 and 120 instances, respectively. Notably, even with widely used defenses enabled, models retain non-trivial success rates. These results establish ExploitGym as an effective testbed for exploitation and highlight the growing cybersecurity risks posed by increasingly capable AI agents.

Submitted to arXiv on 11 May 2026


Results of the summarizing process for the arXiv paper: 2605.11086v1


AI agents are advancing rapidly, creating both challenges and opportunities for cybersecurity. One critical capability is exploitation: turning a vulnerability into a concrete security impact, such as unauthorized file access or code execution. The task is hard because it requires low-level program reasoning, runtime adaptation, and sustained progress over long horizons. Exploitation is also dual-use: it supports defensive workflows while lowering the barrier for offense. Despite its importance and diagnostic value, it remains under-evaluated.

To address this gap, the authors introduce ExploitGym, a benchmark for assessing the exploitation capabilities of AI agents. Given a program input that triggers a vulnerability, an agent must progressively extend it into a working exploit. The benchmark comprises 898 instances sourced from real-world vulnerabilities across three domains: userspace programs, Google's V8 JavaScript engine, and the Linux kernel. The security protections applied to each instance are varied to isolate their impact on agent performance, and all configurations are packaged in reproducible containerized environments.

The evaluation shows that exploitation remains challenging, but frontier models can exploit a non-trivial fraction of vulnerabilities. The strongest configurations, Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5, produce working exploits for 157 and 120 instances, respectively. Even with widely used defenses enabled, models retain non-trivial success rates. These results establish ExploitGym as a testbed for exploitation and highlight the growing cybersecurity risks posed by increasingly capable AI agents. Benchmarks of this kind help clarify AI-driven security risks and can inform the design of defenses.
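
To put the reported counts in proportion, the short sketch below converts them into success rates. It assumes that each count is measured against the full set of 898 instances; the metadata does not say whether the paper uses per-configuration denominators, so the percentages are illustrative only. The names `TOTAL_INSTANCES`, `solved`, and `success_rate` are made up for this example.

```python
TOTAL_INSTANCES = 898  # benchmark size stated in the abstract

# Working exploits per model, as reported in the abstract.
solved = {
    "Claude Mythos Preview": 157,
    "GPT-5.5": 120,
}


def success_rate(num_solved: int, total: int = TOTAL_INSTANCES) -> float:
    """Return the share of instances solved, as a percentage."""
    return 100.0 * num_solved / total


# Assumption: both counts share the same denominator of 898 instances.
rates = {model: round(success_rate(n), 1) for model, n in solved.items()}

for model, rate in rates.items():
    print(f"{model}: {rate}% of {TOTAL_INSTANCES} instances")
```

Under that assumption, the two models solve roughly 17.5% and 13.4% of the benchmark, which matches the abstract's description of a "non-trivial fraction" while leaving most instances unsolved.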
Created on 16 May 2026

