ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?

AI-generated keywords: Cybersecurity

AI-generated Key Points

⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

- AI agents in cybersecurity present challenges and opportunities, particularly in the area of exploitation.
- Exploitation involves turning vulnerabilities into security threats like unauthorized access or code execution.
- Exploitation requires deep program reasoning, runtime adaptation, and sustained progress over time.
- Exploitation has dual-use implications for defensive and offensive purposes.
- The introduction of ExploitGym provides a benchmark to assess AI agents' exploitation capabilities with 898 real-world vulnerability instances.
- Top-performing AI models like Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5 have shown success in exploiting vulnerabilities even with common defenses enabled.
- ExploitGym serves as a valuable testbed for evaluating exploitation capabilities and highlights the increasing cybersecurity risks posed by advanced AI agents.


Authors: Zhun Wang, Nico Schiller, Hongwei Li, Srijiith Sesha Narayana, Milad Nasr, Nicholas Carlini, Xiangyu Qi, Eric Wallace, Elie Bursztein, Luca Invernizzi, Kurt Thomas, Yan Shoshitaishvili, Wenbo Guo, Jingxuan He, Thorsten Holz, Dawn Song

arXiv: 2605.11086v1 (cs.CR)

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: AI agents are rapidly gaining capabilities that could significantly reshape cybersecurity, making rigorous evaluation urgent. A critical capability is exploitation: turning a vulnerability, which is not yet an attack, into a concrete security impact, such as unauthorized file access or code execution. Exploitation is a particularly challenging task because it requires low-level program reasoning (e.g., about memory layout), runtime adaptation, and sustained progress over long horizons. Meanwhile, it is inherently dual-use, supporting defensive workflows while lowering the barrier for offense. Despite its importance and diagnostic value, exploitation remains under-evaluated. To address this gap, we introduce ExploitGym, a large-scale, diverse, realistic benchmark on the exploitation capabilities of AI agents. Given a program input that triggers a vulnerability, ExploitGym tasks agents with progressively extending it into a working exploit. The benchmark comprises 898 instances sourced from real-world vulnerabilities across three domains, including userspace programs, Google's V8 JavaScript engine, and the Linux kernel. We vary the security protections applied to each instance, isolating their impact on agent performance. All configurations are packaged in reproducible containerized environments. Our evaluation shows that while exploitation remains challenging, frontier models can successfully exploit a non-trivial fraction of vulnerabilities. For example, the strongest configurations are Anthropic's latest model Claude Mythos Preview and OpenAI's GPT-5.5, which produce working exploits for 157 and 120 instances, respectively. Notably, even with widely used defenses enabled, models retain non-trivial success rates. These results establish ExploitGym as an effective testbed for exploitation and highlight the growing cybersecurity risks posed by increasingly capable AI agents.
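The benchmark structure described in the abstract (instances drawn from three domains, each run under varied security protections) can be pictured with a small data model. This is only an illustrative sketch: the class names, field names, the instance identifier, and the protection labels are all hypothetical and are not taken from the paper, and enumerating every subset of protections is an assumption about how configurations might be varied, not the authors' documented method.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import combinations
from typing import Iterable, Iterator


class Domain(Enum):
    # The three domains named in the abstract.
    USERSPACE = "userspace"
    V8 = "v8"
    LINUX_KERNEL = "linux-kernel"


@dataclass(frozen=True)
class Instance:
    instance_id: str       # hypothetical identifier
    domain: Domain
    trigger_input: bytes   # input that triggers the vulnerability


@dataclass(frozen=True)
class Configuration:
    instance: Instance
    protections: frozenset  # names of the defenses enabled for this run


def protection_ablation(instance: Instance,
                        protections: Iterable[str]) -> Iterator[Configuration]:
    """Yield one configuration per subset of the given protections.

    Assumption: all subsets are enumerated, from none enabled to all enabled.
    """
    names = sorted(protections)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            yield Configuration(instance, frozenset(subset))


# Placeholder values for illustration only.
example = Instance("demo-0001", Domain.USERSPACE, b"\x00")
configs = list(protection_ablation(example, ["ASLR", "stack-canary"]))
print(len(configs))  # 4 configurations for 2 protections
```

Holding the instance fixed while only the protection set changes is what would let a difference in agent success be attributed to the defenses rather than to the vulnerability itself.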

Submitted to arXiv on 11 May 2026


Results of the summarizing process for the arXiv paper: 2605.11086v1


Comprehensive Summary

AI agents are advancing rapidly, which creates both challenges and opportunities for cybersecurity. One critical capability is exploitation: turning a vulnerability into a concrete security impact, such as unauthorized file access or code execution. The task is hard because it requires low-level program reasoning (for example, about memory layout), runtime adaptation, and sustained progress over long horizons. Exploitation is also dual-use: it supports defensive workflows while lowering the barrier for offense. Despite its importance and diagnostic value, it remains under-evaluated.

To address this gap, the authors introduce ExploitGym, a benchmark for assessing the exploitation capabilities of AI agents. Given a program input that triggers a vulnerability, an agent must progressively extend it into a working exploit. The benchmark comprises 898 instances sourced from real-world vulnerabilities across three domains: userspace programs, Google's V8 JavaScript engine, and the Linux kernel. The security protections applied to each instance are varied to isolate their impact on agent performance, and all configurations are packaged in reproducible containerized environments.

The evaluation shows that exploitation remains challenging, yet frontier models successfully exploit a non-trivial fraction of the vulnerabilities. The strongest configurations, Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5, produce working exploits for 157 and 120 instances, respectively. Even with widely used defenses enabled, models retain non-trivial success rates. These results establish ExploitGym as an effective testbed for exploitation and highlight the growing cybersecurity risks posed by increasingly capable AI agents.

As technology and cyber threats continue to evolve, initiatives like ExploitGym help improve our understanding of AI-driven cybersecurity challenges and strengthen defenses against potential attacks.


Layman's Summary

Summary
1. AI agents bring both challenges and opportunities to cybersecurity, especially when it comes to exploiting vulnerabilities.
2. Exploitation means using weaknesses to create security threats like unauthorized access or running harmful code.
3. To exploit effectively, one needs to deeply understand programs, adapt during operation, and make progress consistently.
4. Exploitation can be used for both defensive (protective) and offensive (harmful) purposes.
5. ExploitGym is a benchmark that tests how well AI agents can exploit real-world vulnerabilities.

Definitions
- AI agents: Computer programs that use artificial intelligence to perform tasks without direct human intervention.
- Exploitation: Taking advantage of weaknesses or vulnerabilities to cause harm or gain unauthorized access.
- Vulnerabilities: Weaknesses in software or systems that can be exploited by attackers.
- Benchmark: A standard or point of reference used for comparison or evaluation.
- Cybersecurity: Measures taken to protect computer systems and networks from attacks or unauthorized access.

Blog article

Introduction

In the ever-changing world of cybersecurity, artificial intelligence (AI) agents are becoming increasingly advanced and posing both challenges and opportunities. One critical capability that is gaining prominence is exploitation - the process of turning a vulnerability into a concrete security threat. This can include unauthorized access to files or code execution. However, this task is complex and requires deep program reasoning, runtime adaptation, and sustained progress over extended periods. Despite its crucial importance and diagnostic value, exploitation has been under-evaluated in existing frameworks. To bridge this gap, a team of researchers introduced ExploitGym - a comprehensive benchmark designed to assess the exploitation capabilities of AI agents.

The ExploitGym Benchmark

The ExploitGym benchmark consists of 898 instances sourced from real-world vulnerabilities spanning various domains including userspace programs, Google's V8 JavaScript engine, and the Linux kernel. Each instance is equipped with varying security protections to evaluate their impact on agent performance. Through rigorous evaluation using ExploitGym, it was observed that while exploitation remains a challenging task, cutting-edge AI models demonstrated success in exploiting a notable fraction of vulnerabilities. For instance, top-performing configurations like Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5 were able to produce working exploits for 157 and 120 instances respectively. Even when common defenses were enabled, these models exhibited non-trivial success rates. These findings underscore the effectiveness of ExploitGym as a valuable testbed for evaluating exploitation capabilities.
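To put the quoted counts in proportion, the short calculation below converts them into percentages of the benchmark. It assumes the 157 and 120 solved instances are counted against the full set of 898 instances; the paper metadata does not say how instances under different protection settings are tallied, so treat the resulting rates as a rough reading rather than reported figures.

```python
# Figures quoted in the abstract.
TOTAL_INSTANCES = 898
SOLVED = {"Claude Mythos Preview": 157, "GPT-5.5": 120}


def success_rate(solved: int, total: int) -> float:
    """Return the solved share of the benchmark as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * solved / total


# Assumption: the denominator is the full 898-instance benchmark.
rates = {name: success_rate(n, TOTAL_INSTANCES) for name, n in SOLVED.items()}
for name, pct in rates.items():
    print(f"{name}: {pct:.1f}% of {TOTAL_INSTANCES} instances")
# Claude Mythos Preview: 17.5% of 898 instances
# GPT-5.5: 13.4% of 898 instances
```

Under that assumption, roughly one instance in six to one in seven is exploited, which matches the abstract's wording of a "non-trivial fraction" while leaving most instances unsolved.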

Implications for Cybersecurity

The results obtained through ExploitGym shed light on the escalating cybersecurity risks posed by increasingly capable AI agents. Starting from an input that merely triggers a vulnerability, frontier models were able to build working exploits for a non-trivial fraction of instances, even with widely used defenses enabled. Because exploitation is dual-use, the same capability supports defensive workflows while lowering the barrier for offense. This highlights the need for rigorous evaluation of AI-driven exploitation and for the development of robust defense mechanisms.

Future Directions

The ExploitGym benchmark provides a valuable foundation for future research in this area. It not only allows for the evaluation of current AI models but also serves as a platform for the development and testing of new techniques to improve exploitation capabilities. Additionally, expanding the benchmark to include more diverse instances from different domains could further enhance its effectiveness in evaluating agent performance. This would also provide insights into how well AI agents can generalize their exploitation abilities across different types of vulnerabilities.

Conclusion

In conclusion, ExploitGym is an essential contribution to the field of cybersecurity as it addresses the under-evaluation of exploitation capabilities in existing frameworks. Through rigorous evaluation using this benchmark, researchers have gained valuable insights into the success rates and potential risks posed by cutting-edge AI models in exploiting vulnerabilities. As technology continues to evolve at a rapid pace, initiatives like ExploitGym play a crucial role in enhancing our understanding of AI-driven cybersecurity challenges and fortifying defense mechanisms against potential attacks. With further developments and expansions, this benchmark has great potential to aid in securing our digital world from emerging threats.

Created on 16 May 2026



Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.