Hunting CUDA Bugs at Scale with cuFuzz

AI-generated keywords: GPUs, software development, memory-safety, concurrency bugs, cuFuzz

AI-generated Key Points

  • GPUs are increasingly important in modern software development
  • GPU programs face challenges such as memory-safety and concurrency bugs
  • Fuzz-testing combined with dynamic error checking tools is a promising solution for detecting bugs in GPU programs
  • Prior GPU fuzzing efforts were limited by false positives from kernel-level fuzzing, a lack of device-side coverage-guided feedback, and incompatibility between coverage and sanitization tools
  • cuFuzz is a CUDA-oriented fuzzer that addresses these challenges effectively
  • cuFuzz discovered 43 previously unknown bugs across 14 CUDA programs, including illegal memory accesses, uninitialized reads, and data races
  • cuFuzz discovers significantly more edges and unique inputs than baseline approaches, especially on closed-source targets
  • The artifact for cuFuzz is publicly available on Zenodo with source code, usage instructions, and evaluation scripts
  • Acknowledgments are extended to reviewers and contributors who helped address bug reports uncovered by cuFuzz

Authors: Mohamed Tarek Ibn Ziad, Christos Kozyrakis

Accepted for publication at the International Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA 2026)
License: CC BY 4.0

Abstract: GPUs play an increasingly important role in modern software. However, the heterogeneous host-device execution model and expanding software stacks make GPU programs prone to memory-safety and concurrency bugs that evade static analysis. While fuzz-testing, combined with dynamic error checking tools, offers a plausible solution, it remains underutilized for GPUs. In this work, we identify three main obstacles limiting prior GPU fuzzing efforts: (1) kernel-level fuzzing leading to false positives, (2) lack of device-side coverage-guided feedback, and (3) incompatibility between coverage and sanitization tools. We present cuFuzz, the first CUDA-oriented fuzzer that makes GPU fuzzing practical by addressing these obstacles. cuFuzz uses whole program fuzzing to avoid false positives from independently fuzzing device-side kernels. It leverages NVBit to instrument device-side instructions and merges the resultant coverage with compiler-based host coverage. Finally, cuFuzz decouples sanitization from coverage collection by executing host- and device-side sanitizers in separate processes. cuFuzz uncovers 43 previously unknown bugs (19 in commercial libraries) across 14 CUDA programs, including illegal memory accesses, uninitialized reads, and data races. cuFuzz achieves significantly more discovered edges and unique inputs compared to baseline approaches, especially on closed-source targets. Moreover, we quantify the execution time overheads of the different cuFuzz components and add persistent-mode support to improve the overall fuzzing throughput. Our results demonstrate that cuFuzz is an effective and deployable addition to the GPU testing toolbox. cuFuzz is publicly available at https://github.com/NVlabs/cuFuzz/.
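The abstract says that device-side coverage is merged with compiler-based host coverage but does not say how. The following is a minimal sketch of one plausible approach, assuming both sides report hit counts in AFL-style edge bitmaps of equal size. The names `MAP_SIZE`, `edge_index`, `record_edges`, `merge_coverage`, and `has_new_coverage` are hypothetical and are not taken from cuFuzz or NVBit.

```python
MAP_SIZE = 1 << 16  # assumed bitmap size (AFL-style); not taken from the paper


def edge_index(prev_block: int, cur_block: int, map_size: int = MAP_SIZE) -> int:
    """Hash a (previous block, current block) pair into a bitmap slot."""
    return ((prev_block >> 1) ^ cur_block) % map_size


def record_edges(block_trace, map_size: int = MAP_SIZE) -> bytearray:
    """Build a hit-count bitmap from a sequence of basic-block IDs."""
    bitmap = bytearray(map_size)
    prev = 0
    for cur in block_trace:
        idx = edge_index(prev, cur, map_size)
        bitmap[idx] = min(bitmap[idx] + 1, 255)  # saturate at one byte
        prev = cur
    return bitmap


def merge_coverage(host_map: bytearray, device_map: bytearray) -> bytearray:
    """Combine host and device maps slot by slot with saturating addition."""
    if len(host_map) != len(device_map):
        raise ValueError("coverage maps must have the same size")
    return bytearray(min(h + d, 255) for h, d in zip(host_map, device_map))


def has_new_coverage(merged: bytearray, global_seen: bytearray) -> bool:
    """Mark newly hit slots in global_seen; return True if any slot was new."""
    found_new = False
    for i, hits in enumerate(merged):
        if hits and not global_seen[i]:
            global_seen[i] = 1
            found_new = True
    return found_new


if __name__ == "__main__":
    host = record_edges([1, 2, 3])          # stand-in for a host block trace
    device = record_edges([100, 200, 300])  # stand-in for a device block trace
    seen = bytearray(MAP_SIZE)
    print(has_new_coverage(merge_coverage(host, device), seen))
```

In this sketch an input counts as interesting when the merged map touches a slot that no earlier input reached, so a new device-side edge alone is enough to keep an input. Sharing one bitmap means host and device edges can collide; giving each side its own region of the map would be an equally reasonable design.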

Submitted to arXiv on 12 Mar. 2026

Results of the summarizing process for the arXiv paper: 2603.12485v1

GPUs play an increasingly important role in modern software development. However, the heterogeneous host-device execution model and expanding software stacks make GPU programs prone to memory-safety and concurrency bugs that are hard to detect through static analysis alone. Fuzz-testing combined with dynamic error-checking tools is a promising remedy, but it remains underused for GPUs. The paper attributes this gap to three obstacles in prior GPU fuzzing efforts: (1) kernel-level fuzzing, which leads to false positives, (2) the absence of device-side coverage-guided feedback, and (3) incompatibility between coverage and sanitization tools.

To address these obstacles, the authors present cuFuzz, which they describe as the first CUDA-oriented fuzzer that makes GPU fuzzing practical. cuFuzz fuzzes whole programs rather than individual device-side kernels, which avoids the false positives of kernel-level fuzzing. It uses NVBit to instrument device-side instructions and merges the resulting device coverage with compiler-based host coverage. It also decouples sanitization from coverage collection by running host- and device-side sanitizers in separate processes.

cuFuzz uncovered 43 previously unknown bugs (19 of them in commercial libraries) across 14 CUDA programs, including illegal memory accesses, uninitialized reads, and data races. It discovers significantly more edges and unique inputs than baseline approaches, particularly on closed-source targets. The authors also quantify the execution-time overheads of the individual cuFuzz components and add persistent-mode support to improve overall fuzzing throughput. The results indicate that cuFuzz is an effective and deployable addition to the GPU testing toolbox.
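The idea of running sanitizers in processes separate from coverage collection can be sketched as follows. This is an illustration under stated assumptions, not cuFuzz's implementation: the helper names `run_once` and `triage` are hypothetical, each sanitizer is assumed to be a command that takes an input file path as its last argument, and a nonzero exit code is assumed to signal a finding.

```python
import os
import subprocess
import sys
import tempfile


def run_once(cmd, input_bytes, timeout=10.0):
    """Run one command on one input in its own process.

    Returns (exit code, stderr text). The input is passed as a file path
    appended to the command line, which is an assumption of this sketch.
    """
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(input_bytes)
        path = f.name
    try:
        proc = subprocess.run([*cmd, path], capture_output=True, timeout=timeout)
        return proc.returncode, proc.stderr.decode(errors="replace")
    finally:
        os.unlink(path)


def triage(input_bytes, sanitizer_cmds):
    """Re-run an input under each sanitizer command, one process per command.

    sanitizer_cmds maps a label to an argv list. Returns a dict of
    label -> stderr text for every command that exited with a nonzero code.
    """
    findings = {}
    for name, cmd in sanitizer_cmds.items():
        code, err = run_once(cmd, input_bytes)
        if code != 0:
            findings[name] = err.strip()
    return findings


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; real commands would be
    # sanitizer-enabled builds or wrappers around the target program.
    clean = [sys.executable, "-c", "import sys; sys.exit(0)"]
    buggy = [sys.executable, "-c",
             "import sys; sys.stderr.write('illegal access'); sys.exit(1)"]
    print(triage(b"seed", {"host": clean, "device": buggy}))
```

In a real setup the host entry might be a build of the target compiled with a host sanitizer, and the device entry might wrap the target in NVIDIA's compute-sanitizer. Because each command runs in its own process, a sanitizer that cannot coexist with the coverage instrumentation never shares an address space with it.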
The artifact is publicly available on Zenodo [44] and comprises source code, usage instructions, and evaluation scripts for replicating the paper's key experiments. The authors thank the reviewers for their insightful feedback, as well as the individuals who helped address the HeCBench bug reports uncovered by cuFuzz and who handled the bugs reported in NVIDIA's CUDA-accelerated libraries. They also acknowledge valuable technical discussions with Aamer Jaleel, Mark Stephenson, Sana Damani, and members of the Architecture Research Group at NVIDIA Research.
Created on 30 Mar. 2026
