GPUs play an increasingly important role in modern software. However, their complex host-device execution model and expanding software stacks make GPU programs prone to memory-safety and concurrency bugs that are hard to detect through static analysis alone. Fuzz testing combined with dynamic error-checking tools is a promising remedy, but it remains largely unexplored for GPUs. Prior GPU fuzzing efforts have run into three main obstacles: (1) kernel-level fuzzing that leads to false positives, (2) the absence of device-side coverage-guided feedback, and (3) incompatibilities between coverage and sanitization tools. cuFuzz, a CUDA-oriented fuzzer, addresses all three. It avoids false positives by fuzzing whole programs instead of fuzzing device-side kernels independently. It uses NVBit to instrument device-side instructions and merges the resulting coverage with compiler-based host coverage. It also decouples sanitization from coverage collection by executing host- and device-side sanitizers in separate processes. cuFuzz discovered 43 previously unknown bugs (including 19 in commercial libraries) across 14 CUDA programs, among them illegal memory accesses, uninitialized reads, and data races. It discovers significantly more edges and unique inputs than baseline approaches, particularly on closed-source targets. The execution-time overheads of the individual cuFuzz components are quantified, and persistent-mode support is added to improve fuzzing throughput. The results indicate that cuFuzz is an effective and deployable addition to the GPU testing toolbox.
The artifact is publicly available on Zenodo [44] and comprises source code, usage instructions, and evaluation scripts for reproducing the key experiments in this paper. The authors thank the reviewers for their insightful feedback, as well as those who helped address the HeCBench bug reports uncovered by cuFuzz and those who handled the bugs reported in NVIDIA's CUDA-accelerated libraries. Aamer Jaleel, Mark Stephenson, Sana Damani, and members of the Architecture Research Group at NVIDIA Research also contributed valuable technical discussions.
- GPUs are increasingly important in modern software development
- GPU programs are prone to memory-safety and concurrency bugs
- Fuzz testing combined with dynamic error-checking tools is a promising way to detect bugs in GPU programs
- Prior GPU fuzzing efforts ran into three obstacles: false positives from kernel-level fuzzing, no device-side coverage feedback, and incompatibilities between coverage and sanitization tools
- cuFuzz is a CUDA-oriented fuzzer that addresses these three obstacles
- cuFuzz discovered 43 previously unknown bugs across 14 CUDA programs, including illegal memory accesses, uninitialized reads, and data races
- cuFuzz discovers more edges and unique inputs than baseline approaches, especially on closed-source targets
- The cuFuzz artifact is publicly available on Zenodo with source code, usage instructions, and evaluation scripts
- The authors thank the reviewers and the contributors who helped address the bug reports uncovered by cuFuzz
Summary
- Graphics processing units (GPUs) are important in making new computer programs.
- Programs for GPUs can have memory errors and bugs that only appear when several things run at the same time.
- Fuzz-testing tools combined with error checkers can help find these problems in GPU programs.
- A tool called cuFuzz finds such bugs in programs written for CUDA, NVIDIA's platform for programming GPUs.
- cuFuzz found 43 new bugs in 14 CUDA programs, such as illegal memory accesses, reads of uninitialized memory, and data races.
Definitions
- GPUs: Graphics Processing Units - special computer parts that help make images and run programs faster.
- Bugs: Mistakes or problems in computer programs that need to be fixed.
- Fuzz-testing: Automatically trying many different inputs to see if any of them cause unexpected results or errors.
- Concurrency: Doing multiple things at the same time in a program.
- Kernel-level fuzzing: Testing a single GPU kernel (a function that runs on the GPU) on its own, separately from the host program that normally launches it. Here "kernel" does not mean the operating-system kernel.
In recent years, GPUs have become an integral part of modern software development. Their ability to handle complex calculations and process large amounts of data has made them a crucial component in fields such as gaming, artificial intelligence, and scientific research. However, the complex host-device execution model and expanding software stacks leave GPU programs susceptible to memory-safety and concurrency bugs that are difficult to detect through static analysis alone.
To address these challenges, researchers have introduced a CUDA-oriented fuzzer called cuFuzz. The tool targets three main obstacles encountered in prior GPU fuzzing efforts: kernel-level fuzzing that leads to false positives, the absence of device-side coverage-guided feedback, and incompatibilities between coverage and sanitization tools.
The first challenge addressed by cuFuzz is the false positives caused by kernel-level fuzzing. Previous approaches fuzzed device-side kernels independently of the host program. A kernel fuzzed in isolation can receive arguments that the host code would never pass to it, so errors reported for such inputs do not correspond to bugs reachable in the real program. cuFuzz instead fuzzes the whole program, exercising host- and device-side code together, which avoids these false positives.
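The difference between the two fuzzing granularities can be illustrated with a toy model. The sketch below is not cuFuzz code: plain Python functions stand in for CUDA host and device code, and every name in it (`kernel_scale`, `host_main`, `IllegalAccess`) is hypothetical. It assumes a host program that validates an element count before launching the kernel.

```python
class IllegalAccess(Exception):
    """Stands in for an out-of-bounds report from a device-side checker."""


def kernel_scale(buf, n):
    # "Device" code: trusts the caller to guarantee n <= len(buf).
    for i in range(n):
        if i >= len(buf):
            raise IllegalAccess(f"index {i} out of bounds for size {len(buf)}")
        buf[i] *= 2


def host_main(data: bytes):
    # "Host" code: parses the fuzz input and validates n before the launch.
    if len(data) < 1:
        return None
    n = data[0]
    buf = list(data[1:])
    if n > len(buf):
        return None  # input rejected; the kernel is never launched
    kernel_scale(buf, n)
    return buf


def kernel_level_reports_bug(n, buf):
    # Kernel-level fuzzing: arguments go straight to the kernel.
    try:
        kernel_scale(list(buf), n)
        return False
    except IllegalAccess:
        return True


def whole_program_reports_bug(data: bytes):
    # Whole-program fuzzing: the input passes through the host code first.
    try:
        host_main(data)
        return False
    except IllegalAccess:
        return True
```

With `n = 8` and a two-element buffer, `kernel_level_reports_bug` flags an out-of-bounds access, while `whole_program_reports_bug(bytes([8, 1, 2]))` reports nothing because the host check rejects the input before any launch. The kernel-level report is a false positive with respect to the real program.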
Another key feature of cuFuzz is device-side coverage feedback. cuFuzz uses NVBit to instrument device-side instructions and merges the resulting device coverage with compiler-based host coverage. The fuzzer therefore receives feedback from both sides of the program, so an input that reaches new device code is recognized as interesting even when the host-side execution path is unchanged.
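One plausible way to combine the two coverage sources is sketched below. This is an assumption for illustration, not the layout cuFuzz is documented to use: it records AFL-style edge hits in two separate byte maps and concatenates them so that host and device edges occupy disjoint regions of a single map. The names `record_trace`, `merge_coverage`, and `has_new_coverage` are hypothetical.

```python
MAP_SIZE = 1 << 16  # size of each per-side coverage map (assumed)


def record_trace(bitmap: bytearray, block_ids):
    """Record AFL-style edge hits for a sequence of basic-block IDs."""
    prev = 0
    for cur in block_ids:
        idx = (cur ^ (prev >> 1)) % len(bitmap)
        bitmap[idx] = min(bitmap[idx] + 1, 255)  # saturating hit counter
        prev = cur


def merge_coverage(host_map: bytes, device_map: bytes) -> bytearray:
    """Place host and device edges in disjoint regions of one map."""
    return bytearray(host_map) + bytearray(device_map)


def has_new_coverage(seen: bytearray, run_map: bytes) -> bool:
    """Update `seen` in place and report whether this run hit a new edge."""
    new = False
    for i, hits in enumerate(run_map):
        if hits and not seen[i]:
            seen[i] = 1
            new = True
    return new


def run_is_interesting(seen, host_blocks, device_blocks, map_size=MAP_SIZE):
    # Build per-run maps for each side, merge them, and compare with history.
    host_map, device_map = bytearray(map_size), bytearray(map_size)
    record_trace(host_map, host_blocks)
    record_trace(device_map, device_blocks)
    return has_new_coverage(seen, merge_coverage(host_map, device_map))
```

Keeping the regions disjoint means a device edge can never be mistaken for a host edge with the same hash, at the cost of a larger map. In this sketch, two runs with identical host traces but different device traces are still distinguished.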
One major hurdle faced by previous GPU fuzzers was the incompatibility between coverage-collection tools and sanitization tools. cuFuzz addresses this by separating sanitization from coverage collection: host- and device-side sanitizers are executed in separate processes, so they do not have to coexist with the coverage instrumentation inside the same run.
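A minimal sketch of this separation, under stated assumptions, is shown below. It assumes the device-side checks are run with NVIDIA's `compute-sanitizer` command-line tool (its `memcheck`, `initcheck`, and `racecheck` tools match the bug classes reported above) and that a separately built, host-sanitized binary exists. The source does not describe which commands cuFuzz actually issues, and the function names and the `asan_target` parameter are hypothetical.

```python
import subprocess


def sanitizer_commands(target, input_path, asan_target=None):
    """Build one command line per checker; each runs in its own process."""
    cmds = {
        tool: ["compute-sanitizer", "--tool", tool,
               "--error-exitcode", "1", target, input_path]
        for tool in ("memcheck", "initcheck", "racecheck")
    }
    if asan_target is not None:
        # Hypothetical host binary built with a host-side sanitizer.
        cmds["host-sanitizer"] = [asan_target, input_path]
    return cmds


def run_checks(commands, timeout=60):
    """Run every command separately; return names that failed or timed out."""
    failed = []
    for name, cmd in commands.items():
        try:
            proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            failed.append(name)
            continue
        if proc.returncode != 0:
            failed.append(name)
    return failed
```

A fuzzing loop could call `run_checks(sanitizer_commands("./app", "input.bin"))` only for inputs that the coverage run marked as interesting, which keeps the slower sanitizer runs off the main fuzzing path. That scheduling choice is also an assumption of this sketch.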
To evaluate cuFuzz, the researchers ran it on 14 CUDA programs and discovered 43 previously unknown bugs, including 19 in commercial libraries. The bugs include illegal memory accesses, uninitialized reads, and data races. cuFuzz also outperformed baseline approaches, discovering significantly more edges and unique inputs, particularly on closed-source targets.
The cuFuzz artifact is publicly available on Zenodo. It includes source code, usage instructions, and evaluation scripts for reproducing the key experiments in the paper. The authors also acknowledge valuable technical discussions with colleagues, as well as the people who helped address the bug reports uncovered by cuFuzz.
Overall, cuFuzz is a valuable addition to the GPU testing toolbox because it addresses the key obstacles faced by previous GPU fuzzers. Whole-program fuzzing, combined host and device coverage feedback, and sanitizers run in separate processes make it effective at detecting memory-safety and concurrency bugs in complex GPU programs. With a publicly available artifact and 43 newly discovered bugs, cuFuzz has the potential to improve the reliability of GPU-accelerated software.