From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
AI-generated Key Points
- Despite advances in CUDA programming and domain-specific libraries, effectively utilizing GPUs with massively parallel engines remains difficult.
- Large language models (LLMs) show promise in generating optimized CUDA code, but face hurdles such as privacy risks with cloud-based APIs and high computational costs with local deployment.
- Small language models (SLMs) are a more lightweight and privacy-friendly alternative to LLMs. They can match LLMs on specific tasks, but their limited reasoning abilities lead to suboptimal performance on complex CUDA generation.
- ReGraphT is a novel training-free, retrieval-augmented generation framework that enhances the reasoning capabilities of SLMs by transferring LLM-level reasoning through structured reasoning graphs and Monte Carlo Graph Search (MCGS).
- Experimental results demonstrate that ReGraphT outperforms HPC-specific fine-tuned models and other retrieval-augmented approaches, achieving an average 2.33X speedup on CUDAEval and ParEval tasks.
- Paired with DeepSeek-Coder-V2-Lite-Instruct and Qwen2.5-Coder-7B-Instruct, ReGraphT enables SLMs to approach LLM-level performance without the associated privacy risks or excessive computing overhead.
Authors: Junfeng Gong, Zhiyi Wei, Junying Chen, Cheng Liu, Huawei Li
Abstract: Despite significant evolution of CUDA programming and domain-specific libraries, effectively utilizing GPUs with massively parallel engines remains difficult. Large language models (LLMs) show strong potential in generating optimized CUDA code from sequential code. However, using LLMs in practice faces two major challenges: cloud-based APIs pose risks of code leakage, and local deployment is often computationally expensive and inefficient. These drawbacks have spurred interest in small language models (SLMs), which are more lightweight and privacy-friendly. Encouragingly, recent studies show that SLMs can achieve performance comparable to LLMs on specific tasks. While SLMs can match LLMs on domain-specific tasks, their limited reasoning abilities lead to suboptimal performance in complex CUDA generation according to our experiments. To bridge this gap, we propose ReGraphT, a training-free, retrieval-augmented generation framework that transfers LLM-level reasoning to smaller models. ReGraphT organizes CUDA optimization trajectories into a structured reasoning graph, modeling the combined CUDA optimizations as state transitions, and leverages Monte Carlo Graph Search (MCGS) for efficient exploration. We also present a CUDA-specific benchmark with difficulty tiers defined by reasoning complexity to evaluate models more comprehensively. Experiments show that ReGraphT outperforms HPC-specific fine-tuned models and other retrieval-augmented approaches, achieving an average 2.33X speedup on CUDAEval and ParEval. When paired with DeepSeek-Coder-V2-Lite-Instruct and Qwen2.5-Coder-7B-Instruct, ReGraphT enables SLMs to approach LLM-level performance without the associated privacy risks or excessive computing overhead.
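The abstract describes merging CUDA optimization trajectories into a reasoning graph and exploring it with Monte Carlo Graph Search, but gives no algorithmic detail. The following is a minimal sketch of that general idea, not the authors' implementation. Everything in it is an assumption made for illustration: the optimization names, the `GAINS` table, the `toy_speedup` reward (a real system would compile and time the generated kernel), and the UCB1 selection rule. Each node is one optimization step, each edge is a transition observed in some trajectory, and visit statistics are stored per node so that paths sharing a node also share what was learned about it.

```python
import math
import random
from collections import defaultdict

START = "start"  # hypothetical root state: the unoptimized sequential code

def build_graph(trajectories):
    """Merge optimization trajectories into one graph of state transitions."""
    graph = defaultdict(set)
    for traj in trajectories:
        prev = START
        for step in traj:
            graph[prev].add(step)
            prev = step
        graph[prev]  # touch the last step so it exists as a node
    return {node: sorted(children) for node, children in graph.items()}

def mcgs(graph, reward_fn, iterations=100, c=1.4, seed=0):
    """Monte Carlo search over the graph with UCB1 selection (assumed rule)."""
    rng = random.Random(seed)
    visits = defaultdict(int)    # per-node visit counts, shared across paths
    total = defaultdict(float)   # per-node accumulated reward
    best_path, best_reward = [], float("-inf")
    for _ in range(iterations):
        node, path = START, []
        while True:
            # Skip steps already on this path so cycles cannot loop forever.
            children = [ch for ch in graph.get(node, []) if ch not in path]
            if not children:
                break
            unvisited = [ch for ch in children if visits[ch] == 0]
            if unvisited:
                node = rng.choice(unvisited)
            else:
                log_n = math.log(visits[node] + 1)
                node = max(
                    children,
                    key=lambda ch: total[ch] / visits[ch]
                    + c * math.sqrt(log_n / visits[ch]),
                )
            path.append(node)
        reward = reward_fn(path)
        for visited in [START] + path:  # backpropagate along the walked path
            visits[visited] += 1
            total[visited] += reward
        if reward > best_reward:
            best_path, best_reward = list(path), reward
    return best_path, best_reward

# Hypothetical per-step speedup factors, standing in for real measurements.
GAINS = {"coalesce": 1.5, "tiling": 2.0, "shared_mem": 1.8, "unroll": 1.1}

def toy_speedup(path):
    return math.prod(GAINS[step] for step in path)

trajectories = [
    ["coalesce", "shared_mem", "unroll"],
    ["tiling", "shared_mem"],
    ["coalesce", "unroll"],
]
graph = build_graph(trajectories)
best_path, best_speedup = mcgs(graph, toy_speedup)
print(best_path, round(best_speedup, 2))
```

In this toy setup the search settles on `tiling`, `shared_mem`, `unroll`, a sequence that appears in none of the three input trajectories. That recombination of steps from different trajectories is the practical reason to merge them into a graph instead of retrieving them one at a time.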