DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation
AI-generated Key Points
- Diffusion large language models (dLLMs) are explored as alternatives to autoregressive models for code generation
- dLLMs offer global planning and iterative refinement features that are beneficial in coding tasks
- Authors conduct a systematic investigation into denoising processes and reinforcement learning methods of dLLMs in coding
- A 7B dLLM named "DiffuCoder" is trained on a massive dataset of 130B tokens of code for analysis
- Key differences between dLLMs and autoregressive models are uncovered, including the ability of dLLMs to decide how causal their generation should be without relying on semi-autoregressive decoding
- Increasing sampling temperature diversifies token choices and alters generation order, creating a rich search space for reinforcement learning rollouts
- A novel sampling scheme called "coupled-GRPO" is proposed for reinforcement learning training to improve performance on code generation benchmarks
- Coupled-GRPO improves DiffuCoder's performance on EvalPlus by +4.4% while reducing its reliance on autoregressive causality during decoding
- Deeper insights into dLLM generation mechanics are provided, along with an effective diffusion-native RL training framework
- Practical considerations such as faster generation speeds with diffusion models compared to autoregressive ones are discussed
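The point about sampling temperature changing generation order can be illustrated with a toy decoder. The sketch below is an assumption-laden illustration, not DiffuCoder's decoder: it assumes a confidence-based unmasking rule (at each step, draw a candidate token for every masked position and commit the one with the highest probability), and it uses fixed, hypothetical logits (`TOY_LOGITS`) in place of a real model. With greedy decoding the unmasking order is fixed; once tokens are sampled at a higher temperature, the confidence ranking changes from run to run, and so does the order.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def decode(position_logits, temperature, rng):
    """Unmask one position per step, committing the most confident candidate.

    Returns (order, tokens): the order in which positions were unmasked
    and the token id chosen at each position. Temperature 0 means greedy.
    """
    masked = set(range(len(position_logits)))
    order, tokens = [], {}
    while masked:
        candidates = []
        for pos in sorted(masked):
            if temperature == 0:
                probs = softmax(position_logits[pos])
                tok = max(range(len(probs)), key=probs.__getitem__)
            else:
                probs = softmax(position_logits[pos], temperature)
                tok = rng.choices(range(len(probs)), weights=probs)[0]
            # Confidence is the probability of the candidate token.
            candidates.append((probs[tok], pos, tok))
        _, pos, tok = max(candidates)
        masked.remove(pos)
        order.append(pos)
        tokens[pos] = tok
    return order, tokens


# Hypothetical fixed logits: 4 positions, vocabulary of 3 token ids.
TOY_LOGITS = [
    [4.0, 0.0, 0.0],
    [2.0, 1.0, 0.0],
    [1.0, 0.9, 0.0],
    [3.0, 0.0, 1.0],
]

greedy_order, _ = decode(TOY_LOGITS, 0, random.Random(0))
sampled_orders = {
    tuple(decode(TOY_LOGITS, 2.0, random.Random(seed))[0]) for seed in range(50)
}
print("greedy order:", greedy_order)
print("distinct orders at temperature 2.0:", len(sampled_orders))
```

In a real dLLM the logits would be recomputed after every unmasking step, so the effect on order compounds; the toy keeps them fixed only to stay self-contained.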
Authors: Shansan Gong, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong, Yizhe Zhang
Abstract: Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising models operate over the entire sequence. The global planning and iterative refinement features of dLLMs are particularly useful for code generation. However, current training and inference mechanisms for dLLMs in coding are still under-explored. To demystify the decoding behavior of dLLMs and unlock their potential for coding, we systematically investigate their denoising processes and reinforcement learning (RL) methods. We train a 7B dLLM, DiffuCoder, on 130B tokens of code. Using this model as a testbed, we analyze its decoding behavior, revealing how it differs from that of AR models: (1) dLLMs can decide how causal their generation should be without relying on semi-AR decoding, and (2) increasing the sampling temperature diversifies not only token choices but also their generation order. This diversity creates a rich search space for RL rollouts. For RL training, to reduce the variance of token log-likelihood estimates and maintain training efficiency, we propose coupled-GRPO, a novel sampling scheme that constructs complementary mask noise for completions used in training. In our experiments, coupled-GRPO significantly improves DiffuCoder's performance on code generation benchmarks (+4.4% on EvalPlus) and reduces reliance on AR causality during decoding. Our work provides deeper insight into the machinery of dLLM generation and offers an effective, diffusion-native RL training framework. Code: https://github.com/apple/ml-diffucoder
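The abstract describes coupled-GRPO only as constructing "complementary mask noise" for completions. One plausible reading, sketched below under stated assumptions, is that each completion is noised twice with masks that are exact complements, so every token is masked (and therefore scored) in exactly one of the two views. The function names, the `[MASK]` placeholder, and the single `mask_ratio` parameter are hypothetical and chosen for illustration; this is not the released implementation.

```python
import random


def complementary_masks(completion_len, mask_ratio, rng):
    """Return two boolean masks over a completion that are exact complements.

    mask_a hides round(completion_len * mask_ratio) randomly chosen positions;
    mask_b hides every remaining position. (Assumed scheme, for illustration.)
    """
    num_masked = round(completion_len * mask_ratio)
    masked_positions = set(rng.sample(range(completion_len), num_masked))
    mask_a = [i in masked_positions for i in range(completion_len)]
    mask_b = [not hidden for hidden in mask_a]
    return mask_a, mask_b


def apply_mask(tokens, mask, mask_token="[MASK]"):
    """Replace masked positions with a placeholder mask token."""
    return [mask_token if hidden else tok for tok, hidden in zip(tokens, mask)]


rng = random.Random(0)
completion = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a+b"]
mask_a, mask_b = complementary_masks(len(completion), 0.3, rng)
print(apply_mask(completion, mask_a))
print(apply_mask(completion, mask_b))
```

Because the two views together cover every position exactly once, a per-token log-likelihood estimate is available for the whole completion from only two forward passes, which is consistent with the variance-reduction and efficiency motivation stated in the abstract.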