From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

AI-generated keywords: Large language models, AI-assisted coding, Multi-Granularity Debugger, hierarchical code debugging, bug-fixing capabilities

AI-generated Key Points

  • Large language models (LLMs) have revolutionized AI-assisted coding tasks
  • Generated code often contains critical errors requiring human intervention
  • Introduction of Multi-Granularity Debugger (MGDebugger) for hierarchical code debugging
  • MGDebugger isolates, identifies, and resolves bugs at different levels of granularity
  • Proposed LLM-simulated Python executor to enhance debugging process
  • MGDebugger outperforms existing systems with 18.9% accuracy improvement in HumanEval and 97.6% repair success rate in HumanEvalFix
  • Demonstrated robustness in fixing bugs across different categories and difficulty levels
  • Detailed case study highlights MGDebugger's effectiveness in identifying and correcting buggy parts
  • Comprehensive error correction without introducing new bugs by decomposing complex problems into distinct subfunctions
  • MGDebugger enhances code clarity and correctness, improving the quality of LLM-generated code

Authors: Yuling Shi, Songsong Wang, Chengcheng Wan, Xiaodong Gu

Code and data available at https://github.com/YerbaPage/MGDebugger
License: CC BY 4.0

Abstract: While large language models have made significant strides in code generation, the pass rate of the generated code is bottlenecked on subtle errors, often requiring human intervention to pass tests, especially for complex problems. Existing LLM-based debugging systems treat generated programs as monolithic units, failing to address bugs at multiple levels of granularity, from low-level syntax errors to high-level algorithmic flaws. In this paper, we introduce Multi-Granularity Debugger (MGDebugger), a hierarchical code debugger that isolates, identifies, and resolves bugs at various levels of granularity. MGDebugger decomposes problematic code into a hierarchical tree structure of subfunctions, with each level representing a particular granularity of error. During debugging, it analyzes each subfunction and iteratively resolves bugs in a bottom-up manner. To effectively test each subfunction, we propose an LLM-simulated Python executor, which traces code execution and tracks important variable states to pinpoint errors accurately. Extensive experiments demonstrate that MGDebugger outperforms existing debugging systems, achieving an 18.9% improvement in accuracy over seed generations in HumanEval and a 97.6% repair success rate in HumanEvalFix. Furthermore, MGDebugger effectively fixes bugs across different categories and difficulty levels, demonstrating its robustness and effectiveness.
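The bottom-up order described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `SubfunctionNode` class, the `debug_one` callback, and the example subfunction names are all hypothetical, and the sketch only shows the traversal order (children before parents), not how an LLM would decompose or repair code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubfunctionNode:
    """One node of a hypothetical subfunction tree (names are illustrative)."""
    name: str
    children: List["SubfunctionNode"] = field(default_factory=list)


def bottom_up_order(root: SubfunctionNode) -> List[str]:
    """Return node names in post-order, so every child precedes its parent."""
    order: List[str] = []

    def visit(node: SubfunctionNode) -> None:
        for child in node.children:
            visit(child)
        order.append(node.name)

    visit(root)
    return order


def debug_tree(root: SubfunctionNode,
               debug_one: Callable[[str], bool]) -> Dict[str, bool]:
    """Call debug_one on each subfunction, leaves first.

    debug_one is an assumed placeholder for whatever per-subfunction
    test-and-repair step a real system would perform.
    """
    return {name: debug_one(name) for name in bottom_up_order(root)}


# Hypothetical decomposition of a top-level function into subfunctions.
tree = SubfunctionNode("solve", [
    SubfunctionNode("parse_input", [SubfunctionNode("tokenize")]),
    SubfunctionNode("compute_result"),
])

print(bottom_up_order(tree))
# ['tokenize', 'parse_input', 'compute_result', 'solve']
```

Visiting leaves first means that, by the time a parent subfunction is examined, the helpers it depends on have already been checked, which is the motivation for the bottom-up order.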

Submitted to arXiv on 02 Oct. 2024

Results of the summarizing process for the arXiv paper: 2410.01215v1

Large language models (LLMs) have revolutionized AI-assisted coding, generating code for a wide range of programming challenges. However, the generated code often contains critical errors that require human intervention before it passes tests. This has led to a paradigm in which large models generate code and humans fix it. Debugging LLM-generated code remains a significant challenge: existing systems treat the erroneous program as a monolithic entity and do not address bugs at different levels of granularity.

To address this issue, the paper introduces Multi-Granularity Debugger (MGDebugger), a hierarchical code debugger that isolates, identifies, and resolves bugs at different levels of granularity. MGDebugger decomposes problematic code into a hierarchical tree of subfunctions, with each level of the tree representing a particular granularity of error. It then analyzes each subfunction and resolves bugs iteratively in a bottom-up manner, so that errors are targeted at every level. To test each subfunction, the authors propose an LLM-simulated Python executor that traces code execution and tracks important variable states to pinpoint errors.

Extensive experiments show that MGDebugger outperforms existing debugging systems, achieving an 18.9% improvement in accuracy over seed generations in HumanEval and a 97.6% repair success rate in HumanEvalFix. It also fixes bugs across different categories and difficulty levels, which indicates robustness. A detailed case study illustrates how MGDebugger identifies and corrects buggy parts more effectively than baseline methods. By decomposing complex problems into distinct subfunctions that are debugged separately, MGDebugger is reported to correct errors comprehensively without introducing new bugs. The approach not only fixes bugs but also improves code clarity and correctness, showing the potential of MGDebugger to improve the quality of LLM-generated code.
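To make "tracing execution and tracking variable states" concrete, here is a minimal sketch that records local variables line by line. It is only an analogy: the paper's executor is simulated by an LLM, whereas this sketch runs the code in the real interpreter through the standard `sys.settrace` hook. The helper `trace_variable_states` and the deliberately buggy `buggy_sum_of_squares` are hypothetical names introduced for illustration.

```python
import sys
from typing import Any, Callable, Dict, List, Tuple

State = Tuple[int, Dict[str, Any]]  # (line offset inside the function, locals)


def trace_variable_states(func: Callable[..., Any],
                          *args: Any) -> Tuple[Any, List[State]]:
    """Run func(*args) and snapshot its locals before each line executes."""
    code = func.__code__
    states: List[State] = []

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace frames other than func's own
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            states.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace hook
    return result, states


def buggy_sum_of_squares(values):
    total = 0
    for v in values:
        total += v * 2  # deliberate bug: should be v ** 2
    return total


result, states = trace_variable_states(buggy_sum_of_squares, [1, 2, 3])
print("returned:", result)  # 12, whereas the intended sum of squares is 14
for offset, local_vars in states:
    print(offset, local_vars)
```

Reading the recorded states shows `total` growing by `2 * v` at each iteration instead of `v ** 2`, which is the kind of intermediate evidence a variable-state trace is meant to surface.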
In conclusion, MGDebugger presents a novel approach to hierarchical code debugging that systematically addresses bugs at multiple levels of granularity. By restructuring complex code into a hierarchical framework and utilizing targeted debugging techniques, MGDebugger demonstrates superior bug-fixing capabilities compared to traditional holistic methods.
Created on 11 Oct. 2024
