Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation

AI-generated keywords: Artificial Intelligence, Large Language Models, Code Generation, Software Development, Empirical Analysis

AI-generated Key Points

  • Large Language Models (LLMs) have emerged as powerful tools for code generation
  • LLMs demonstrate proficiency in code completion, source code mapping, and system maintenance
  • Current evaluations of LLMs have mainly been in research settings, highlighting a gap in understanding their effectiveness in real-world applications
  • An empirical analysis of the DevGPT dataset shows that LLM-generated code is often used to illustrate concepts or provide examples rather than as production-ready code
  • Further improvement is needed to enhance LLMs for seamless integration into software development practices
  • Advancements like CodeGPT, CodeParrot, and Codex show potential to revolutionize software engineering tasks through human-AI collaboration
  • Practical challenges and limitations need to be addressed for effective deployment of LLMs for code generation
  • Continued research and development are essential to enhance LLM capabilities for real-world applications

Authors: Kailun Jin, Chung-Yu Wang, Hung Viet Pham, Hadi Hemmati

4 pages, 3 figures, 21st International Conference on Mining Software Repositories (MSR '24), April 15-16, 2024, Lisbon, Portugal
License: CC BY-NC-SA 4.0

Abstract: Large language models (LLMs) have demonstrated notable proficiency in code generation, with numerous prior studies showing their promising capabilities in various development scenarios. However, these studies mainly provide evaluations in research settings, which leaves a significant gap in understanding how effectively LLMs can support developers in real-world settings. To address this, we conducted an empirical analysis of conversations in DevGPT, a dataset collected from developers' conversations with ChatGPT (captured with the Share Link feature on platforms such as GitHub). Our empirical findings indicate that the current practice of using LLM-generated code is typically limited to either demonstrating high-level concepts or providing examples in documentation, rather than being used as production-ready code. These findings indicate that there is much future work needed to improve LLMs in code generation before they can be integral parts of modern software development.

Submitted to arXiv on 18 Feb. 2024

Results of the summarizing process for the arXiv paper: 2402.11702v2

Large Language Models (LLMs) have emerged as powerful tools for code generation, showing proficiency in development scenarios such as code completion, source code mapping, and system maintenance. Most evaluations of LLMs, however, have been conducted in research settings, which leaves a significant gap in understanding how effective they are in real-world applications.

To bridge this gap, the study presents an empirical analysis of conversations in DevGPT, a dataset of developers' conversations with ChatGPT captured with the Share Link feature on platforms such as GitHub. The findings show that LLM-generated code is currently used mostly to illustrate high-level concepts or to provide examples in documentation, rather than being deployed as production-ready code. This suggests that LLMs need further improvement in code generation before they can be integrated smoothly into modern software development practice.

Recent LLMs such as CodeGPT, CodeParrot, and Codex underscore the potential to change software engineering tasks through collaboration between humans and AI. Realizing that potential requires addressing the practical challenges and limitations of deploying LLMs for code generation. Overall, the study describes the current state of LLM use for code generation and emphasizes the need for further research and development to improve these models for real-world applications.
Created on 03 Nov. 2024
