From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

AI-generated keywords: Code Intelligence, Large Language Models, Automated Software Development, Practical Guide, Research-Practice Gap

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Large language models (LLMs) have revolutionized automated software development by enabling the direct translation of natural language descriptions into functional code.
  • Commercial tools like GitHub Copilot, Cursor, Trae, and Claude Code have facilitated widespread adoption of LLMs in software development.
  • Evolution from rule-based systems to Transformer-based architectures has significantly improved performance metrics, with success rates on benchmarks like HumanEval surpassing 95%.
  • The paper provides a practical guide through analytic experiments focusing on code LLMs, covering the complete model life cycle from data curation to post-training techniques.
  • General LLMs such as GPT-4 and specialized code LLMs like StarCoder and QwenCoder are analyzed in detail.
  • The authors offer valuable guidance on leveraging code intelligence in modern software development practices, addressing design decisions, trade-offs, and nuances of code intelligence capabilities.
  • Considerations around code correctness, security implications, contextual awareness within large codebases, and integration with development workflows are discussed.
  • Promising research directions aligning with practical needs in leveraging code intelligence are highlighted.
  • Comprehensive experiments analyze code pre-training, supervised fine-tuning, and reinforcement learning, covering scaling laws, framework selection, hyperparameter sensitivity, model architecture comparisons, and dataset comparisons.

Authors: Jian Yang, Wei Zhang, Shark Liu, Jiajun Wu, Shawn Guo, Yizhi Li

Abstract: Large language models (LLMs) have fundamentally transformed automated software development by enabling direct translation of natural language descriptions into functional code, driving commercial adoption through tools like GitHub Copilot (Microsoft), Cursor (Anysphere), Trae (ByteDance), and Claude Code (Anthropic). The field has evolved dramatically from rule-based systems to Transformer-based architectures, achieving performance improvements from single-digit to over 95% success rates on benchmarks like HumanEval. In this work, we provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs, systematically examining the complete model life cycle from data curation to post-training through advanced prompting paradigms, code pre-training, supervised fine-tuning, reinforcement learning, and autonomous coding agents. We analyze the code capability of the general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder), critically examining the techniques, design decisions, and trade-offs. Further, we articulate the research-practice gap between academic research (e.g., benchmarks and tasks) and real-world deployment (e.g., software-related code tasks), including code correctness, security, contextual awareness of large codebases, and integration with development workflows, and map promising research directions to practical needs. Last, we conduct a series of experiments to provide a comprehensive analysis of code pre-training, supervised fine-tuning, and reinforcement learning, covering scaling law, framework selection, hyperparameter sensitivity, model architectures, and dataset comparisons.

Submitted to arXiv on 23 Nov. 2025

Results of the summarizing process for the arXiv paper: 2511.18538v1

This paper's license doesn't allow us to build upon its content and the summarizing process is here made with the paper's metadata rather than the article.

In their paper titled "From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence," authors Jian Yang, Wei Zhang, Shark Liu, Jiajun Wu, Shawn Guo, and Yizhi Li examine the impact of large language models (LLMs) on automated software development. LLMs enable the direct translation of natural language descriptions into functional code, which has driven commercial adoption through tools like GitHub Copilot (Microsoft), Cursor (Anysphere), Trae (ByteDance), and Claude Code (Anthropic). As the field moved from rule-based systems to Transformer-based architectures, success rates on benchmarks like HumanEval rose from single digits to over 95%.

The authors provide a synthesis and practical guide, built on a series of analytic and probing experiments, focused on code LLMs. They systematically examine the complete model life cycle from data curation to post-training, covering advanced prompting paradigms, code pre-training, supervised fine-tuning, reinforcement learning, and autonomous coding agents. The paper analyzes both general LLMs (GPT-4, Claude, and LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder), critically examining the techniques, design decisions, and trade-offs associated with these models.

The authors also address the research-practice gap between academic research (benchmarks and tasks) and real-world deployment on software-related code tasks. This includes code correctness, security, contextual awareness of large codebases, and integration with development workflows. They map promising research directions to these practical needs. Lastly, a series of experiments provides an analysis of code pre-training, supervised fine-tuning, and reinforcement learning, covering scaling laws, framework selection, hyperparameter sensitivity, model architectures, and dataset comparisons.
Created on 29 Dec. 2025

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.