LLM-based Automated Architecture View Generation: Where Are We Now?

AI-generated keywords: Architecture views

AI-generated Key Points

  • Creation of architecture views is crucial for software architecture documentation
  • Manual process can be labor-intensive and often results in outdated artifacts
  • Automated generation of views from source code becomes increasingly valuable as systems become more complex
  • Empirical evaluation of large language models (LLMs) and agentic approaches in generating architecture views from source code
  • Generated 4,137 architecture views from 340 open-source repositories across 13 experimental configurations, using three LLMs with three prompting techniques and two agentic approaches
  • Prompting strategies offered marginal improvements while a custom agentic approach consistently outperformed a general-purpose agent
  • LLMs and agentic approaches exhibited granularity mismatches by operating at the code level rather than architectural abstractions
  • Human expertise in architectural design processes is still needed, positioning LLMs and agentic approaches as assistive tools rather than autonomous architects
  • Challenges encountered during the study informed the experimental design and led to solutions such as a retry mechanism for incorrect PlantUML code and a hierarchical summarization approach to manage context window constraints
  • Source code summarization is a critical bottleneck in accurately generating high-level architectural views from source code
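
The retry mechanism for incorrect PlantUML code mentioned in the key points could look roughly like the sketch below. This is a minimal illustration and not the authors' implementation: `generate_with_retry`, `looks_like_plantuml`, the three-attempt limit, and the wording of the repair hint are all hypothetical, and the validity check is a crude structural test rather than a real PlantUML parse or render.

```python
from typing import Callable, Optional


def looks_like_plantuml(text: str) -> bool:
    """Rough structural check (assumption): the diagram is wrapped in
    @startuml / @enduml and has at least one line in between."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    return (
        len(lines) >= 3
        and lines[0].startswith("@startuml")
        and lines[-1] == "@enduml"
    )


def generate_with_retry(
    generate: Callable[[str], str],
    prompt: str,
    is_valid: Callable[[str], bool] = looks_like_plantuml,
    max_attempts: int = 3,
) -> Optional[str]:
    """Call `generate` until its output passes `is_valid` or attempts run out.

    `generate` stands in for any LLM or agent call that maps a prompt to text.
    Returns the first valid diagram, or None if every attempt fails.
    """
    current_prompt = prompt
    for _ in range(max_attempts):
        candidate = generate(current_prompt)
        if is_valid(candidate):
            return candidate
        # Hypothetical repair hint appended to the original prompt on failure.
        current_prompt = (
            prompt
            + "\n\nThe previous attempt was not valid PlantUML. "
            + "Return only a diagram between @startuml and @enduml."
        )
    return None
```

In practice the validity check would more plausibly run the PlantUML renderer and treat a rendering error as a failure; the callable parameter leaves room for that.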

Authors: Miryala Sathvika, Rudra Dhar, Karthik Vaidhyanathan

License: CC BY 4.0

Abstract: Architecture views are essential for software architecture documentation, yet their manual creation is labor intensive and often leads to outdated artifacts. As systems grow in complexity, the automated generation of views from source code becomes increasingly valuable. Goal: We empirically evaluate the ability of LLMs and agentic approaches to generate architecture views from source code. Method: We analyze 340 open-source repositories across 13 experimental configurations using 3 LLMs with 3 prompting techniques and 2 agentic approaches, yielding 4,137 generated views. We evaluate the generated views by comparing them with the ground-truth using a combination of automated metrics complemented by human evaluations. Results: Prompting strategies offer marginal improvements. Few-shot prompting reduces clarity failures by 9.2% compared to zero-shot baselines. The custom agentic approach consistently outperforms the general-purpose agent, achieving the best clarity (22.6% failure rate) and level-of-detail success (50%). Conclusions: LLM and agentic approaches demonstrate capabilities in generating syntactically valid architecture views. However, they consistently exhibit granularity mismatches, operating at the code level rather than architectural abstractions. This suggests that there is still a need for human expertise, positioning LLMs and agents as assistive tools rather than autonomous architects.

Submitted to arXiv on 22 Mar. 2026

Results of the summarizing process for the arXiv paper: 2603.21178v1

The creation of architecture views is crucial for software architecture documentation, but the manual process is labor-intensive and often results in outdated artifacts. As systems become more complex, the automated generation of views from source code becomes increasingly valuable. In this study, we empirically evaluate the effectiveness of large language models (LLMs) and agentic approaches in generating architecture views from source code. Our analysis of 340 open-source repositories across 13 experimental configurations, using three LLMs with three prompting techniques and two agentic approaches, yielded 4,137 generated architecture views. We assessed their quality by comparing them with the ground truth through a combination of automated metrics and human evaluations. Prompting strategies offered only marginal improvements (few-shot prompting reduced clarity failures by 9.2% compared to zero-shot baselines), while the custom agentic approach consistently outperformed the general-purpose agent, achieving the best clarity (22.6% failure rate) and level-of-detail success (50%). Although LLMs and agentic approaches can generate syntactically valid views, they consistently exhibited granularity mismatches, operating at the code level rather than at the level of architectural abstractions. This highlights the continued need for human expertise in architectural design and positions LLMs and agents as assistive tools rather than autonomous architects. The challenges we encountered informed our experimental design and led to solutions such as a retry mechanism for incorrect PlantUML code and a hierarchical summarization approach to manage context window constraints. Overall, our research identifies source code summarization as a critical bottleneck in accurately generating high-level architectural views from source code.
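
As an illustration of how a hierarchical summarization approach can keep every model call within a context window, the sketch below summarizes files first, then directories, then the whole repository. It is a sketch under stated assumptions, not the procedure used in the paper: the function names, the character-based `budget` (a stand-in for a token limit), and the single merge pass with truncation are all hypothetical choices.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Callable, Dict


def summarize_bounded(text: str, summarize: Callable[[str], str], budget: int) -> str:
    """Summarize `text` without ever passing more than `budget` characters
    to `summarize` (assumption: characters approximate tokens)."""
    if len(text) <= budget:
        return summarize(text)
    # Split oversized input into budget-sized chunks and summarize each one.
    chunks = [text[i:i + budget] for i in range(0, len(text), budget)]
    merged = "\n".join(summarize(chunk) for chunk in chunks)
    # Single merge pass; truncate so the final call also respects the budget.
    return summarize(merged[:budget])


def hierarchical_summary(
    files: Dict[str, str],
    summarize: Callable[[str], str],
    budget: int = 4000,
) -> str:
    """Build a repository summary bottom-up: files, then directories, then root.

    `files` maps POSIX-style relative paths to file contents, and `summarize`
    stands in for any LLM call that condenses text.
    """
    by_dir = defaultdict(list)
    for path in sorted(files):
        file_summary = summarize_bounded(files[path], summarize, budget)
        by_dir[str(PurePosixPath(path).parent)].append(f"{path}: {file_summary}")

    dir_summaries = [
        f"{directory}/: " + summarize_bounded("\n".join(items), summarize, budget)
        for directory, items in sorted(by_dir.items())
    ]
    return summarize_bounded("\n".join(dir_summaries), summarize, budget)
```

Each level discards detail, which is one way to see why summarization quality limits the quality of the final view: whatever the file-level summaries omit is unavailable to every later step.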
Created on 22 Apr. 2026

