LLM-based Automated Architecture View Generation: Where Are We Now?

AI-generated keywords: Architecture views

AI-generated Key Points

Creation of architecture views is crucial for software architecture documentation
Manual process can be labor-intensive and often results in outdated artifacts
Automated generation of views from source code becomes increasingly valuable as systems become more complex
Empirical evaluation of Large Language Models (LLMs) and agentic approaches in generating architecture views from source code
Generated 4,137 architecture views from 340 open-source repositories across 13 experimental configurations, using three LLMs with three prompting techniques and two agentic approaches
Prompting strategies offered marginal improvements while a custom agentic approach consistently outperformed a general-purpose agent
LLMs and agentic approaches exhibited granularity mismatches by operating at the code level rather than architectural abstractions
Human expertise in architectural design processes is still needed, positioning LLMs and agentic approaches as assistive tools rather than autonomous architects
Challenges encountered during the study informed the experimental design; implemented solutions include a retry mechanism for incorrect PlantUML code and a hierarchical summarization approach to manage context-window constraints
Source code summarization is a critical bottleneck in accurately generating high-level architectural views from source code

Authors: Miryala Sathvika, Rudra Dhar, Karthik Vaidhyanathan

arXiv: 2603.21178v1 (cs.SE)

License: CC BY 4.0

Abstract: Architecture views are essential for software architecture documentation, yet their manual creation is labor intensive and often leads to outdated artifacts. As systems grow in complexity, the automated generation of views from source code becomes increasingly valuable. Goal: We empirically evaluate the ability of LLMs and agentic approaches to generate architecture views from source code. Method: We analyze 340 open-source repositories across 13 experimental configurations using 3 LLMs with 3 prompting techniques and 2 agentic approaches, yielding 4,137 generated views. We evaluate the generated views by comparing them with the ground truth using a combination of automated metrics complemented by human evaluations. Results: Prompting strategies offer marginal improvements. Few-shot prompting reduces clarity failures by 9.2% compared to zero-shot baselines. The custom agentic approach consistently outperforms the general-purpose agent, achieving the best clarity (22.6% failure rate) and level-of-detail success (50%). Conclusions: LLM and agentic approaches demonstrate capabilities in generating syntactically valid architecture views. However, they consistently exhibit granularity mismatches, operating at the code level rather than architectural abstractions. This suggests that there is still a need for human expertise, positioning LLMs and agents as assistive tools rather than autonomous architects.

Submitted to arXiv on 22 Mar. 2026

Results of the summarizing process for the arXiv paper: 2603.21178v1

The creation of architecture views is crucial for software architecture documentation, but the manual process is labor-intensive and often results in outdated artifacts. As systems become more complex, automatically generating views from source code becomes increasingly valuable. In this study, we aimed to empirically evaluate how well Large Language Models (LLMs) and agentic approaches generate architecture views from source code. We analyzed 340 open-source repositories across 13 experimental configurations, using three LLMs with three prompting techniques and two agentic approaches, which yielded 4,137 generated architecture views. We assessed their quality by comparing them with ground-truth views through a combination of automated metrics and human evaluations. Prompting strategies offered only marginal improvements (few-shot prompting reduced clarity failures by 9.2% compared with zero-shot baselines), while a custom agentic approach consistently outperformed a general-purpose agent, achieving the best clarity (22.6% failure rate) and level-of-detail success (50%). However, although they can generate syntactically valid views, LLMs and agentic approaches exhibited granularity mismatches, operating at the code level rather than at the level of architectural abstractions. This highlights the continued need for human expertise in architectural design, positioning LLMs and agents as assistive tools rather than autonomous architects. Throughout the study, we encountered challenges that informed our experimental design, and we implemented solutions such as a retry mechanism for incorrect PlantUML code and a hierarchical summarization approach to manage context-window constraints. Overall, our research identifies source code summarization as a critical bottleneck in accurately generating high-level architectural views from source code.

Summary

1. Software architects draw "architecture views", diagrams that show how a software system is organized, to document it.
2. Drawing these diagrams by hand takes a lot of time, and they can fall out of date as the code changes.
3. As software gets bigger and more complicated, it becomes more useful to have computers draw these diagrams automatically from the code.
4. The researchers tested how well AI language models could draw these diagrams from the code of 340 open-source projects, producing 4,137 diagrams. The AI could produce diagrams in the correct format, but it tended to show small code-level details instead of the big-picture structure.
5. So AI can help, but people who understand software design still need to be involved.

Definitions

- Architecture views: Diagrams that show a software system's structure from a particular perspective, such as its main parts and how they connect.
- Automated generation: Having computers produce something without people doing it by hand.
- Source code: The written instructions that make up a software program.
- Empirical evaluation: Testing something on real examples to measure how well it works.
- Agentic approaches: Setups in which an AI model works through a task in several steps on its own, for example reading files before answering.
- Granularity mismatches: When the level of detail produced does not match the level needed; here, diagrams showing individual pieces of code instead of the system's major parts.
- Hierarchical summarization: Summarizing small pieces first and then summarizing those summaries, level by level, so that a large amount of information fits into what the AI can read at once.

Introduction

Software architecture documentation is crucial for understanding and maintaining complex systems. Architecture views, which present the system's structure and behavior from different perspectives, are a key component of this documentation. However, creating these views manually is labor-intensive, and the resulting artifacts often become outdated or incomplete. As systems become more complex and codebases grow, there is a growing need to generate architecture views automatically from source code. In recent years, Large Language Models (LLMs) have shown promise in summarizing source code into higher-level descriptions, and agentic approaches, in which an LLM-driven agent works through a task in multiple steps, have also gained attention. In the paper "LLM-based Automated Architecture View Generation: Where Are We Now?", the authors evaluate how well LLMs and agentic approaches generate architecture views from source code. The study analyzes 340 open-source repositories using 13 experimental configurations that combine different LLMs, prompting strategies, and agentic approaches.

Methodology

The authors selected 340 open-source repositories and evaluated 13 experimental configurations built from three LLMs, three prompting techniques (including zero-shot and few-shot prompting), and two agentic approaches (a custom agent and a general-purpose agent). To assess the quality of the generated views, they compared them with ground-truth views using a combination of automated metrics and human evaluations. The authors also encountered practical challenges during the study: models sometimes produced incorrect PlantUML code, and large codebases exceeded the models' context windows. They addressed these with a retry mechanism for incorrect PlantUML code and a hierarchical summarization approach.
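How the authors' retry mechanism works is not described here. The following Python sketch shows one common pattern, under stated assumptions: a generator function (standing in for an LLM call) is re-prompted with the rejected diagram and an error message until a validator accepts the output or an attempt limit is reached. The names `generate_view_with_retry`, `minimal_plantuml_check`, and `max_attempts` are hypothetical, and the validator only checks the `@startuml`/`@enduml` delimiters; a real pipeline would run an actual PlantUML syntax check.

```python
from typing import Callable, List, Optional


def minimal_plantuml_check(text: str) -> Optional[str]:
    """Rough structural check standing in for a real PlantUML syntax check.

    Returns None when the text looks like a complete diagram, else an error message.
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("@startuml"):
        return "diagram must start with @startuml"
    if lines[-1] != "@enduml":
        return "diagram must end with @enduml"
    return None


def generate_view_with_retry(
    generate: Callable[[str], str],
    validate: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Request PlantUML until `validate` accepts it or attempts run out."""
    current_prompt = prompt
    for _ in range(max_attempts):
        diagram = generate(current_prompt)
        error = validate(diagram)
        if error is None:
            return diagram  # first valid diagram wins
        # Feed the rejected output and the error back into the next prompt.
        current_prompt = (
            f"{prompt}\n\nYour previous PlantUML was invalid:\n{diagram}\n"
            f"Error: {error}\nReturn only corrected PlantUML."
        )
    return None  # give up; the caller can record this as a generation failure


# Usage with a stand-in generator that fails once, then succeeds.
prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    prompts_seen.append(prompt)
    if len(prompts_seen) == 1:
        return "[Billing] --> [Orders]"  # missing @startuml/@enduml
    return "@startuml\n[Billing] --> [Orders]\n@enduml"


view = generate_view_with_retry(fake_llm, minimal_plantuml_check, "Draw the component view.")
```

Returning `None` after the last attempt, rather than raising, lets an experiment harness count failed generations per configuration instead of aborting the run.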

Results

The study produced 4,137 architecture views from source code. Prompting strategies yielded only marginal improvements in view quality: few-shot prompting reduced clarity failures by 9.2% compared with zero-shot baselines. The custom agentic approach consistently outperformed the general-purpose agent, achieving the best clarity (22.6% failure rate) and level-of-detail success (50%). However, LLMs and agentic approaches consistently exhibited granularity mismatches, operating at the code level rather than at the level of architectural abstractions. This highlights the continued need for human expertise in architectural design, positioning these tools as assistants rather than autonomous architects.
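To make the granularity mismatch concrete, the sketch below (a hypothetical illustration, not the authors' method) takes class-level dependencies, the code-level detail that generated views tended to show, and lifts them to component-level edges by grouping classes under a package prefix. It assumes dotted, Java-style names in which the first two segments name a component; real architectural components rarely line up this neatly with packages, which is one reason this kind of abstraction is hard to automate.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def lift_to_components(
    class_edges: Iterable[Tuple[str, str]], depth: int = 2
) -> Dict[Tuple[str, str], int]:
    """Collapse class-level dependencies into component-level dependencies.

    A fully qualified class name such as "shop.billing.InvoiceService" is mapped
    to the component "shop.billing" by keeping its first `depth` dotted segments.
    Dependencies inside one component disappear; the rest are counted.
    """

    def component(name: str) -> str:
        return ".".join(name.split(".")[:depth])

    lifted: Counter = Counter()
    for src, dst in class_edges:
        a, b = component(src), component(dst)
        if a != b:
            lifted[(a, b)] += 1
    return dict(lifted)


# Hypothetical class-level edges, the level at which generated views tended to stay.
edges = [
    ("shop.billing.InvoiceService", "shop.billing.TaxCalculator"),
    ("shop.billing.InvoiceService", "shop.orders.OrderRepository"),
    ("shop.billing.PaymentGateway", "shop.orders.Order"),
    ("shop.orders.OrderController", "shop.orders.OrderRepository"),
]
component_edges = lift_to_components(edges)
# Four class-level edges become one component-level edge:
# {("shop.billing", "shop.orders"): 2}
```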

Automated Metrics

The authors used three automated metrics, BLEU (Bilingual Evaluation Understudy), ROUGE (Recall-Oriented Understudy for Gisting Evaluation), and METEOR (Metric for Evaluation of Translation with Explicit Ordering), to evaluate the generated views. All three LLMs performed similarly on these metrics, and all were limited in capturing higher-level architectural abstractions.
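As a rough illustration of how a text-overlap metric treats a generated view, here is a minimal ROUGE-1 (unigram overlap) sketch in Python. It assumes both views are compared as lowercased word tokens of their PlantUML source; the paper's actual tokenization and metric configuration are not given here, a real evaluation would normally use an established implementation such as the `rouge-score` package, and the diagrams in the example are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(plantuml: str) -> List[str]:
    # Lowercased word tokens; arrows and brackets are ignored in this sketch.
    return re.findall(r"\w+", plantuml.lower())


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Unigram-overlap (ROUGE-1) precision, recall, and F1 of candidate vs reference."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped token matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


reference = "@startuml\n[Billing] --> [Orders]\n@enduml"
generated = (
    "@startuml\n[InvoiceService] --> [OrderRepository]\n"
    "[Billing] --> [Orders]\n@enduml"
)
scores = rouge1(generated, reference)
# recall is 1.0 (every reference token appears); the extra code-level
# classes only lower precision to 4/6, giving F1 = 0.8.
```

The example shows why overlap scores alone say little about abstraction level: a view padded with code-level classes still recalls every reference token and loses only some precision.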

Human Evaluations

To further assess the quality of generated views, human evaluators rated them on four criteria: completeness, correctness, coherence/clarity, and overall satisfaction. The results showed that the custom agentic approach consistently outperformed the general-purpose agent across all criteria.

Discussion

The results highlight both the potential and the limitations of using LLMs and agentic approaches to generate architecture views from source code. These techniques can produce syntactically valid views, and a custom agentic approach improves on a general-purpose agent, but prompting strategies add little, and the generated views still operate at the code level rather than capturing higher-level architectural abstractions. The authors also discuss the practical challenges they faced, namely incorrect PlantUML code and context-window constraints, and the solutions they implemented: a retry mechanism for incorrect code and a hierarchical summarization approach.
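The hierarchical summarization approach is only named here. A common way to realize the idea, sketched below under stated assumptions, is a bottom-up reduction: summarize each file, then summarize groups of summaries until a single repository-level summary remains, so that no single request has to hold the whole codebase. `summarize` stands in for an LLM call, `max_chars` is a crude character-based proxy for a token budget, and all names are hypothetical.

```python
from typing import Callable, Dict, List


def pack(texts: List[str], max_chars: int) -> List[List[str]]:
    """Greedily group texts so each group stays near a character budget.

    Every group except possibly the last holds at least two texts, so each
    round of merging strictly shrinks the list, even if a pair overflows.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    size = 0
    for text in texts:
        if len(current) >= 2 and size + len(text) > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        groups.append(current)
    return groups


def hierarchical_summary(
    files: Dict[str, str],
    summarize: Callable[[str], str],
    max_chars: int = 4000,
) -> str:
    """Summarize each file, then repeatedly merge summaries until one remains."""
    if not files:
        return ""
    # Level 0: one summary per file (each file is assumed to fit on its own).
    summaries = [summarize(f"File {path}:\n{code}") for path, code in sorted(files.items())]
    # Higher levels: summarize groups of summaries; a lone summary passes through.
    while len(summaries) > 1:
        summaries = [
            group[0] if len(group) == 1 else summarize("\n\n".join(group))
            for group in pack(summaries, max_chars)
        ]
    return summaries[0]


calls: List[str] = []


def fake_summarize(text: str) -> str:
    # Stand-in for an LLM call; it just numbers each request.
    calls.append(text)
    return f"summary #{len(calls)}"


repo = {f"src/mod{i}.py": "def f():\n    pass\n" for i in range(5)}
top = hierarchical_summary(repo, fake_summarize, max_chars=25)
```

Requiring at least two items per group trades a possible budget overflow for guaranteed termination; a production version would instead split oversized summaries or count tokens with the model's tokenizer.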

Conclusion

In conclusion, this paper provides an empirical evaluation of LLMs and agentic approaches for generating architecture views from source code. The results show that while these techniques can generate syntactically valid views, they still struggle to capture higher-level architectural abstractions, so human expertise remains necessary in the architectural design process. The authors also highlight the need for further research on granularity mismatches and on improving how LLMs and agentic approaches capture higher-level abstractions. The study contributes to the automated generation of architecture views from source code and identifies source code summarization as a critical bottleneck in accurately representing complex systems.

Created on 22 Apr. 2026
