Automated Testing of the GUI of a Real-Life Engineering Software using Large Language Models

AI-generated keywords: Software Development, Testing, Real Users, Large Language Models, Automation

AI-generated Key Points

⚠ The license of the paper does not allow us to build upon its content, and the key points are generated using the paper metadata rather than the full article.

Testing the final product with real users is crucial in software development to identify issues and inconsistencies.
Manual testing can be time-consuming, which motivated GERALLT, a system that uses Large Language Models (LLMs) to perform autonomous exploratory tests of a GUI.
GERALLT generates a comprehensive list of potential unintuitive and inconsistent elements within the interface, streamlining the testing process and enhancing efficiency.
Evaluation of GERALLT on real-world engineering software showed successful identification of interface issues, providing valuable insights for future development.
The research paper by Tim Rosenbach, David Heidrich, and Alexander Weinert discusses how GERALLT automates GUI testing using LLMs, presenting its effectiveness at the A-Test Workshop during ICST'25.

Authors: Tim Rosenbach, David Heidrich, Alexander Weinert

2025 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)

arXiv: 2505.17839v1 (cs.SE)

10 pages, presented at the A-Test Workshop of the ICST'25

License: CC BY-NC-ND 4.0

Abstract: One important step in software development is testing the finished product with actual users. These tests aim, among other goals, at determining unintuitive behavior of the software as it is presented to the end-user. Moreover, they aim to determine inconsistencies in the user-facing interface. They provide valuable feedback for the development of the software, but are time-intensive to conduct. In this work, we present GERALLT, a system that uses Large Language Models (LLMs) to perform exploratory tests of the Graphical User Interface (GUI) of a real-life engineering software. GERALLT automatically generates a list of potential unintuitive and inconsistent parts of the interface. We present the architecture of GERALLT and evaluate it on a real-world use case of the engineering software, which has been extensively tested by developers and users. Our results show that GERALLT is able to determine issues with the interface that support the software development team in future development of the software.

Submitted to arXiv on 23 May 2025

Comprehensive Summary

In software development, testing the finished product with real users is an important step for identifying unintuitive behavior and inconsistencies in the user-facing interface. These tests provide valuable feedback for further development, but they are time-intensive to conduct. To address this, the authors present GERALLT, a system that uses Large Language Models (LLMs) to perform exploratory tests of the Graphical User Interface (GUI) of a real-life engineering software and to automatically generate a list of potentially unintuitive and inconsistent parts of the interface. The paper presents the architecture of GERALLT.

To evaluate it, GERALLT was applied to a real-world use case of the engineering software, which had already been extensively tested by developers and users. The results show that GERALLT identified issues in the interface that can support the development team in future work. The paper, "Automated Testing of the GUI of a Real-Life Engineering Software using Large Language Models" by Tim Rosenbach, David Heidrich, and Alexander Weinert, was presented at the A-Test Workshop of ICST'25 and published in the 2025 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW).

Summary

1. Testing the final product means checking it with real users to find problems and mistakes.
2. Manual testing takes a lot of time, so GERALLT was created to use special models for testing on computer screens.
3. GERALLT makes a big list of things that might be confusing or not working right on the screen, making testing faster and better.
4. When tested on real software, GERALLT found problems in the design, helping make future improvements.
5. A research paper talks about how GERALLT helps test computer screens automatically and how well it worked at a workshop.

Definitions

- Testing: Checking something to see if it works correctly.
- GUI: Graphical User Interface - how things look and work on a computer screen.
- Autonomous: Doing something by itself without needing help.
- Comprehensive: Including many details or covering everything.
- Engineering software: Programs used by engineers for designing and creating things.
- Insights: Understanding or information gained from studying something.

Introduction

In software development, testing helps ensure the quality and functionality of the final product. Part of this process is evaluating the software's behavior and interface to identify potential issues or inconsistencies, which is time-consuming and labor-intensive when done manually. To address this challenge, a team of researchers has introduced GERALLT, an automated system that uses Large Language Models (LLMs) to conduct exploratory tests of Graphical User Interfaces (GUIs). In their paper "Automated Testing of the GUI of a Real-Life Engineering Software using Large Language Models", Tim Rosenbach, David Heidrich, and Alexander Weinert describe how GERALLT automates this kind of GUI testing.

The Need for Automated Testing

Traditionally, software testing has been performed manually by developers or dedicated testers. While this approach provides valuable insights into the software's performance from a technical perspective, it often falls short in identifying user-centric issues within the interface. Additionally, manual testing can be time-consuming and prone to human error. With advancements in technology such as Artificial Intelligence (AI) and Machine Learning (ML), there is now an opportunity to automate certain aspects of software testing. This not only saves time but also allows for more comprehensive coverage of potential issues within the interface.

The Role of LLMs in GUI Testing

GERALLT leverages LLMs - AI-based models trained on large datasets - to autonomously explore different paths within a GUI and identify potential issues or inconsistencies. These models have shown great success in natural language processing tasks such as text generation and translation. By applying LLMs to GUI testing, GERALLT can generate a comprehensive list of potential unintuitive elements within an interface that may not have been identified through traditional manual testing methods.
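As a rough illustration of how a language model could be asked to list unintuitive or inconsistent elements of a single screen, here is a minimal Python sketch. It is not taken from the paper: the prompt wording, the `Finding` record, and the `fake_model` stand-in (used instead of a real LLM call) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """One potential issue reported for a screen (hypothetical record type)."""
    screen: str
    description: str


def build_prompt(screen: str, widgets: List[str]) -> str:
    """Describe one screen in plain text so a language model can review it."""
    listing = "\n".join(f"- {w}" for w in widgets)
    return (
        f"Screen: {screen}\nWidgets:\n{listing}\n"
        "List unintuitive or inconsistent elements, one per line, "
        "or reply NONE."
    )


def parse_reply(screen: str, reply: str) -> List[Finding]:
    """Turn the model's line-based reply into Finding records."""
    findings = []
    for line in reply.splitlines():
        text = line.strip().lstrip("-* ").strip()
        if text and text.upper() != "NONE":
            findings.append(Finding(screen, text))
    return findings


def review_screen(
    screen: str, widgets: List[str], ask_model: Callable[[str], str]
) -> List[Finding]:
    """Send the screen description to the model and collect its findings."""
    return parse_reply(screen, ask_model(build_prompt(screen, widgets)))


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call: flags mixed 'OK'/'Okay' button labels."""
    if "OK" in prompt and "Okay" in prompt:
        return "- Buttons use both 'OK' and 'Okay' as labels"
    return "NONE"


if __name__ == "__main__":
    for finding in review_screen(
        "Settings", ["button: OK", "button: Okay"], fake_model
    ):
        print(finding.screen, "->", finding.description)
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider; a real client would replace `fake_model`.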

Architecture Design

The architecture of GERALLT is designed to streamline the testing process and enhance efficiency. It consists of three main components - the GUI crawler, the LLM-based test generator, and the test execution engine. The GUI crawler is responsible for navigating through different paths within a GUI, simulating user interactions such as clicks and inputs. This allows GERALLT to explore various scenarios within the interface. The LLM-based test generator utilizes pre-trained language models to generate natural language descriptions of potential issues or inconsistencies within the interface. These descriptions are then used to create automated tests that can be executed by the test execution engine.
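The three-part flow described above (crawler, test generator, execution engine) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GERALLT's implementation: the GUI is modeled as a hypothetical dictionary of screens and actions, and `stub_generator` and `stub_engine` are simple stand-ins for the LLM-based generator and the execution engine.

```python
from collections import deque
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical GUI model: screen name -> {action label: next screen}.
Gui = Dict[str, Dict[str, str]]
Report = List[Tuple[str, List[str], str]]


def crawl(gui: Gui, start: str) -> Iterator[Tuple[str, List[str]]]:
    """Crawler: breadth-first walk yielding (screen, actions taken to reach it)."""
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        screen, path = queue.popleft()
        yield screen, path
        for action, target in gui[screen].items():
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [action]))


def run_pipeline(
    gui: Gui,
    start: str,
    describe_issues: Callable[[str, List[str]], List[str]],
    execute_check: Callable[[str, str], bool],
) -> Report:
    """Wire crawler -> generator -> engine and collect confirmed issues."""
    report: Report = []
    for screen, path in crawl(gui, start):
        for issue in describe_issues(screen, sorted(gui[screen])):
            if execute_check(screen, issue):
                report.append((screen, path, issue))
    return report


DEMO_GUI: Gui = {
    "Home": {"Open settings": "Settings", "Open help": "Help"},
    "Settings": {"Back": "Home"},
    "Help": {},  # dead end: no action leads away from this screen
}


def stub_generator(screen: str, actions: List[str]) -> List[str]:
    """Stand-in for the LLM-based generator: flags screens with no way out."""
    return [] if actions else [f"'{screen}' offers no action to leave it"]


def stub_engine(screen: str, issue: str) -> bool:
    """Stand-in for the execution engine: re-checks the claim on the model."""
    return not DEMO_GUI[screen]


if __name__ == "__main__":
    for screen, path, issue in run_pipeline(
        DEMO_GUI, "Home", stub_generator, stub_engine
    ):
        print(f"{screen} (via {path}): {issue}")
```

Keeping the generator and the engine as injected callables means each stage can be swapped independently, for example replacing the stub generator with a real model call while keeping the same crawler.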

Evaluation of GERALLT

To evaluate its effectiveness, GERALLT was applied to a real-world use case of an engineering software that had already undergone extensive testing by both developers and users. Even so, GERALLT identified issues within the interface that can support the development team in future work. This suggests that LLMs can complement existing testing by surfacing user-centric issues in an interface.

Future Implications

GERALLT's success in identifying potential issues within a real-life engineering software highlights its potential for future applications in other industries and domains. By automating certain aspects of GUI testing, developers can save time and resources while also improving overall software quality. Additionally, this research opens up possibilities for further advancements in AI-based testing methods. As technology continues to evolve, we can expect more sophisticated tools like GERALLT to emerge, further enhancing software development processes.

Conclusion

In conclusion, "Automated Testing of the GUI of a Real-Life Engineering Software using Large Language Models" presents an innovative approach towards GUI testing through automation. By leveraging LLMs, GERALLT has shown promising results in identifying potential user-centric issues within interfaces that may have been missed through traditional manual testing methods. This research not only highlights the potential of advanced technologies in software development but also provides a glimpse into the future of automated testing.

Created on 26 Mar. 2026
