Automated Testing of the GUI of a Real-Life Engineering Software using Large Language Models

AI-generated keywords: Software Development, Testing, Real Users, Large Language Models, Automation

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Testing the final product with real users is crucial in software development to identify issues and inconsistencies.
  • Such tests are time-intensive to conduct manually, which motivates GERALLT, a system that uses Large Language Models (LLMs) to perform exploratory tests of a GUI.
  • GERALLT automatically generates a list of potentially unintuitive and inconsistent parts of the interface, with the aim of reducing the manual effort these tests otherwise require.
  • An evaluation on a real-world use case of the engineering software, which had already been extensively tested by developers and users, showed that GERALLT identifies interface issues that support the team in future development.
  • The paper by Tim Rosenbach, David Heidrich, and Alexander Weinert, presented at the A-Test Workshop of ICST'25, describes the architecture of GERALLT and its evaluation.

Authors: Tim Rosenbach, David Heidrich, Alexander Weinert

2025 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW)
10 pages, presented at the A-Test Workshop of the ICST'25
License: CC BY-NC-ND 4.0

Abstract: One important step in software development is testing the finished product with actual users. These tests aim, among other goals, at determining unintuitive behavior of the software as it is presented to the end-user. Moreover, they aim to determine inconsistencies in the user-facing interface. They provide valuable feedback for the development of the software, but are time-intensive to conduct. In this work, we present GERALLT, a system that uses Large Language Models (LLMs) to perform exploratory tests of the Graphical User Interface (GUI) of a real-life engineering software. GERALLT automatically generates a list of potential unintuitive and inconsistent parts of the interface. We present the architecture of GERALLT and evaluate it on a real-world use case of the engineering software, which has been extensively tested by developers and users. Our results show that GERALLT is able to determine issues with the interface that support the software development team in future development of the software.

Submitted to arXiv on 23 May 2025

Results of the summarizing process for the arXiv paper: 2505.17839v1

The paper's license does not allow us to build upon its content, so this summary is generated from the paper's metadata rather than the full article.

Testing a finished product with real users is an important step in software development. Such tests uncover unintuitive behavior and inconsistencies in the user-facing interface and provide valuable feedback for further development, but they are time-intensive to conduct manually.

To address this, the authors present GERALLT, a system that uses Large Language Models (LLMs) to perform exploratory tests of the Graphical User Interface (GUI) of a real-life engineering software. GERALLT automatically generates a list of potentially unintuitive and inconsistent parts of the interface. The paper presents the architecture of GERALLT and evaluates it on a real-world use case of the engineering software, which had already been extensively tested by developers and users. The results show that GERALLT identified issues with the interface that support the development team in future work on the software.

The paper, "Automated Testing of the GUI of a Real-Life Engineering Software using Large Language Models" by Tim Rosenbach, David Heidrich, and Alexander Weinert, was presented at the A-Test Workshop of ICST'25 and published in the 2025 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW).
Created on 26 Mar. 2026
