AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

AI-generated keywords: Intelligent Agents

AI-generated Key Points

  • Large language models (LLMs) are crucial for tasks like web navigation but struggle with real-world webpages due to challenges like versatile actions, processing limitations, and complex decision-making.
  • AutoWebGLM is an automated web navigation agent built on ChatGLM3-6B that the authors report outperforms GPT-4 on web navigation; it uses an HTML simplification algorithm inspired by human browsing patterns.
  • A hybrid human-AI approach is used to curate a robust dataset for training, refined through reinforcement learning and rejection sampling techniques to enhance comprehension of webpage content and operations.
  • AutoWebGLM's performance is evaluated using the bilingual benchmark AutoWebBench for real-world web browsing tasks, showcasing advancements while identifying areas for further refinement.
  • Key contributions include the development of AutoWebGLM for efficient web browsing through curriculum learning, the construction of a dataset of approximately 10,000 browsing traces, and a demonstration that a 6B-parameter model can achieve performance comparable to leading LLM-based agents.
  • The system architecture comprises a browsing framework organizing HTML information and an LM agent utilizing diverse data sources and reinforcement learning/rejection sampling techniques for self-improvement in web browsing capabilities.
  • An ablation study assesses how each stage of the data and training strategy affects model performance, indicating that complex-task datasets and training strategies such as DPO (direct preference optimization) and RFT (rejection sampling fine-tuning) improve performance and bring the model closer to real-world scenarios.
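
The last key point names DPO as one of the training strategies. As a rough illustration, the sketch below computes the standard DPO objective for a single preference pair from sequence log-probabilities. The summary does not say whether the paper uses exactly this form, so treat it as the textbook formulation only; the function names and the value of `beta` are assumptions chosen for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # illustrative value, not taken from the paper
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) is the same as softplus(-margin).
    return softplus(-margin)


# The loss falls below log(2) once the policy favors the chosen response
# more strongly than the reference model does.
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In a web-agent setting, the "chosen" response would typically be a correct browsing action and the "rejected" one an erroneous action for the same page state, which is how a preference loss can teach a model from its mistakes.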

Authors: Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, Jie Tang

License: CC BY 4.0

Abstract: Large language models (LLMs) have fueled many intelligent agent tasks, such as web navigation -- but most existing agents perform far from satisfying in real-world webpages due to three factors: (1) the versatility of actions on webpages, (2) HTML text exceeding model processing capacity, and (3) the complexity of decision-making due to the open-domain nature of web. In light of the challenge, we develop AutoWebGLM, a GPT-4-outperforming automated web navigation agent built upon ChatGLM3-6B. Inspired by human browsing patterns, we design an HTML simplification algorithm to represent webpages, preserving vital information succinctly. We employ a hybrid human-AI method to build web browsing data for curriculum training. Then, we bootstrap the model by reinforcement learning and rejection sampling to further facilitate webpage comprehension, browser operations, and efficient task decomposition by itself. For testing, we establish a bilingual benchmark -- AutoWebBench -- for real-world web browsing tasks. We evaluate AutoWebGLM across diverse web navigation benchmarks, revealing its improvements but also underlying challenges to tackle real environments. Related code, model, and data will be released at https://github.com/THUDM/AutoWebGLM.
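
The abstract describes an HTML simplification step that keeps only the vital information on a page. The sketch below is not the authors' algorithm; it is a minimal illustration of the general idea, assuming that "vital" means interactive elements plus visible text. The tag lists, the kept attributes, and the names `PageSimplifier` and `simplify` are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical choices for illustration; not taken from the paper.
INTERACTIVE = {"a", "button", "input", "select", "textarea"}
VOID = {"input"}  # interactive tags that never have a closing tag
SKIPPED = {"script", "style", "noscript"}
KEPT_ATTRS = ("type", "placeholder", "aria-label")


class PageSimplifier(HTMLParser):
    """Collects interactive elements (with numeric ids) and visible text."""

    def __init__(self):
        super().__init__()
        self.elements = []  # kept nodes in document order
        self._skip = 0      # nesting depth inside skipped tags
        self._stack = []    # currently open interactive nodes
        self._next_id = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED:
            self._skip += 1
        elif not self._skip and tag in INTERACTIVE:
            kept = {k: v for k, v in attrs if k in KEPT_ATTRS and v}
            node = {"id": self._next_id, "tag": tag, "attrs": kept, "text": []}
            self._next_id += 1
            self.elements.append(node)
            if tag not in VOID:
                self._stack.append(node)

    def handle_endtag(self, tag):
        if tag in SKIPPED:
            self._skip = max(0, self._skip - 1)
        elif self._stack and self._stack[-1]["tag"] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        if self._skip or not text:
            return
        if self._stack:
            # Text inside an interactive element becomes its label.
            self._stack[-1]["text"].append(text)
        else:
            self.elements.append({"id": None, "tag": "text", "attrs": {}, "text": [text]})


def simplify(html: str) -> str:
    """Return one line per kept node; interactive nodes carry an [id]."""
    parser = PageSimplifier()
    parser.feed(html)
    parser.close()
    lines = []
    for node in parser.elements:
        text = " ".join(node["text"])
        if node["id"] is None:
            lines.append(text)
        else:
            attrs = "".join(f' {k}="{v}"' for k, v in node["attrs"].items())
            lines.append(f'[{node["id"]}] <{node["tag"]}{attrs}> {text}'.rstrip())
    return "\n".join(lines)
```

Giving each interactive element a short numeric id lets an agent refer to its target compactly (for example, "click element 2") instead of quoting raw markup, which is one way a simplified page can fit within a model's context limit.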

Submitted to arXiv on 04 Apr. 2024


Results of the summarizing process for the arXiv paper: 2404.03648v1

Large language models (LLMs) power many intelligent-agent tasks, including web navigation. However, existing agents often fall short on real-world webpages for three reasons: the versatility of actions available on webpages, HTML text that exceeds the model's processing capacity, and the complex decision-making required by the open-domain nature of the web.

To address these issues, the authors develop AutoWebGLM, an automated web navigation agent built on ChatGLM3-6B that they report outperforms GPT-4. Drawing on human browsing patterns, AutoWebGLM uses an HTML simplification algorithm that represents webpages concisely while retaining essential information. A hybrid human-AI approach is used to build the web browsing data for curriculum training. The model is then bootstrapped with reinforcement learning and rejection sampling to improve its comprehension of webpage content, its browser operations, and its task decomposition. To evaluate it, the authors establish a bilingual benchmark, AutoWebBench, for real-world web browsing tasks. Testing across diverse web navigation benchmarks shows AutoWebGLM's improvements while also exposing the challenges that remain in real environments.

The paper's key contributions are: the development of AutoWebGLM, which completes web browsing tasks through curriculum learning and bootstrapped training; the construction of a dataset of approximately 10,000 traces of real webpage browsing operations; and a demonstration that AutoWebGLM, with 6B parameters, achieves performance comparable to leading LLM-based agents. Together these mark a step toward practical usability on complex web navigation tasks.

The system architecture of AutoWebGLM has two main components: a browsing framework, which uses several web processing modules to organize HTML information for decision-making by the LM agent; and the LM agent itself, which learns from diverse data sources and uses reinforcement learning and rejection sampling to improve its own web browsing capabilities.

An ablation study assesses how each stage of the data and training strategy affects model performance. The results indicate that adding complex-task datasets significantly improves performance and brings the model closer to real-world scenarios. Among the training strategies, DPO (direct preference optimization) helps the model learn from its mistakes, while RFT (rejection sampling fine-tuning) enables bootstrap enhancement. Overall, the summary presents AutoWebGLM as an advance in automated web navigation with promising implications for practical deployment in complex online environments.
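
The summary credits rejection sampling with letting the model bootstrap itself. The sketch below shows the general pattern only, assuming a sampler that proposes browsing traces and a checker that accepts successful ones; `collect_accepted_traces`, `sample_trace`, `is_success`, and `samples_per_task` are hypothetical names, and the paper's actual filtering criteria are not described in this summary.

```python
from typing import Callable, Hashable, List, Tuple


def collect_accepted_traces(
    tasks: List[Hashable],
    sample_trace: Callable[[Hashable], str],
    is_success: Callable[[Hashable, str], bool],
    samples_per_task: int = 4,  # illustrative budget, not from the paper
) -> List[Tuple[Hashable, str]]:
    """Sample traces per task and keep the first one the checker accepts.

    Tasks with no accepted trace within the budget contribute nothing,
    so the returned set contains only verified examples for fine-tuning.
    """
    accepted = []
    for task in tasks:
        for _ in range(samples_per_task):
            trace = sample_trace(task)
            if is_success(task, trace):
                accepted.append((task, trace))
                break
    return accepted


# Toy usage with scripted "model outputs" instead of a real agent.
scripted = {
    "book a flight": iter(["wrong click", "reached confirmation", "unused"]),
    "find a recipe": iter(["wrong click", "wrong click", "wrong click"]),
}
kept = collect_accepted_traces(
    list(scripted),
    sample_trace=lambda task: next(scripted[task]),
    is_success=lambda task, trace: trace == "reached confirmation",
    samples_per_task=3,
)
```

The accepted pairs would then be added to the supervised training set for the next round, so each round trains on traces the current model produced and a checker verified.
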
Created on 20 Nov. 2024

