Large language models (LLMs) power many intelligent-agent tasks, including web navigation. However, existing agents often perform poorly on real-world webpages for three reasons: the wide variety of actions a webpage allows, the limits on how much HTML text a model can process, and the complex decision-making required by the open-domain nature of the web. AutoWebGLM was developed to address these issues. It is an automated web navigation agent built upon ChatGLM3-6B that is reported to outperform GPT-4. Drawing on human browsing patterns, AutoWebGLM uses an HTML simplification algorithm that represents webpages concisely while retaining essential information. A hybrid human-AI approach is used to curate training data that improves the agent's understanding of webpage structures and operations. The model is then refined through reinforcement learning and rejection sampling to improve its comprehension of webpage content, its use of browser functions, and its task decomposition. For evaluation, a bilingual benchmark named AutoWebBench is established for real-world web browsing tasks. Testing across diverse web navigation benchmarks shows the improvements made by AutoWebGLM and also highlights areas that need further work before it performs well in real environments.
The paper lists three key contributions: the development of AutoWebGLM, which completes web browsing tasks through curriculum learning and the training methods above; the construction of a dataset of approximately 10,000 traces of real webpage browsing operations; and a demonstration that AutoWebGLM, with 6B parameters, achieves performance comparable to leading LLM-based agents. These results are a step towards practical usability on complex web navigation tasks.
The system architecture of AutoWebGLM comprises two main components: a browsing framework, which uses various web processing modules to organize HTML information for decision-making by the LM agent; and the LM agent itself, which learns from diverse data sources and uses reinforcement learning and rejection sampling to improve its own web browsing capabilities.
An ablation study assesses how the different stages of data and training strategy affect model performance. The results indicate that incorporating complex task datasets significantly improves performance and brings the model closer to real-world scenarios. Of the training strategies, DPO helps the model learn from its mistakes and RFT enables bootstrapped improvement. Overall, AutoWebGLM is a significant advance in automated web navigation, with promising implications for deployment in complex online environments.
- Large language models (LLMs) are crucial for tasks like web navigation but struggle with real-world webpages due to challenges like versatile actions, processing limitations, and complex decision-making.
- AutoWebGLM is an agent built on ChatGLM3-6B that is reported to outperform GPT-4; it incorporates an HTML simplification algorithm inspired by human browsing patterns.
- A hybrid human-AI approach is used to curate the training dataset, and the model is refined through reinforcement learning and rejection sampling to improve its comprehension of webpage content and operations.
- AutoWebGLM's performance is evaluated using the bilingual benchmark AutoWebBench for real-world web browsing tasks, showing improvements while identifying areas for further refinement.
- Key contributions include the development of AutoWebGLM for efficient web browsing through curriculum learning, the construction of a dataset of approximately 10,000 traces, and a demonstration that a 6B-parameter model achieves performance comparable to leading LLM-based agents.
- The system architecture comprises a browsing framework that organizes HTML information and an LM agent that learns from diverse data sources and improves itself through reinforcement learning and rejection sampling.
- An ablation study assesses the effect of different stages of data and training strategy, indicating that complex task datasets and training strategies like DPO and RFT significantly improve model performance and bring it closer to real-world scenarios.
Summary:
- Big computer programs that help with tasks on the internet have trouble working well with real websites because websites allow many different actions, the programs can only process a limited amount of text, and the decisions involved are complicated.
- A new solution called AutoWebGLM is reported to work better than a famous program called GPT-4. It uses a special method to simplify website code based on how people browse the internet.
- People and computers work together to create a strong set of information for teaching the program, making it smarter by learning from mistakes and improving its understanding of websites.
- AutoWebGLM's abilities are tested using a test called AutoWebBench, which shows improvements in browsing the internet while also pointing out areas that need more work.
- The important parts of this work include creating the program for better web browsing, making a detailed set of information, and showing that a program with 6 billion parameters (internal settings) can perform about as well as leading programs.
Definitions:
- Large language models (LLMs): Big computer programs that help with tasks involving language and text processing.
- Web navigation: Moving around and using websites on the internet.
- Groundbreaking: Very innovative or revolutionary.
- Capabilities: Abilities or skills that something has.
- Dataset: A collection of data used for analysis or training purposes.
Introduction:
The internet has become an integral part of our daily lives, and with it comes the need for efficient web navigation. However, existing agents often struggle to effectively navigate real-world webpages due to various challenges. To address these issues, a groundbreaking solution known as AutoWebGLM has been developed.
Overview of AutoWebGLM:
AutoWebGLM is an automated web navigation agent built upon ChatGLM3-6B that is reported to outperform GPT-4 on web navigation. Drawing on human browsing patterns, it uses an HTML simplification algorithm that represents webpages concisely while retaining essential information.
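The general idea of HTML simplification can be illustrated with a minimal sketch. This is not the algorithm from the paper, which the summary does not describe in detail. It assumes a simple rule: drop scripts and styles, keep visible text, and keep interactive elements with a numeric index. The names `SKIP_TAGS`, `INTERACTIVE_TAGS`, `KEPT_ATTRS`, and `simplify_html` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical choices for this sketch; they are not taken from the paper.
SKIP_TAGS = {"script", "style", "noscript"}
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
KEPT_ATTRS = ("type", "placeholder", "value", "aria-label")


class Simplifier(HTMLParser):
    """Collects visible text and indexed interactive elements."""

    def __init__(self):
        super().__init__()
        self.lines = []
        self.skip_depth = 0  # > 0 while inside a skipped tag
        self.next_id = 0     # index an agent could refer to in an action

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth == 0 and tag in INTERACTIVE_TAGS:
            # Keep only a few attributes that describe what the element does.
            kept = " ".join(
                f'{name}="{value}"'
                for name, value in attrs
                if name in KEPT_ATTRS and value
            )
            suffix = f" {kept}" if kept else ""
            self.lines.append(f"[{self.next_id}] <{tag}{suffix}>")
            self.next_id += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and self.skip_depth == 0:
            self.lines.append(text)


def simplify_html(html):
    """Return a compact, line-based view of the page."""
    parser = Simplifier()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Layout-only tags such as `div` disappear from the output, so the text passed to the model is much shorter than the raw HTML while the elements an agent can act on remain addressable by index.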
Hybrid Human-AI Approach:
To enhance the agent's understanding of webpage structures and operations, a robust dataset for training is curated through a hybrid human-AI approach. This approach combines the strengths of both humans and AI in data collection and curation.
Reinforcement Learning and Rejection Sampling Techniques:
The model is further refined through reinforcement learning and rejection sampling. These techniques improve its comprehension of webpage content, its use of browser functions, and its task decomposition.
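Rejection sampling for self-improvement can be sketched as follows: sample several candidate answers per task, keep only those that a checker accepts, and use the kept pairs as new fine-tuning data. This is a minimal illustration under stated assumptions, not the paper's training pipeline. The function `rejection_sample` and its arguments `generate` and `is_correct` are hypothetical stand-ins for the model and for whatever correctness signal is available.

```python
import random


def rejection_sample(tasks, generate, is_correct, num_samples=4, seed=0):
    """Collect (task, candidate) pairs that pass the checker.

    generate(task, rng) stands in for sampling from the current model;
    is_correct(task, candidate) stands in for a rule-based or
    environment-based success check. Both are assumptions of this sketch.
    """
    rng = random.Random(seed)
    kept = []
    for task in tasks:
        for _ in range(num_samples):
            candidate = generate(task, rng)
            if is_correct(task, candidate):
                kept.append((task, candidate))
                break  # keep at most one accepted sample per task
    return kept
```

Tasks for which no sample passes are simply left out, so the resulting set contains only examples the model has already solved at least once.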
Evaluation on Real-World Web Browsing Tasks:
To evaluate its performance, a bilingual benchmark named AutoWebBench is established for real-world web browsing tasks. Extensive testing across diverse web navigation benchmarks showcases the advancements made by AutoWebGLM while also highlighting areas that require further refinement for optimal performance in real environments.
Key Contributions:
This research paper introduces several key contributions:
1) Development of AutoWebGLM for efficient completion of web browsing tasks through curriculum learning and advanced training methods.
2) Construction of a comprehensive dataset comprising approximately 10,000 traces for real webpage browsing operations.
3) Successful demonstration of AutoWebGLM's capabilities with 6B parameters achieving comparable performance to leading LLM-based agents.
System Architecture:
The system architecture of AutoWebGLM comprises two main components: a browsing framework and the LM agent. The browsing framework utilizes various web processing modules to organize HTML information for decision-making by the LM agent. The LM agent learns from diverse data sources and employs reinforcement learning and rejection sampling techniques for self-improvement in web browsing capabilities.
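The division of labour between the two components can be sketched as an observe-decide-act loop. The code below is an illustration only: `ToyBrowser`, `Observation`, `run_episode`, and the string action format are hypothetical and do not reflect the actual AutoWebGLM interfaces. The agent is any callable that maps an observation to an action.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """What the browsing framework hands to the agent at each step."""
    task: str
    simplified_html: str
    history: list = field(default_factory=list)


class ToyBrowser:
    """Stand-in for the browsing framework: serves pages and applies actions."""

    def __init__(self, pages, start):
        self.pages = pages
        self.current = start

    def observe(self):
        return self.pages[self.current]["html"]

    def apply(self, action):
        # In this toy model an action simply selects the next page.
        self.current = self.pages[self.current]["links"][action]


def run_episode(task, browser, agent, max_steps=5):
    """Alternate between observing the page and executing the agent's action."""
    history = []
    for _ in range(max_steps):
        obs = Observation(task, browser.observe(), list(history))
        action = agent(obs)
        if action == "finish":
            break
        browser.apply(action)
        history.append(action)
    return history
```

Keeping the framework and the agent behind a narrow interface like this means the page-processing side can change without retraining the agent, as long as the observation format stays the same.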
Ablation Study:
An ablation study assesses how the different stages of data and training strategy affect model performance. The results indicate that incorporating complex task datasets significantly improves performance and brings the model closer to real-world scenarios. Of the training strategies, DPO (Direct Preference Optimization) helps the model learn from its mistakes, while RFT (rejection sampling fine-tuning) enables bootstrapped improvement.
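To make "learning from mistakes" concrete, the standard DPO objective for a single preference pair can be written in a few lines. The sketch assumes the summed log-probabilities of the preferred and rejected responses under the trained policy and a frozen reference model are already available. The function name `dpo_loss` and the default `beta=0.1` are illustrative choices, not values from the paper.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair: -log(sigmoid(margin))."""
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch for numerical stability.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss falls when the policy raises the probability of the preferred response relative to the reference model more than it raises the rejected one, which is how a failed action paired with a correct one becomes a training signal.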
Conclusion:
AutoWebGLM represents a significant advance in automated web navigation, with promising implications for deployment in complex online environments. It combines HTML simplification, hybrid human-AI dataset curation, reinforcement learning, and rejection sampling to handle web navigation tasks efficiently. The evaluation also shows that further refinement is needed before it performs reliably in real environments.