WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

AI-generated keywords: Autonomous Agents, Large Language Models, WebRL, Self-Evolving Curriculum, Reinforcement Learning

AI-generated Key Points

The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

  • Large language models (LLMs) have shown strong potential as autonomous agents, particularly for web-based tasks.
  • Existing LLM web agents rely heavily on costly proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities.
  • WebRL is a self-evolving online curriculum reinforcement learning framework for training high-performance web agents on open LLMs.
  • WebRL addresses three challenges: the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning.
  • Its key components are a self-evolving curriculum that generates new tasks from unsuccessful attempts, a robust outcome-supervised reward model (ORM), and adaptive reinforcement learning strategies.
  • Applied to open Llama-3.1 and GLM-4 models, WebRL raises the success rate on WebArena-Lite from 4.8% to 42.4% for Llama-3.1-8B and from 6.1% to 43% for GLM-4-9B.
  • The resulting open models outperform GPT-4-Turbo (17.6%) and GPT-4o (13.9%), as well as AutoWebGLM (18.2%), the previous state-of-the-art web agent trained on open LLMs.
  • These results suggest that WebRL can narrow the gap between open and proprietary LLM-based web agents.
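For a quick sense of scale, the success rates quoted in the abstract can be turned into absolute and relative gains. This is a small Python illustration of that arithmetic only; the helper names `gains` and `margin_over_best_baseline` are hypothetical and have nothing to do with WebRL's code.

```python
"""Compare the WebArena-Lite success rates quoted in the WebRL abstract (illustration only)."""

# Success rates in percent, copied from the abstract.
REPORTED = {
    "Llama-3.1-8B": {"before": 4.8, "after": 42.4},
    "GLM-4-9B": {"before": 6.1, "after": 43.0},
}
BASELINES = {"GPT-4-Turbo": 17.6, "GPT-4o": 13.9, "AutoWebGLM": 18.2}


def gains(reported: dict) -> dict:
    """Return (absolute gain in percentage points, after/before ratio) per model."""
    return {
        model: (round(r["after"] - r["before"], 1), round(r["after"] / r["before"], 1))
        for model, r in reported.items()
    }


def margin_over_best_baseline(reported: dict, baselines: dict) -> dict:
    """Percentage-point lead of each WebRL-trained model over the strongest listed baseline."""
    best = max(baselines.values())
    return {model: round(r["after"] - best, 1) for model, r in reported.items()}


if __name__ == "__main__":
    print(gains(REPORTED))  # {'Llama-3.1-8B': (37.6, 8.8), 'GLM-4-9B': (36.9, 7.0)}
    print(margin_over_best_baseline(REPORTED, BASELINES))  # {'Llama-3.1-8B': 24.2, 'GLM-4-9B': 24.8}
```

By this arithmetic, both trained models gain roughly 37 percentage points and lead the strongest listed baseline (AutoWebGLM, 18.2%) by about 24 to 25 points.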

Authors: Zehan Qi, Xiao Liu, Iat Long Iong, Hanyu Lai, Xueqiao Sun, Xinyue Yang, Jiadai Sun, Yu Yang, Shuntian Yao, Tianjie Zhang, Wei Xu, Jie Tang, Yuxiao Dong

License: CC BY-NC-ND 4.0

Abstract: Large language models (LLMs) have shown remarkable potential as autonomous agents, particularly in web-based tasks. However, existing LLM web agents heavily rely on expensive proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities. This paper introduces WebRL, a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. WebRL addresses three key challenges in building LLM web agents, including the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. Specifically, WebRL incorporates 1) a self-evolving curriculum that generates new tasks from unsuccessful attempts, 2) a robust outcome-supervised reward model (ORM), and 3) adaptive reinforcement learning strategies to ensure consistent improvements. We apply WebRL to transform open Llama-3.1 and GLM-4 models into proficient web agents. On WebArena-Lite, WebRL improves the success rate of Llama-3.1-8B from 4.8% to 42.4%, and from 6.1% to 43% for GLM-4-9B. These open models significantly surpass the performance of GPT-4-Turbo (17.6%) and GPT-4o (13.9%) and outperform previous state-of-the-art web agents trained on open LLMs (AutoWebGLM, 18.2%). Our findings demonstrate WebRL's effectiveness in bridging the gap between open and proprietary LLM-based web agents, paving the way for more accessible and powerful autonomous web interaction systems.

Submitted to arXiv on 04 Nov. 2024
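The abstract states only that the curriculum "generates new tasks from unsuccessful attempts"; it does not say how new tasks are produced or selected. The Python sketch below shows one plausible shape for such a loop under stated assumptions: `generate_variants` stands in for some task generator (for example, a prompted LLM) and `estimate_success` for some estimate of how likely the current policy is to solve a task. Both callables, the difficulty window `[low, high]`, the `Attempt` type, and the function `next_phase_tasks` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical sketch of a failure-driven curriculum step; not the WebRL implementation.


@dataclass(frozen=True)
class Attempt:
    task: str
    success: bool  # episode outcome, e.g. as judged by an outcome reward model


def next_phase_tasks(
    attempts: Iterable[Attempt],
    generate_variants: Callable[[str], List[str]],  # hypothetical task generator
    estimate_success: Callable[[str], float],  # hypothetical success estimate in [0, 1]
    low: float = 0.1,  # assumed lower bound: drop tasks that look hopeless
    high: float = 0.9,  # assumed upper bound: drop tasks that look trivial
) -> List[str]:
    """Propose the next curriculum phase from the tasks the agent failed."""
    failed = [a.task for a in attempts if not a.success]
    seen = set()
    new_tasks = []
    for task in failed:
        for variant in generate_variants(task):
            if variant in seen:
                continue  # skip duplicate proposals
            seen.add(variant)
            # Keep tasks the current policy is neither certain to solve nor certain to fail.
            if low <= estimate_success(variant) <= high:
                new_tasks.append(variant)
    return new_tasks


if __name__ == "__main__":
    # Toy stand-ins for the hypothetical generator and success estimator.
    def toy_generator(task: str) -> List[str]:
        return [task + " under $500", task + " with 16GB RAM"]

    def toy_estimator(task: str) -> float:
        return 0.5 if "$500" in task else 0.95

    history = [Attempt("find the cheapest laptop", False), Attempt("open the cart", True)]
    print(next_phase_tasks(history, toy_generator, toy_estimator))
    # ['find the cheapest laptop under $500']
```

Keeping only tasks in a middle band of estimated difficulty is one common curriculum heuristic, chosen here for illustration; the abstract does not say which selection rule WebRL uses.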

Results of the summarizing process for the arXiv paper: 2411.02337v1

This paper's license does not allow us to build upon its content, so this summary was generated from the paper's metadata rather than the full article.

In the realm of autonomous agents, large language models (LLMs) have emerged as powerful tools for web-based tasks. However, current LLM web agents often rely on costly proprietary LLM APIs, while open LLMs lack the necessary decision-making capabilities. To address this gap, the authors introduce WebRL.

WebRL is a self-evolving online curriculum reinforcement learning framework designed to train high-performance web agents using open LLMs. It tackles three challenges in building LLM web agents: the scarcity of training tasks, sparse feedback signals, and policy distribution drift in online learning. To do so, it combines three components.

First, a self-evolving curriculum generates new tasks from unsuccessful attempts. Second, a robust outcome-supervised reward model (ORM) provides feedback for learning. Third, adaptive reinforcement learning strategies are employed to ensure consistent improvements over time.

Using WebRL, the authors turn open Llama-3.1 and GLM-4 models into proficient web agents. On WebArena-Lite, WebRL raises the success rate of Llama-3.1-8B from 4.8% to 42.4% and that of GLM-4-9B from 6.1% to 43%. These open models surpass GPT-4-Turbo (17.6%) and GPT-4o (13.9%) and outperform AutoWebGLM (18.2%), the previous state-of-the-art web agent trained on open LLMs.

The findings indicate that WebRL narrows the gap between open and proprietary LLM-based web agents, paving the way for more accessible and powerful autonomous web interaction systems.
Created on 07 Nov. 2024
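The summary names two ingredients without specifying them: an outcome-supervised reward model that judges whether an episode succeeded, and adaptive RL strategies meant to keep improvements consistent despite policy distribution drift. One widely used way to limit drift when fine-tuning an LLM policy online is to subtract a KL-divergence penalty toward a reference policy from the reward. The Python sketch below combines a binary outcome reward with such a penalty. It is a generic illustration only: the function name `kl_penalized_return`, the penalty form, and the coefficient `beta` are assumptions, not details confirmed by the paper metadata.

```python
import math
from typing import Sequence

# Hypothetical helper: a generic KL-regularized outcome reward, not WebRL's published objective.


def kl_penalized_return(
    outcome_success: bool,
    logp_policy: Sequence[float],
    logp_reference: Sequence[float],
    beta: float = 0.1,  # assumed penalty strength
) -> float:
    """Sparse outcome reward minus a KL-style penalty toward a reference policy.

    outcome_success: verdict of an outcome reward model for the whole episode.
    logp_policy / logp_reference: log-probabilities of each action actually taken,
    under the current policy and under a frozen reference policy.
    """
    if len(logp_policy) != len(logp_reference):
        raise ValueError("per-step log-probability sequences must have equal length")
    reward = 1.0 if outcome_success else 0.0  # feedback arrives only at episode end
    # With actions sampled from the current policy, the summed log-ratio is a
    # single-sample estimate of KL(policy || reference) over the trajectory.
    kl_estimate = sum(p - r for p, r in zip(logp_policy, logp_reference))
    return reward - beta * kl_estimate


if __name__ == "__main__":
    # Two-step episode: the policy has become more confident than the reference on step 1.
    policy = [math.log(0.5), math.log(0.8)]
    reference = [math.log(0.4), math.log(0.8)]
    print(round(kl_penalized_return(True, policy, reference), 4))  # 0.9777
```

Subtracting `beta` times the log-ratio discourages the policy from moving far from the reference in a single update, which is the general motivation behind KL regularization; how WebRL actually adapts its updates is not specified in this metadata.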
