SteP: Stacked LLM Policies for Web Actions

AI-generated keywords: Large Language Models, Web Tasks, Stacked Policies, Dynamic Control, Policy Composition

AI-generated Key Points

  • Performing tasks on the web using large language models (LLMs) poses challenges due to combinatorially large open-world tasks and variations across web interfaces.
  • Designing a single LLM policy for all web tasks is complex: it must cover every task variation and maintain a long history of actions and observations.
  • Stacked LLM Policies for Web Actions (SteP) dynamically composes policies to solve diverse web tasks. It defines a Markov Decision Process whose state is a stack of policies, i.e., the chain of policy calls.
  • SteP allows dynamic control adapting to task complexity, enabling any policy to invoke another for flexibility in solving tasks at multiple levels of abstraction.
  • Experiments on several web benchmarks show SteP's effectiveness: on WebArena it improves over prior GPT-4-based state of the art (14.9% to 33.5%), and on MiniWoB++ it remains competitive with prior work while using significantly less data.
  • Implementing SteP as a meta-policy enhances versatility by wrapping around existing policy classes, making it applicable in different scenarios.
  • Each policy carries its own dedicated instructions and examples, which keeps prompts focused and illustrates how dynamic policy composition can address the challenges of web tasks.

Authors: Paloma Sodhi, S. R. K. Branavan, Yoav Artzi, Ryan McDonald

Accepted at Conference on Language Modeling (COLM) 2024. 30 pages, 15 figures
License: CC BY 4.0

Abstract: Performing tasks on the web presents fundamental challenges to large language models (LLMs), including combinatorially large open-world tasks and variations across web interfaces. Simply specifying a large prompt to handle all possible behaviors and states is extremely complex, and results in behavior leaks between unrelated behaviors. Decomposition to distinct policies can address this challenge, but requires carefully handing off control between policies. We propose Stacked LLM Policies for Web Actions (SteP), an approach to dynamically compose policies to solve a diverse set of web tasks. SteP defines a Markov Decision Process where the state is a stack of policies representing the control state, i.e., the chain of policy calls. Unlike traditional methods that are restricted to static hierarchies, SteP enables dynamic control that adapts to the complexity of the task. We evaluate SteP against multiple baselines and web environments including WebArena, MiniWoB++, and a CRM. On WebArena, SteP improves (14.9% to 33.5%) over SOTA that use GPT-4 policies, while on MiniWoB++, SteP is competitive with prior works while using significantly less data. Our code and data are available at https://asappresearch.github.io/webagents-step.

Submitted to arXiv on 05 Oct. 2023


Results of the summarizing process for the arXiv paper: 2310.03720v3

Performing tasks on the web using large language models (LLMs) poses significant challenges due to combinatorially large open-world tasks and variations across web interfaces. Designing a single LLM policy to handle all possible web tasks is complex: it must cover every task variation and maintain a long history of actions and observations. To address this challenge, we introduce Stacked LLM Policies for Web Actions (SteP), which dynamically composes policies to solve a diverse set of web tasks.

SteP defines a Markov Decision Process whose state is a stack of policies, allowing dynamic control that adapts to task complexity. Unlike traditional methods with static hierarchies, SteP lets any policy invoke another, which gives flexibility on tasks that require operations at multiple levels of abstraction. We also implement SteP as a meta-policy that wraps around existing policy classes, so it can be applied in different scenarios. Each policy carries its own dedicated instructions and examples, which keeps individual prompts focused.

Experiments on several web benchmarks, including WebArena, MiniWoB++, and an airline CRM simulator, demonstrate SteP's effectiveness. On WebArena, SteP improves over prior GPT-4-based work (14.9% to 33.5%), while on MiniWoB++ it remains competitive with previous approaches while using less data. Overall, the results show the potential of dynamic policy composition for web tasks and the value of adaptable control across diverse online environments.
Created on 11 Jan. 2025
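
The stack-based control described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `WebAction`, `Call`, `Return`, `Frame`, `run`, and the two toy policies are hypothetical, the policies are hard-coded rules standing in for LLM calls, and browser observations are omitted. The sketch only shows the control flow, where the top of the stack decides the next step, a call pushes a new policy, and a return pops it and hands the result back to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class WebAction:
    """Primitive action that would be sent to the browser (hypothetical)."""
    command: str


@dataclass
class Call:
    """Hand control to another policy by name (hypothetical)."""
    policy: str
    instruction: str


@dataclass
class Return:
    """The current policy is finished; pass a value back to its caller."""
    value: str = ""


Action = Union[WebAction, Call, Return]


@dataclass
class Frame:
    """One stack entry: a policy, its instruction, and its local history."""
    policy: str
    instruction: str
    history: List[Action] = field(default_factory=list)


# Toy policies: scripted stand-ins for LLM prompts (assumption).
def book_flight(instruction: str, history: List[Action]) -> Action:
    if len(history) == 0:
        return Call("fill_form", "enter passenger name")
    if len(history) == 1:  # the child policy has returned
        return WebAction("click [submit]")
    return Return("booked")


def fill_form(instruction: str, history: List[Action]) -> Action:
    if len(history) == 0:
        return WebAction("type [name] Ada")
    return Return("form filled")


POLICIES: Dict[str, Callable[[str, List[Action]], Action]] = {
    "book_flight": book_flight,
    "fill_form": fill_form,
}


def run(root: str, instruction: str, policies, max_steps: int = 20) -> List[str]:
    """Drive the policy stack until it is empty or max_steps is reached."""
    stack = [Frame(root, instruction)]
    trace: List[str] = []
    for _ in range(max_steps):
        if not stack:
            break
        frame = stack[-1]  # the top policy is in control
        action = policies[frame.policy](frame.instruction, frame.history)
        if isinstance(action, Call):
            stack.append(Frame(action.policy, action.instruction))
        elif isinstance(action, Return):
            stack.pop()
            if stack:  # the caller sees the child's return value
                stack[-1].history.append(action)
        else:
            frame.history.append(action)
            trace.append(action.command)
    return trace


print(run("book_flight", "book a flight", POLICIES))
```

Because each frame keeps only its own history, a policy never sees the low-level steps of the policies it calls, which is one way to read the summary's point about avoiding a single long history of actions and observations.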
