SteP: Stacked LLM Policies for Web Actions
AI-generated Key Points
- Performing tasks on the web using large language models (LLMs) poses challenges due to combinatorially large open-world tasks and variations across web interfaces.
- Designing a single LLM policy for all web tasks is complex: it must cover every task variation and maintain a long history of actions and observations.
- SteP (Stacked LLM Policies for Web Actions) dynamically composes policies to solve diverse web tasks by defining a Markov Decision Process whose state is a stack of policies, i.e., the chain of policy calls.
- SteP enables dynamic control that adapts to task complexity: any policy can invoke another, so tasks can be solved flexibly at multiple levels of abstraction.
- Experiments on several web environments (WebArena, MiniWoB++, and a CRM) show SteP's effectiveness: on WebArena it improves on prior methods that use GPT-4 policies from 14.9% to 33.5%, and on MiniWoB++ it stays competitive with prior work while using significantly less data.
- Implementing SteP as a meta-policy that wraps around existing policy classes makes it versatile and applicable in different scenarios.
- Giving each policy its own dedicated instructions and examples lets SteP navigate the complex landscape of web interactions efficiently, illustrating the potential of dynamic policy composition for web tasks.
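The stack-based control flow described in these points can be illustrated with a minimal Python sketch. It is not the authors' implementation (their code is linked from the abstract): the decision types `Act`, `Call`, and `Return`, the driver `run_stacked`, and the toy policies are hypothetical, and plain functions stand in for LLM-prompted policies. Only the top-of-stack policy acts; a `Call` pushes a sub-policy, and a `Return` pops it and hands a result back to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple, Union


# Hypothetical decision types: a policy either acts on the page,
# calls another policy, or returns control to its caller.
@dataclass
class Act:
    command: str  # illustrative action string, e.g. "click [search_button]"


@dataclass
class Call:
    policy: str  # name of the sub-policy to push onto the stack
    task: str    # instruction handed to that sub-policy


@dataclass
class Return:
    result: str  # summary passed back to the calling policy


Decision = Union[Act, Call, Return]
# A policy sees its own task, the current observation, and only its own history.
Policy = Callable[[str, str, List[str]], Decision]


@dataclass
class Frame:
    policy: str
    task: str
    history: List[str] = field(default_factory=list)


def run_stacked(
    policies: Dict[str, Policy],
    root: str,
    task: str,
    env: Callable[[str], str],
    obs: str,
    max_steps: int = 50,
) -> Tuple[Optional[str], List[str]]:
    """Run until the root policy returns or max_steps is reached.

    The control state is the stack of active frames (the chain of policy calls).
    """
    stack: List[Frame] = [Frame(root, task)]
    trace: List[str] = []  # web actions actually sent to the environment
    for _ in range(max_steps):
        frame = stack[-1]  # only the top-of-stack policy acts
        decision = policies[frame.policy](frame.task, obs, frame.history)
        if isinstance(decision, Call):
            frame.history.append(f"call {decision.policy}: {decision.task}")
            stack.append(Frame(decision.policy, decision.task))
        elif isinstance(decision, Return):
            stack.pop()
            if not stack:  # the root policy returned: the episode ends
                return decision.result, trace
            stack[-1].history.append(f"returned: {decision.result}")
        else:
            frame.history.append(decision.command)
            trace.append(decision.command)
            obs = env(decision.command)
    return None, trace  # step budget exhausted


# Toy stand-ins for LLM-prompted policies (hypothetical names and logic).
def root_policy(task: str, obs: str, history: List[str]) -> Decision:
    if not history:
        return Call("search_policy", "find 'blue shirt'")
    return Return("done: " + history[-1])


def search_policy(task: str, obs: str, history: List[str]) -> Decision:
    if not history:
        return Act("type [search_box] blue shirt")
    if len(history) == 1:
        return Act("click [search_button]")
    return Return("results page opened")


def fake_env(command: str) -> str:
    return f"page after: {command}"


if __name__ == "__main__":
    result, trace = run_stacked(
        {"root": root_policy, "search_policy": search_policy},
        "root", "buy a blue shirt", fake_env, "home page",
    )
    print(result)  # done: returned: results page opened
    print(trace)   # ['type [search_box] blue shirt', 'click [search_button]']
```

Keeping a separate history per frame means each policy only carries its own actions plus the results returned by its sub-policies, which is one way to avoid the long shared history of a single monolithic policy; whether SteP scopes history exactly this way is an assumption of this sketch. Because `policies` accepts any callable, an existing policy can be dropped in unchanged, loosely mirroring the meta-policy idea.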
Authors: Paloma Sodhi, S. R. K. Branavan, Yoav Artzi, Ryan McDonald
Abstract: Performing tasks on the web presents fundamental challenges to large language models (LLMs), including combinatorially large open-world tasks and variations across web interfaces. Simply specifying a large prompt to handle all possible behaviors and states is extremely complex, and results in behavior leaks between unrelated behaviors. Decomposition to distinct policies can address this challenge, but requires carefully handing off control between policies. We propose Stacked LLM Policies for Web Actions (SteP), an approach to dynamically compose policies to solve a diverse set of web tasks. SteP defines a Markov Decision Process where the state is a stack of policies representing the control state, i.e., the chain of policy calls. Unlike traditional methods that are restricted to static hierarchies, SteP enables dynamic control that adapts to the complexity of the task. We evaluate SteP against multiple baselines and web environments including WebArena, MiniWoB++, and a CRM. On WebArena, SteP improves over SOTA methods that use GPT-4 policies (14.9% to 33.5%), while on MiniWoB++, SteP is competitive with prior works while using significantly less data. Our code and data are available at https://asappresearch.github.io/webagents-step.
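The abstract reports the WebArena result as 14.9% to 33.5%. Reading these, as an interpretation, as the task success rate of the prior GPT-4-based state of the art and of SteP respectively, the size of the gain works out as follows:

```latex
\Delta_{\text{abs}} = 33.5\% - 14.9\% = 18.6 \text{ percentage points},
\qquad
\frac{33.5}{14.9} \approx 2.25
```

Under that reading, SteP solves roughly 2.25 times as many WebArena tasks as the baseline, a relative gain of about 125%.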