Agent Workflow Memory

AI-generated keywords: Language Model-based Agents, Agent Workflow Memory (AWM), Sub-routine-based Induction, Web Navigation Benchmarks, Reusable Workflows

AI-generated Key Points

  • The study aims to improve language model-based agents on real-world tasks such as web navigation
  • The central challenge is long-horizon tasks with complex action trajectories
  • Humans solve such tasks efficiently by learning reusable task workflows from past experience
  • The researchers introduce Agent Workflow Memory (AWM) to give agents a similar capability
  • AWM induces commonly reused routines (workflows) and selectively provides them to the agent to guide future tasks
  • Abstract, sub-routine-based workflow induction with language models (LMs) is compared against rule-based induction that lacks context and sub-routine abstraction
  • LM-based induction matches rule-based induction in success rate but is more efficient: its finer-grained workflows keep agents from taking unnecessary steps
  • On the Mind2Web and WebArena benchmarks, AWM improves baseline success rates by 24.6% and 51.1% (relative), respectively
  • Online AWM generalizes robustly across cross-task, cross-website, and cross-domain evaluations, beating baselines by 8.9 to 14.0 absolute points as the train-test distribution gap widens
  • Overall, the study highlights the value of abstract, reusable workflows for agent performance on complex tasks
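The key points report two different kinds of gain: relative success-rate improvements (24.6% and 51.1%) and absolute-point margins (up to 14.0). The two are not interchangeable. The short Python sketch below shows how each is computed. It uses made-up success rates, not figures from the paper.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement, as a percentage of the baseline score."""
    return (improved - baseline) / baseline * 100.0


def absolute_gain(baseline: float, improved: float) -> float:
    """Absolute improvement, in percentage points."""
    return improved - baseline


# Hypothetical success rates (percent); these are NOT results from the paper.
baseline_sr, awm_sr = 20.0, 30.0
print(f"absolute: {absolute_gain(baseline_sr, awm_sr):.1f} points")  # absolute: 10.0 points
print(f"relative: {relative_gain(baseline_sr, awm_sr):.1f}%")        # relative: 50.0%
```

A 10-point absolute gain on a low baseline can therefore appear as a large relative gain. This is worth keeping in mind when comparing the relative figures with the absolute-point margins.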

Authors: Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig

License: CC BY-SA 4.0

Abstract: Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories. In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. AWM flexibly applies to both offline and online scenarios, where agents induce workflows from training examples beforehand or from test queries on the fly. We experiment on two major web navigation benchmarks -- Mind2Web and WebArena -- that collectively cover 1000+ tasks from 200+ domains across travel, shopping, and social media, among others. AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate on Mind2Web and WebArena while reducing the number of steps taken to solve WebArena tasks successfully. Furthermore, online AWM robustly generalizes in cross-task, website, and domain evaluations, surpassing baselines from 8.9 to 14.0 absolute points as train-test task distribution gaps widen.
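The abstract describes AWM at a high level: workflows are induced from experience, stored, and provided to the agent to guide later actions. The Python sketch below illustrates the online setting under several assumptions:

- `solve`, `judge`, and `induce` are hypothetical stand-ins for the agent, a success evaluator, and LM-based workflow induction.
- Filtering trajectories by a success judgment and de-duplicating workflows by description are illustrative choices, not necessarily the paper's exact procedure.
- The sketch puts every stored workflow into the prompt rather than selecting among them.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical sketch of online AWM; all names are illustrative, not the paper's code.

Trajectory = List[str]


@dataclass
class Workflow:
    """A reusable routine: a short description plus abstract steps."""
    description: str
    steps: List[str]


@dataclass
class WorkflowMemory:
    workflows: List[Workflow] = field(default_factory=list)

    def add(self, new: List[Workflow]) -> None:
        # Skip workflows whose description is already stored.
        known = {w.description for w in self.workflows}
        self.workflows.extend(w for w in new if w.description not in known)

    def render(self) -> str:
        # Serialize all stored workflows as text for the agent's prompt
        # (no selection step: a simplification of AWM's selective use).
        return "\n\n".join(
            f"## {w.description}\n" + "\n".join(w.steps) for w in self.workflows
        )


def online_awm(
    queries: List[str],
    solve: Callable[[str, str], Trajectory],
    judge: Callable[[str, Trajectory], bool],
    induce: Callable[[str, Trajectory], List[Workflow]],
) -> WorkflowMemory:
    """Process test queries as a stream, growing memory from judged successes."""
    memory = WorkflowMemory()
    for query in queries:
        trajectory = solve(query, memory.render())  # act with current workflows in context
        if judge(query, trajectory):                 # keep only trajectories judged successful
            memory.add(induce(query, trajectory))    # induce workflows and add them to memory
    return memory


# Toy usage with stub components.
mem = online_awm(
    ["shoes", "hats"],
    solve=lambda query, context: ["click('search')", f"type('{query}')"],
    judge=lambda query, trajectory: True,  # toy judge: accepts everything
    induce=lambda query, trajectory: [
        Workflow("search for an item", ["click('search')", "type('{item}')"])
    ],
)
print(len(mem.workflows))  # 1: the second, identical workflow is skipped
```

The offline setting would differ in one respect. It would call `induce` on training examples once, before testing, and then keep the memory fixed while solving test queries.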

Submitted to arXiv on 11 Sep. 2024

Results of the summarizing process for the arXiv paper: 2409.07429v1

This study focuses on improving the performance of language model-based agents on real-world tasks such as web navigation. The main difficulty lies in long-horizon tasks with complex action trajectories. Unlike current agents, humans can solve intricate tasks efficiently by learning reusable task workflows from past experience and using them to guide future actions. To let agents benefit from a similar process, the researchers introduce Agent Workflow Memory (AWM). This method induces commonly reused routines, or workflows, and selectively provides them to the agent to assist in future tasks.

The study compares abstract, sub-routine-based workflow induction using language models (LMs) with rule-based induction that lacks context and sub-routine abstraction. Rule- and LM-based induction achieve comparable success rates, but the LM-based method is more efficient and solves tasks in fewer steps. Its finer-grained workflows keep agents from following the unnecessary steps present in rule-induced workflows.

AWM is evaluated on two major web navigation benchmarks, Mind2Web and WebArena, which together cover a wide range of tasks across domains such as travel, shopping, and social media. AWM improves baseline success rates by 24.6% and 51.1% (relative) on Mind2Web and WebArena, respectively. In addition, online AWM generalizes robustly in cross-task, cross-website, and cross-domain evaluations. It outperforms baselines by 8.9 to 14.0 absolute points as the gap between training and test task distributions widens. Overall, the study highlights the importance of abstract, reusable workflows for agent performance on complex tasks. By using AWM to induce and apply workflows, agents can improve their problem-solving ability and adaptability over time in dynamic environments.
Created on 12 Oct. 2025
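The summary contrasts abstract, LM-induced workflows with rule-based ones that keep example-specific context. The Python sketch below illustrates only the abstraction idea: concrete values in a trajectory are replaced by named placeholders so the steps can be reused for other tasks. In AWM an LM performs this abstraction. Here, a hand-written value-to-placeholder mapping stands in for it. The trajectory and the `{product-name}` placeholder are hypothetical examples.

```python
from typing import Dict, List


def abstract_steps(steps: List[str], values: Dict[str, str]) -> List[str]:
    """Replace task-specific values with named placeholders.

    `values` maps a concrete value (e.g. a product name) to a placeholder
    name. This mapping is a hypothetical stand-in for what an LM would infer.
    """
    abstracted = []
    for step in steps:
        for concrete, name in values.items():
            step = step.replace(concrete, "{" + name + "}")
        abstracted.append(step)
    return abstracted


# Hypothetical concrete trajectory from one solved task.
trajectory = [
    "click('Search box')",
    "type('dry cat food')",
    "click('Search')",
]
workflow = abstract_steps(trajectory, {"dry cat food": "product-name"})
print(workflow)
# ["click('Search box')", "type('{product-name}')", "click('Search')"]
```

A rule-based scheme that copies the concrete steps verbatim would keep `'dry cat food'` in the workflow. The placeholder version can guide any product search.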
