Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents

AI-generated keywords: Pre-trained LLMs, PET framework, Action Attention agent, Instruction following, Generalization

AI-generated Key Points

  • The study focuses on leveraging pre-trained large language models (LLMs) to simplify complex control tasks without compromising the trainable nature of the actor.
  • The proposed Plan, Eliminate, and Track (PET) framework consists of three key modules: Plan, Eliminate, and Track.
  • The Plan module breaks down tasks into sub-tasks using a pre-trained LLM.
  • The Eliminate module masks out irrelevant objects and receptacles from observations for the current sub-task using a zero-shot QA language model.
  • The Track module determines task completion and transitions to the next sub-task.
  • An Action Attention agent based on a transformer architecture is introduced to handle changing action spaces in text environments.
  • Results show that LLMs can remove 40% of task-irrelevant objects through common-sense QA and generate high-level sub-tasks with 99% accuracy.
  • Coordination between multiple LLMs can assist agents from different perspectives.
  • Contributions include introducing the PET framework as a novel approach to leveraging pre-trained LLMs with embodied agents.
  • The P, E, and T components each play a complementary role in addressing control tasks effectively.
  • An Action Attention agent is introduced to handle variable length action spaces in text environments.
  • Sub-task planning and tracking yield a significant 15% improvement over state-of-the-art methods for generalization to human goal specifications.
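
As a rough illustration of how the three modules listed above could fit together, here is a minimal Python sketch of a Plan, Eliminate, Track loop. It is not the authors' implementation: `pet_loop`, `toy_plan`, `toy_eliminate`, `toy_track`, `toy_policy`, and `toy_env_step` are hypothetical stand-ins, and in the paper the planner and filter are backed by language models while the actor is a trained agent.

```python
from typing import Callable, List, Tuple

Observation = Tuple[str, List[str]]  # (text description, visible object names)

def pet_loop(
    task: str,
    plan: Callable[[str], List[str]],
    eliminate: Callable[[str, List[str]], List[str]],
    track: Callable[[str, str], bool],
    policy: Callable[[str, str, List[str]], str],
    env_step: Callable[[str], Observation],
    first_obs: Observation,
    max_steps: int = 20,
) -> List[str]:
    """Run one episode and return the sub-tasks that were completed."""
    sub_tasks = plan(task)            # Plan: task description -> sub-task list
    completed: List[str] = []
    idx = 0
    text, objects = first_obs
    for _ in range(max_steps):
        if idx >= len(sub_tasks):
            break
        current = sub_tasks[idx]
        visible = eliminate(current, objects)    # Eliminate: filter objects
        action = policy(current, text, visible)  # trainable actor (stubbed here)
        text, objects = env_step(action)
        if track(current, text):                 # Track: is the sub-task done?
            completed.append(current)
            idx += 1
    return completed

# --- Hypothetical toy stand-ins, for illustration only ---
def toy_plan(task: str) -> List[str]:
    return ["take apple", "clean apple", "put apple on table"]

def toy_eliminate(sub_task: str, objects: List[str]) -> List[str]:
    return [o for o in objects if o in sub_task]

def toy_track(sub_task: str, text: str) -> bool:
    return sub_task in text

def toy_policy(sub_task: str, text: str, visible: List[str]) -> str:
    return sub_task  # a real actor would choose among admissible actions

ROOM = ["apple", "sofa", "sink", "table"]

def toy_env_step(action: str) -> Observation:
    return (f"You {action}.", ROOM)

done = pet_loop("put a clean apple on the table", toy_plan, toy_eliminate,
                toy_track, toy_policy, toy_env_step, ("You are in a room.", ROOM))
```

The point of the structure is that the actor only ever sees the current sub-task and a filtered observation, so it stays trainable and replaceable.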

Authors: Yue Wu, So Yeon Min, Yonatan Bisk, Ruslan Salakhutdinov, Amos Azaria, Yuanzhi Li, Tom Mitchell, Shrimai Prabhumoye

License: CC BY 4.0

Abstract: Pre-trained large language models (LLMs) capture procedural knowledge about the world. Recent work has leveraged LLMs' ability to generate abstract plans to simplify challenging control tasks, either by action scoring or by action modeling (fine-tuning). However, the transformer architecture inherits several constraints that make it difficult for the LLM to directly serve as the agent: e.g. limited input lengths, fine-tuning inefficiency, bias from pre-training, and incompatibility with non-text environments. To maintain compatibility with a low-level trainable actor, we propose to instead use the knowledge in LLMs to simplify the control problem, rather than solving it. We propose the Plan, Eliminate, and Track (PET) framework. The Plan module translates a task description into a list of high-level sub-tasks. The Eliminate module masks out irrelevant objects and receptacles from the observation for the current sub-task. Finally, the Track module determines whether the agent has accomplished each sub-task. On the AlfWorld instruction following benchmark, the PET framework leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
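
The Eliminate step described in the abstract can be pictured as thresholding a per-object relevance score. The sketch below is an assumption-laden illustration: `eliminate`, `toy_relevance`, the score table, and the threshold value are all hypothetical, and a real system would obtain the scores from a QA language model rather than a lookup.

```python
from typing import Callable, Dict, List

def eliminate(
    sub_task: str,
    objects: List[str],
    relevance: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep only objects whose relevance to the sub-task meets the threshold."""
    return [obj for obj in objects if relevance(sub_task, obj) >= threshold]

# Hypothetical scores standing in for a QA model's answer to a question like
# "Is <object> relevant to <sub-task>?". These numbers are made up.
TOY_SCORES: Dict[str, float] = {
    "apple": 0.95,
    "microwave": 0.90,
    "fridge": 0.30,
    "sofa": 0.05,
    "pillow": 0.02,
}

def toy_relevance(sub_task: str, obj: str) -> float:
    return TOY_SCORES.get(obj, 0.0)

kept = eliminate(
    "heat the apple",
    ["apple", "microwave", "sofa", "pillow", "fridge"],
    toy_relevance,
    threshold=0.4,
)
```

The threshold trades off how many distractors are removed against the risk of masking an object the agent actually needs, so it would have to be tuned for any real scorer.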

Submitted to arXiv on 03 May. 2023

Results of the summarizing process for the arXiv paper: 2305.02412v2

This study uses pre-trained large language models (LLMs) to simplify complex control tasks while keeping the low-level actor trainable. The proposed Plan, Eliminate, and Track (PET) framework consists of three modules. The Plan module uses a pre-trained LLM to break a task down into sub-tasks. The Eliminate module uses a zero-shot QA language model to mask out objects and receptacles in the observation that are irrelevant to the current sub-task. The Track module determines when the current sub-task is complete and moves the agent on to the next one. In addition, an Action Attention agent based on a transformer architecture handles the changing, variable-length action spaces of text environments.

The study evaluates instruction following in indoor households using the AlfWorld interactive text environment benchmark. Results show that LLMs can remove 40% of task-irrelevant objects through common-sense QA and generate high-level sub-tasks with 99% accuracy. Coordinating multiple LLMs can also assist the agent from different perspectives.

The contributions are the PET framework, a novel approach to leveraging pre-trained LLMs with embodied agents; evidence that the P, E, and T components play complementary roles in addressing control tasks; and the Action Attention agent. Overall, sub-task planning and tracking yield a significant 15% improvement over state-of-the-art methods for generalization to human goal specifications.

In related work, prior research has learned language-conditioned policies through imitation learning or reinforcement learning. Some of these studies use pre-trained language embeddings to improve generalization to new instructions, but they lack the domain knowledge captured in LLMs. By harnessing LLMs, the PET framework enables effective planning, progress tracking, and observation filtering without compromising the trainable nature of the actor.
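
To make the idea of a variable-length action space concrete, the following sketch scores each candidate action against a query vector and normalizes with a softmax, so the output length always matches the number of candidates. This is a simplified scaled dot-product scoring written for illustration, not the paper's transformer-based Action Attention architecture; `action_attention` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

def action_attention(
    query: Sequence[float],
    action_keys: Sequence[Sequence[float]],
) -> List[float]:
    """Return one probability per candidate action.

    query: a vector summarizing the task and observation (assumed given).
    action_keys: one embedding per admissible action (assumed given).
    """
    scale = math.sqrt(len(query))
    logits = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in action_keys
    ]
    # Numerically stable softmax over however many actions are present.
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The same function handles two candidates or three without any change.
query = [1.0, 0.0]
probs_two = action_attention(query, [[1.0, 0.0], [0.0, 1.0]])
probs_three = action_attention(query, [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
best = probs_three.index(max(probs_three))
```

Because each action is scored independently against the query, no fixed-size output layer is needed, which is what lets the set of admissible actions change from step to step.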
Created on 19 Feb. 2024